! THE EFFECTS OF TYPED VERSUS HANDWRITTEN ESSAYS ON STUDENTS’ SCORES ON PROFICIENCY TESTS By Erika Lessien A THESIS Submitted to Michigan State University in partial fulfillment of the requirements for the degree of Teaching English to Speakers of Other Languages - Master of Arts 2013 ABSTRACT ! ! THE EFFECTS OF TYPED VERSUS HANDWRITTEN ESSAYS ON STUDENTS’ SCORES ON PROFICIENCY TESTS By Erika Lessien Previous researchers (Lam & Pennington, 1995; Lee, 2004) have investigated the difference between language learners’ L2-writing-test scores when the learners are required to type essays compared to when they must handwrite them. The outcomes have been mixed, and this may be because the researchers did not investigate whether L2 proficiency impacts score differences. Therefore, in this study I will investigate the score differences in advanced versus intermediate-level English-language learners on handwritten versus typed essay tests. Sixty-one students, from three different proficiencies, were asked to handwrite one essay and type another from prompts retired from the university’s English-language placement test. Two trained raters rated the essays using the university’s placement test rubric. Using a multiple linear regression, I compared score differences across the conditions (handwriting versus typing) and between the groups (intermediate versus advanced English-language learners). I found that there is a significant difference for the advanced students, and their scores were much greater for the typed condition than for the handwritten condition. This study sheds light on the effects L2 essay test conditions have on L2 test program outcomes; programs that even today use handwritten essays to assess language learners’ academic-writing ability ! ! ACKNOWLEDEMENTS I would like to take this opportunity to say thank you to the people who helped and encouraged me in the writing of this thesis. First and foremost, I would like to thank my thesis advisor, Dr. Paula Winke, for all of the time she devoted to this project and for teaching me how to use SPSS and encouraging me throughout the course of this project. Without her help and support it would not have been possible. I would also like to thank my second reader, Dr. Charlene Polio, for all of valuable comments and suggestions she gave. I would like to thank the MA TESOL program for their financial support throughout this project. I would like to thank Mike Kramizeh, the head of the language laboratory at Michigan State University, for the use of the language laboratory during my data collection. I have received much support from my classmates and colleagues and I am grateful to them for their help. I would like to give a special thank you to Jack Drolet and Peter Sakura for giving up so much of their time in helping me complete several important steps during this project. I would also like to thank Dr. Dan Reed and Andy McCullough for helping me during my very beginning planning stages. A special thank you to Justin Cubilo for helping me work through parts of this project. Also, thank you to Laura Ballard and Lorena Valmori who helped me with some important last minute changes. Finally, I would like to thank my mom for always being there to encourage me and to help me when I needed it, and my grandparents for always supporting and encouraging me in all of my endeavors throughout my life. iii! ! TABLE OF CONTENTS LIST OF TABLES ............................................................................................................. 
iii LIST OF FIGURES ........................................................................................................... iv CHAPTER 1: INTRODUCTION ........................................................................................................... 1 CHAPTER 2: LITERATURE REVIEW ................................................................................................ 3 Theories and models on assessing writing .................................................................. 3 Studies on the use of paper-based versus computer-based assessments .................. 6 CHAPTER 3: METHODS ...................................................................................................................... 15 Participants .................................................................................................................. 15 Materials ...................................................................................................................... 16 Procedure ..................................................................................................................... 17 Study Approval ........................................................................................................ 17 Setting ....................................................................................................................... 17 Computer Set-up...................................................................................................... 18 Data Collection......................................................................................................... 18 Essay Rating ............................................................................................................. 20 Analysis ........................................................................................................................ 21 CHAPTER 4: RESULTS ........................................................................................................................ 23 Research Question 1 .................................................................................................... 23 Research Question 2 .................................................................................................... 26 Research Question 3 .................................................................................................... 29 CHAPTER 5: DISCUSSION AND CONCLUSION ............................................................................ 31 Categorical Changes ................................................................................................... 31 The Effects of Proficiency, L1, and Preference ........................................................ 35 Student Perceptions..................................................................................................... 38 Implications ................................................................................................................. 39 Limitations ................................................................................................................... 41 Directions for Future Research .................................................................................. 42 iv! ! APPENDICES ................................................................................................................. 45 Appendix A: ................................................................................................................. 
46 Appendix B: ................................................................................................................. 47 Appendix C .................................................................................................................. 48 Appendix D: ................................................................................................................. 50 REFERENCES................................................................................................................ 53 v! ! LIST OF TABLES Table 1: Participants' Backgrounds.................................................................................. 16 Table 2: Rater Reliability .................................................................................................. 23 Table 3: Descriptive Statistics for Writing Medium ......................................................... 24 Table 4: Paired Samples t-test .......................................................................................... 24 Table 5: Word Counts by Level ........................................................................................ 26 Table 6: Regression Factors and Their Effect .................................................................. 27 Table 7: Correlation of Scores.......................................................................................... 28 Table 8: Student Perceptions:Typed versus Handwritten ................................................. 30 Table 9: Student Perceptions:Which Essay Will Be Scored Higher ................................. 30 Table 10: Rubric ............................................................................................................... 50 ! vi ! LIST OF FIGURES Figure 1: Assessing Writing ............................................................................................... 4 Figure 2: Test Scores vs. Profeciency ............................................................................... 29 vii! ! CHAPTER 1: INTRODUCTION When students learn to compose essays in a second or foreign language, they often do so, at first, by handwriting their essays in or outside of class. As they advance in the language, they may move to a computer-format for essay composition, but at which level of proficiency that change occurs most likely varies depending on the languagelearning program, their overall computer literacy, and the differences between the students’ first or native language (their L1) and the language being learned (the L2). However, computers are the norm for essay composition in most higher-educational, academic settings. Thus, an important question is the following: if a program is using an essay test to assess academic-writing skills, should the test format comprise handwritten or computer-processed essays, or does it matter? And how does that decision relate to the test takers’ proficiency and computer literacy in the language being assessed? In general, there is a need to understand the effect that word-processing has on language learners’ essay test scores. In this study I investigate the score differences of 61 advanced versus intermediate-level ESL students, comparing their handwritten versus typed essay test scores. The learners were asked to handwrite one essay and type another, using prompts retired from a university’s English language placement test. 
In addition, students responded to a survey with closed and open-ended questions to investigate their views of typed versus handwritten essay tests in order to understand which they prefer and why. Using a multiple linear regression, I compare score differences across the conditions (handwriting versus typing) and between the groups (intermediate versus advanced test takers). This research is important because programs continue today to use handwritten essays to assess students' academic-writing ability, which I hypothesize might be problematic theoretically (in relation to the construct being measured) and practically, in particular for advanced-level students used to composing essays using a word processor.

CHAPTER 2: LITERATURE REVIEW

In this chapter I discuss previous research on writing assessment, and on computer-based versus paper-based writing assessment in particular. In the first section I discuss theories and models of assessing writing. In the second section I discuss previous studies on the use of paper-based versus computer-based assessments. Third, I discuss a gap in the literature and this study's research questions that aim to address that gap.

Theories and models on assessing writing

Bachman and Palmer (2010) wrote that test usefulness is the most important thing to consider when designing and developing tests, be they tests of writing or of any other construct in L2 assessment. They included six factors under the umbrella of a test's usefulness: reliability, construct validity, authenticity, interactiveness, impact, and practicality. In this section I focus most on the area of construct validity, in other words, on whether the assessment accurately and appropriately measures the construct it is meant to measure. In university placement tests, the construct typically involves students demonstrating knowledge of the English language in relation to academic writing. So, for these types of tests it is important that they accurately measure not just writing, but academic writing. Because most academic writing is done on a computer, it seems important for the test to be representative of this in order to have construct validity.

Figure 1. Assessing Writing
[Diagram of the components of a writing assessment: prompt, rubric, collect writing sample, train raters, rate, reliability check, scores, and score interpretation and use.]

Figure 1 demonstrates that there are many factors to consider when designing a writing assessment, including the topic, the time limits, the discourse mode, the genre, the writing mode (typing or handwriting), the testing conditions, rater inconsistencies, scoring procedures (holistic or analytic), and traits to be scored (language use, spelling, etc.) (Schoonen, 2005). As Gebril (2009) found, all of these factors can affect the generalizability and validity of a test. Each factor must demonstrate some kind of validity and each must correctly contribute to the writing assessment in order for the assessment to be reliable and valid. The current study deals mostly with the potential effects of the writing mode (typing or handwriting) on students and investigates whether these two types of writing are both generalizable for the student or whether one is more accurate than the other.
There are two main types of tasks used in timed writing-test situations: the independent writing task, in which students are given a generic prompt and asked to compose an essay discussing their ideas about the prompt, and the integrated writing task, in which students are given a listening passage and a reading passage and asked to discuss the information in relation to the passages. Despite the criticisms that independent writing tasks do not accurately measure the academic-writing construct (Gebril & Plakans, 2009; Weigle, 2002; Hamp-Lyons & Kroll, 1996), many universities still use such direct writing tasks for proficiency tests. Hamp-Lyons and Kroll (1996, p. 18) call this a snapshot method because it takes a quick picture of a student's writing ability. The continued use of independent tasks is likely because integrated writing tasks require more time to both create and administer (Plakans, 2010). Also, according to Plakans, integrated writing tasks can, for some test takers, be considered more complicated. Independent writing tasks also do not assume background knowledge and are easier to create and administer. Also, in defense of independent writing tasks, Gebril (2009) found that students do perform similarly on integrated and independent tasks.

Whichever prompt type they use, it is common for universities to use handwriting as the medium for composition. There are several reasons that universities, at least in the U.S., are hesitant about switching from paper-and-pencil proficiency tests to computer-based tests. One such reason is that many universities feel that since many languages do not use a Roman alphabet, language learners with those L1s may not be familiar with the English keyboard and thus may receive lower scores that do not reflect their true essay-writing ability. A second reason is the demand that such writing tasks can place on the university: having typed essays requires the use of large computer labs, which some universities do not have access to.

Studies on the use of paper-based versus computer-based assessments

The effect of computers on students' test scores is a controversial topic. Researchers investigating this topic have not come to a clear consensus on whether or how computers affect test scores. Concomitantly, research that has been conducted on the differences between typed and handwritten essay tests has been inconclusive thus far. On one hand, Benesch (1987) found no difference between the test scores that students received on handwritten versus typed tests. On the other hand, Lam and Pennington (1995) found that there was a positive effect for typing: that is, in their study, scores increased when students were allowed to type their essays. Lee (2004) also found a positive effect when students typed essays compared to when they handwrote them. She conducted a study looking at the quality of written products that were typed versus those that were handwritten. Lee used handwritten, typed, and transcribed essays (to control for the effects of raters seeing the handwritten script) to see the differences in the scores and whether the differences were due only to the fact that the essays were typed. She found that typed essays, in general, were scored higher than handwritten essays, but that the transcribed essays were in some cases scored lower than typed essays. It is important to note here, however, that a majority of these studies (Benesch, 1987; Lam & Pennington, 1995; even Lee, 2004) are fairly old.
Those that found that there was no difference between typed and handwritten essays (i.e., Benesch, 1987) were conducted before computers were as widely used as they are today. This means that in the last few decades, the effect of 6! ! computers on students writing may have changed. Prior studies need replicating with current, more computer-literate students. Of importance is a study by Wolfe, Bolton, Feltovich, and Niday (1996). The authors looked at the difference in scores when students were allowed to choose to either compose essays on a computer or on paper. This study investigated whether performance on a writing assessment is comparable when examinees are given the choice between composing essays using a word processor and handwriting essays. They found that students with weaker English-language proficiency did better on the handwritten exam, and students with higher English-language proficiency did worse on the handwritten exam. The findings of this study must be considered with care because the authors looked at score differences when students were able to choose their writing medium. That is, the test takers were able to select the group and writing condition to which they would be assigned; this made the study a non-randomized, between-group design. More preferable would be to compare the differences of handwriting versus typing using a randomized study (in which test takers were randomly assigned to type or handwrite). Even better than that would be to have a between-groups study design in which each student would type and handwrite essays; a researcher could then compare the results without the confound of writing-type preference. It is difficult to say what the results would be for the individual students without being able to compare the essays for the same student. Qualitative differences in students’ writing that may arise from typing instead of handwriting should also be investigated, and some researchers have looked at this. For example, Schwartz, Fitzpatrick, and Huot (1994) found that students who used word 7! ! processors were producing both longer and richer texts. Whithaus, Harrison, and Midyette (2008) looked at raters’ attitudes towards typed versus handwritten essays, and whether the types of errors produced differed between handwritten and typed essays. Like Wolfe and Manalo (2004) and Wolfe, Bolton, Feltovich, and Niday (1996), students were allowed to pick their writing medium. Raters noticed that students’ proofreading abilities seemed to decrease on typed essays. Raters also noticed more spelling and grammar errors on the handwritten essays, though the handwritten essays were still considered stronger. Raters also noticed a difference in organization; the raters noted that in general typed essays were organized better than the handwritten essays. While these results are interesting, much care needs to be taken when considering them, primarily because different students typed or handwrote the essays (each student did one or the other, not both) and they self-selected the way in which they would compose their essays. There was no real way to know if the differences were student-based (better-abled students chose to type) or if the score differences could be attributed to the writing format. Typed essays have many other advantages over handwritten essays. According to Susser (1994), the writing process itself is beneficial to students’ writing. Individuals learn to write when they pre-write, edit, and revise the writing. 
In timed-writing situations, like those on proficiency tests, however, it is difficult for students to have the time and the means to pre-write, edit, and revise without affecting their essays in another way. For example, Harrington (2000) stated that the appearance of the essay could have an effect on the score that the rater will give the essay. Unfortunately, in agreement with this situation, many raters in Harrington’s study claimed that how the essay looked and 8! ! how easy the essay was to read caused them to change the way they rated. One implication from Harrington’s study could be that editing on handwritten essays may be harmful to students’ scores rather than helpful, as it should be. This is because to edit on paper, students must erase or cross out the parts they want to delete and add new sentences or words, and sometimes they use arrows or other marks indicating the additions or subtractions of text. This can lead to messy-looking essays. Students are also limited by the amount of space available for editing. Typed essays may be a solution to this problem. Although learners will still have time limits on their essays, they will not have the space limitations, nor will they risk a lower grade based on the appearance of their writing. This claim that editing on handwritten essays can be harmful to students and that editing on paper essays is difficult to do is supported by Daiute (1985) who wrote that computers better allow students to fully incorporate the writing processes when composing short, timed essays. Not only do raters notice that handwriting essays can be harmful to students’ scores, but the students themselves seem to be aware of the harmful affects that can be caused by changing their paper-based essays too much. Students tend to be reluctant to make changes to their handwritten essays because of the fear of making it look less neat (Daiute 1985). Daiute suggested that students seem to be much more willing to make changes to their timed essays when they are able to write them on a computer. Li (2005) found that the number of revisions was significantly higher for computer-processed responses than for handwritten responses. Li noted that not only do students appear to be more willing to make changes to their typed essays, but they have also been found to spend significantly more time searching for words or phrases, evaluating written texts, 9! ! and have more decision-making episodes (as Li wrote, these are when students engage in pre-planning, in process planning, consider spelling, reason about linguistics choices, and search for correct words or phrases.) in computer writing. Overall, Li’s study showed that participants were able to revise at higher levels when using a computer compared to using paper. Another consideration that needs to be made when considering whether students’ writing proficiencies should be tested using a computer-based test or a handwritten one is authenticity. For a proficiency test to accurately measure a student’s academic writing ability, it is important to take into consideration the authenticity of the assessment. Authentic assessments are those that represent how students would be expected to perform in real-life situations. Bachman (1990) and Bachman and Palmer (2010) claimed that authenticity plays an important role that works together with construct validity to determine how the construct definition and the domain of generalization will affect the way in which a test score will be interpreted. 
Over the past few decades, the use of authentic assessments has become more and more important and central to language learning (Douglas, 2000; Lewkowicz, 2000). Because most writing in university and professional settings is done on computers, it is important that the way in which a student's writing ability is measured be representative of how they will need to perform in the real world, or in this case, the university world. With the emphasis on computers today, it seems more realistic for students to perform writing tasks, especially those determining their academic proficiency, on a computer rather than with a pencil and paper. Lee (2004) agreed that computer-based tests allow for a more authentic writing experience, and thus an increased chance of success for students who are academically prepared and able.

Finally, studies have looked at learner perceptions of typed versus handwritten essays. Many have found that most students feel more comfortable typing rather than handwriting essays. This is most likely due to the fact that, in their home countries, most students type their assignments. Lee (2004) reported that students in her study said they believed that the computer-based writing medium was more likely to place them into the correct ESL class. This is especially important in large-scale university placement testing, because problems arise when students do not feel that the placement test has placed them into the correct level. If nothing else, Lee's study showed that students find computer-based tests to have more face validity than paper-based tests. Students have also reported that they preferred typing to handwriting essays because they feel typing is more familiar, produces more legible text, and is convenient, and that their errors were easier to spot in a typed essay than in a handwritten essay (Whithaus, Harrison, & Midyette, 2008).

While previous studies have looked at the differences in scores between paper-based and computer-typed writing assignments, no study has yet looked at the effect of proficiency on the differences in scores between paper-based and computer-based writing assignments. In the current study I will look at score differences across typed versus handwritten essays and in comparison to proficiency scores. This will help me determine whether there is a positive or negative effect depending on proficiency and also whether this effect would be large enough to change the placement of students in English language classes.

The gap in the literature and this study's research questions that aim to address that gap

As reviewed above, the research topic of how the writing context (handwritten versus typed) affects essay-test outcomes is not new. Several studies (Lee, 2004; Lam & Pennington, 1995; Wolfe & Manalo, 2004) have looked at differences in scores between handwritten and typed essays, but these studies' results were limited in scope. Few of them compared scores within the same students; several (Wolfe & Manalo, 2004; Wolfe, Bolton, Feltovich, & Niday, 1996; Whithaus, Harrison, & Midyette, 2008) used between-group designs, with one group producing handwritten essays and the other group producing typed essays; scores across the groups were then compared. However, none looked at the differences in the scores as related to the proficiency levels of the students while comparing scores for the same student.
Moreover, none looked at computer-processed essays; instead, the researchers whose work is discussed above investigated computer-typed versus handwritten essays. I make this distinction because computer-processing is different from simply typing on a computer in that when a student computer-processes an essay, he or she may use tools that allow for copying text, moving it to another part of the essay, and deleting text (King, Rohani, Sanfilippo, & White, 2008). If using a program such as Microsoft Word, the writer may also check and edit, to a certain extent, his or her basic grammar and spelling using Word's grammar- and spell-check features. Thus, the present study compares computer-processed (using Microsoft Word) and paper-based scores within the same student while also making comparisons among students with different proficiencies. This is important because in the real world's academic context, students do not just use a computer to type their academic work; they use computer-processing tools, including spell checkers and grammar checkers. Thus, to better approximate the real-world construct of academic essay-writing, I believe tests of academic essay-writing should allow students to use the processing tools that are available to them (and that current students use) in the real context. The effect of this method of essay composition (computer-processed essays) in relation to paper-and-pencil-produced essays has not yet been investigated. A logical step is to do so. The following research questions will be investigated:

1. Do the results of a placement test differ between a paper-and-pencil-based test and a computer-processed test?
2. Are there specific areas of writing (measured via scores from the analytic rubric) that seem to improve with one of the writing mediums?
3. Do these results differ when proficiency is taken into consideration, that is, (a) for low-level test takers and (b) for higher-level test takers?
4. Do test takers feel more comfortable computer-processing or handwriting their essay tests?

In relation to research question 1, I hypothesize that there will be no significant difference between paper-and-pencil-based tests and computer-processed tests. For research question 2, I hypothesize that there will be differences that depend on the rubric category being examined. I draw on research by Whithaus, Harrison, and Midyette (2008), whose raters commented on increased mechanical and spelling errors in typed essays compared to handwritten ones, although overall they found that the essays increased in quality. I also draw on research by Lee (2004), who found that scores increased in the areas of organization and content. Thus I believe that students will have more mechanical and spelling errors on the typed essay than on the handwritten essay, and that their scores will go up in the areas of organization and content on the typed essay. I therefore hypothesize that raters will perceive typed essays as better for high-proficiency students but will notice more errors in the writing of lower-proficiency students. In relation to research question 3, I hypothesize that low-level test takers will perform worse on the placement test when they must use Microsoft Word due to their unfamiliarity with the English keyboard.
On the other hand, I hypothesize that higher-level test takers will perform better on the computer-processed test because they will be more familiar with computers, with the English keyboard, and with computer-processing their English-language essays. For research question 4, based on results from Lee (2004), I expect most students to be more comfortable with typing than with handwriting essays. When divided by proficiency, however, it may be that lower-level students will be less comfortable using a computer to compose essays, while students of a higher proficiency may prefer it.

CHAPTER 3: METHODS

In this chapter I discuss the participants, the materials used, and the data collection procedure. I conclude the chapter with a discussion of the ways in which the data were analyzed.

Participants

Participants of this study consisted of 61 ESL students recruited from the low-intermediate level of an intensive English program (IEP) as well as advanced students from an English academic program (EAP) at a large Midwestern university in the United States. It is worth mentioning here that there are two additional levels between the low-intermediate and the advanced-level students. Students in the same university's Master of Arts in Teaching English to Speakers of Other Languages program also participated. The IEP is intended for students who do not meet the TOEFL score requirements to be enrolled in an academic program, as well as students who want to further their English study and do not plan to enroll in academic classes in the future. The EAP program is meant for students who have completed the IEP; it consists of a series of four courses, of which students may take only those targeting the specific skills they need, and students in it can take academic courses at the same time. The master's program in teaching English is a regular master's program, and the students in this group did not take any classes at the English language center before being accepted to the university.

To recruit participants I visited classes to explain the project and pass out a sign-up sheet where possible. When I was unable to visit a class, I gave a flyer to the teacher so students could sign up to meet at one of six specified time slots. A total of 61 students participated in this study. An exact breakdown of the participants can be seen in Table 1. As can be seen in this table, a majority of the participants were native Chinese speakers and between the ages of 18 and 21.

Table 1. Participants' Backgrounds

                                     Male    Female    Total
Number of Participants                 29        32       61
Average Age                            21        22       21
Average Age of Starting English        11        10       11
Level of Study
  IEP                                  14        10       24
  EAP                                  13        16       29
  Master's Student                      2         6        8
Native Language
  Chinese                              21        28       49
  Korean                                0         1        1
  Bahasa                                0         1        1
  Arabic                                7         2        9
  Spanish                               1         0        1

Materials

For this study I used two retired prompts (Appendix A) from an English language proficiency test at a large Midwestern university. Both the instructions and the format of the placement test were followed as closely as possible in order to keep the task similar to the actual test. The differences are that in the actual test students would have to write only one essay, and they would be required to handwrite the essay rather than type it. Aside from the essay prompts, I also included a background questionnaire (Appendix C) that asked participants about their age, sex, language-learning background, and time at the university. I also employed an exit questionnaire (Appendix
D) consisting of eight items, five Likert scale items (four of which asked for additional explanation), and three two-choice items that asked for additional explanations. The exit questionnaire asked about the students’ perceptions of each task and about their satisfaction with their original level placement. Finally, the raters used the original rubric (Appendix E) that had been developed and used for the placement test at the university. I chose this rubric not only because it is the actual rubric used to rate the proficiency test, but also because it has several other benefits. The rubric is an analytic rubric consisting of different categories that are scored independently and added together. An analytic rubric is useful because it reflects the different aspects of a test takers writing ability (Weir, 2005) and gives a fuller picture of the differences between the essays due to the variables. Knoch (2009), Weigle (2002) and Bachman and Palmer (1990) also pointed out that analytic rubrics tend to be more reliable and show a greater range in score differences. Procedure Study Approval The university IRB approved the study prior to the start of data collection. The IRB’s Policy required every participant to sign a consent form stating the purpose of the study and the risks, benefits, means of ensuring privacy, and the procedures associated with it. Therefore, the students signed a copy of the consent form at the beginning of the session. I passed out and went over the consent form with participants answering questions as they arose. I gave the students a copy of the consent form to take with them after completing the task. Setting 17! ! Testing took place in a language learning computer lab. The lab was equipped with 36 iMac computers running Mac OS X Snow Leopard. Computers were arranged in six rows, each with six computers in it. Seats were arranged so that participants would be facing the front of the room when seated at their computers. Participants sat at every other computer to ensure that there was no cheating. At the front of the room there was a teacher’s station connected to a projector, which displayed the amount of time remaining in the test. Computer Set-up Each computer had one Microsoft Word document open on it. The document was blank and students were instructed to type their name and the date at the top of the document. No other windows or programs were open, and students were instructed not to touch anything else on the computer but the Microsoft word document. Students were allowed to adjust the size of the document to ensure they could see it and work with it comfortably. Data Collection Participants came to the computer lab on the day and time they signed up for and were randomly placed into one of four experimental groups. Each group received their first essay at the same time; however, one group was handwriting while the other group was typing. The method of production, handwritten or typed, was switched for the second essay. In order to keep the essay format as similar to the placement test as possible, students had 35 minutes to write an essay on each of the topics. The experimental groups were arranged in these four conditions: 1. Type Essay A, Write Essay B 18! ! 2. Type Essay B, Write Essay A 3. Write Essay A, Type Essay B 4. Write Essay B, Type Essay A Participants were also asked to fill out a survey on their opinions of the two different tests. 
The expectation of the survey was that it would give a clearer picture of test takers’ comfort levels with the two different test styles and whether they are more comfortable typing or handwriting essays. When they arrived, participants were assigned a computer. When all of the participants were present the researcher passed out a packet to each student. The packet contained all of the pages needed for the experiment. The researcher then explained the consent form. Once participants had signed the consent form and it was collected the researcher asked the participants to fill out the background questionnaire. The background questionnaire was completed before the start of the writing task. Once all of the students had completed the background questionnaire the researcher collected the questionnaires and then explained the directions for the next part of task. After all questions had been answered and the researcher had ensured that the directions were clear the participants began the first writing task. The first writing task was hidden under a piece of white paper in the packet. Participants were instructed not to flip past the white paper until they were instructed to do so. After the instructions were explained the researcher told participants to flip over the white paper and the timer started. After the 35 minutes had passed, participants were instructed to stop and the researcher saved each document for the students. Participants 19! ! were then given a 10-minute break and the opportunity to have drinks and snacks. This was done in order to prevent any test fatigue that may occur. After the 10-minute break participants began the second writing task. This task was conducted in the same way as the first, the only difference being that students who had typed the first essay would now be handwriting and vice versa. Finally, after the second task participants were asked to complete the exit questionnaire. After they had finished answering the questions on the exit questionnaire the researcher checked to be sure all materials had been collected and then participants were allowed to leave. Essay Rating I asked two trained and experienced raters of the English language proficiency test used in this study to rate each essay. The raters scored the essays based on the essay rubric from this specific proficiency test. The raters were the same as those who rated the official test at this university and they volunteered to assist with the study. The raters’ scores were averaged to determine each participant’s final score. Rater one was asked to rate all of the typed essays first, followed by the handwritten essays. Rater two was asked to do the opposite and rate the handwritten essays followed by the typed essays. Because Lee (2004) found that transcribed essays were rated lower than their handwritten counterparts, the handwritten essays were not transcribed. Since the raters are used to rating handwritten essays and are both experienced raters and because Powers, Fowles, Farnum, and Ramsey (1994) and Russell and Tao (2004) stated that raters tend to rate handwritten essays higher than their typed counterparts, I believed that there would not be a negative rater bias due to the appearance of the handwritten essays compared to the 20! ! typed essays. In addition, I wanted to replicate how real testing programs have handwritten essays rated: They do not transcribe them. Raters rate the original or a photocopy of the handwritten text. 
Analysis

I used quantitative methodology to address the four research questions. The quantitative data consist of the raw scores collected from scoring the essays with the analytic rubric, as well as the averages of the scores on the Likert-scale items on the exit questionnaire. I used IBM SPSS 20 software to perform statistical analyses of the quantitative data. Trends in the qualitative data from the open-ended questionnaire items were also investigated and analyzed.

RQ 1 and 2: I addressed the first and second research questions by using paired-samples t tests to investigate the raw scores given to each category on the rubric and to compare those scores between typing and handwriting.

RQ 3: I addressed the third research question using a multiple linear regression to see whether the difference in the scores was related to the proficiency of the students. I also performed a correlation analysis to see whether there was a true effect of writing medium for the students overall and in each proficiency group.

RQ 4: I addressed the fourth research question using the Likert-scale items to see which writing medium students preferred. I also used a regression to see whether any other factors (students' preference for typing versus handwriting, or their L1) could predict a student's success on handwritten compared to typed essays.

CHAPTER 4: RESULTS

The purpose of chapter four is to present the results of the data analysis for each of the four research questions. In this chapter I discuss each research question and present the results for each. Before presenting the results, I first display the reliability of the scores on the tests used in this study. In Table 2, the inter-rater reliability of the subsections of the analytic rubric is displayed, along with the overall reliability for each type of test.

Table 2: Rater Reliability

                  Handwritten    Typed
Content                  0.68     0.73
Organization             0.64     0.58
Vocabulary               0.71     0.56
Language Use             0.71     0.75
Mechanics                0.50     0.45
Total                    0.74     0.78
Note: Values refer to Pearson correlation coefficients between the two raters.

Table 2 shows that the overall rater reliability is fairly high. For the handwritten condition the correlation between the two raters' total scores was .74, and for the typed condition it was .78. The rater reliability for some of the individual categories, particularly the mechanics category, is lower: mechanics had the lowest rater reliability, at .50 for the handwritten condition and only .45 for the typed condition.

Research Questions 1 and 2

The first research question investigated whether there were score differences between the typed and handwritten essays across the entire sample. To investigate this I conducted paired-samples t tests on the values of each rubric category and compared them between the typed and handwritten conditions. Table 3 shows the descriptive statistics of the data.

Table 3: Descriptive Statistics for Writing Medium

                       Typed               Handwritten
                  Mean       SD         Mean       SD
Content           21.46     2.62        22.22     2.40
Organization      16.07     1.95        10.80     1.20
Vocabulary        12.81     1.74        13.30     1.83
Language Use      21.54     2.40        22.37     2.43
Mechanics          3.97     0.60         3.81     0.59
Total             75.69     8.76        72.50     8.11
Note: n = 61

In Table 3 it can be seen that, for the overall score, the typed condition was scored slightly higher than the handwritten condition. The content, vocabulary, and language use subcategories were scored lower in the typed condition than in the handwritten condition, while the organization subcategory was scored higher in the typed condition than in the handwritten condition.
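As a concrete companion to the analysis plan and to Tables 2 and 3, the score aggregation, inter-rater reliability, and descriptive statistics could be computed outside SPSS along the following lines. This is a minimal Python sketch; the data layout and column names (for example, typed_content_r1 for rater 1's content score on a typed essay) are assumptions for illustration, not the actual study files.

```python
import pandas as pd
from scipy import stats

# Assumed layout (hypothetical): one row per participant, with columns such as
# "typed_content_r1", "typed_content_r2", "hand_content_r1", and so on
# for each rubric category under each writing medium.
CATEGORIES = ["content", "organization", "vocabulary", "language_use", "mechanics", "total"]

def summarize(df: pd.DataFrame) -> pd.DataFrame:
    """Inter-rater reliability (cf. Table 2) and descriptives (cf. Table 3) per category."""
    rows = []
    for medium in ("typed", "hand"):
        for cat in CATEGORIES:
            r1 = df[f"{medium}_{cat}_r1"]
            r2 = df[f"{medium}_{cat}_r2"]
            reliability, _ = stats.pearsonr(r1, r2)   # Pearson r between the two raters
            final = (r1 + r2) / 2                     # averaged final score per participant
            df[f"{medium}_{cat}"] = final             # kept for later paired comparisons
            rows.append({"medium": medium, "category": cat,
                         "reliability": round(reliability, 2),
                         "mean": round(final.mean(), 2),
                         "sd": round(final.std(ddof=1), 2)})
    return pd.DataFrame(rows)
```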
In order to investigate whether these means are significantly different, I performed paired-samples t tests for the overall scores as well as for each of the subcategories. These results can be found in Table 4.

Table 4: Paired-Samples t-test

Category         t-value    df    p        SE      Effect Size (r)
Content           -3.42     60    0.00*    0.22    0.40
Organization       5.63     60    0.00*    0.18    0.97
Vocabulary        -0.11     60    0.01*    0.19    0.31
Language Use      -0.42     60    0.00*    0.20    0.47
Mechanics         -0.17     60    0.84     0.08    0.03
Total              1.77     60    0.00*    0.71    0.50
Note: * = significance at the .05 level

Participants overall received a higher score in the typed (M = 75.69, SD = 8.76) condition than in the handwritten (M = 72.50, SD = 8.11) condition, t(60) = 1.77, p = 0.00, r = 0.50. The organization category was also scored significantly higher in the typed (M = 16.07, SD = 1.95) condition than in the handwritten (M = 10.80, SD = 1.20) condition, t(60) = 5.63, p = 0.00, r = 0.97, and this effect size was extremely large. The effect sizes are, in general, medium to large, with organization scores being the most heavily affected by the medium of essay composition: when typed, the scores were significantly and meaningfully higher (Cohen, 1988, 1992). However, participants received lower scores for content in the typed condition (M = 21.46, SD = 2.62) than for content in the handwritten (M = 22.22, SD = 2.40) condition, t(60) = -3.42, p = 0.00, r = 0.40. The same occurred with vocabulary in the typed condition (M = 12.81, SD = 1.74) compared to the handwritten (M = 13.30, SD = 1.83) condition, t(60) = -0.11, p = 0.01, r = 0.31. Likewise, language use in the typed (M = 21.54, SD = 2.40) condition was significantly lower than in the handwritten (M = 22.37, SD = 2.43) condition, t(60) = -0.42, p = 0.00, r = 0.47. The mechanics category did not show any significant difference between the typed (M = 3.97, SD = 0.60) condition and the handwritten (M = 3.81, SD = 0.59) condition.

In order to help understand why many of the categorical scores were lower for the typed condition than for the handwritten condition, I performed word counts on each essay and compared them between the two conditions. The results can be seen in Table 5.

Table 5: Word Counts by Level

                      Typed                Handwritten
                  Mean        SD        Mean        SD
Low              244.75     79.37      249.46     64.55
Intermediate     392.00     83.93      358.59     69.17
Advanced         454.50     76.30      403.50     73.70
Total            342.30    114.28      321.50     90.00

Table 5 shows that for the intermediate and advanced students the word counts were higher on the typed essay than on the handwritten essay. For the low-level students, however, the handwritten essays had higher word counts than the typed essays.

Research Question 3

The third research question investigated whether the differences in scores varied depending on the proficiency of the participants. To investigate this I used a multiple linear regression in which I predicted students' benefit from typing: in other words, I used gain scores (each student's typed-essay score minus his or her handwritten-essay score) as the dependent variable. I wanted to see which of the following independent variables would be associated with the gain scores between handwritten and typed essays (a brief code sketch of this model follows below):

• L1 background (Chinese or Arabic only; I did not investigate other backgrounds because there were not as many participants with those backgrounds)
• The test takers' preference for typing over handwriting (with a preference for typing scored as one, and a preference for handwriting scored as zero)
• L2 proficiency (intermediate or high)

The results of this regression can be seen in Table 6.
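Before turning to the coefficients in Table 6, here is a minimal sketch of how the paired comparisons in Table 4 and the gain-score regression just described could be run, continuing the hypothetical data layout from the earlier sketch. The predictor column names are assumptions; ordinary least squares is used because the gain score is a continuous dependent variable, and the effect size is computed as r = sqrt(t^2 / (t^2 + df)).

```python
import numpy as np
import pandas as pd
from scipy import stats
import statsmodels.api as sm

def paired_comparison(df: pd.DataFrame, cat: str):
    """Paired-samples t-test for one rubric category, with effect size
    r = sqrt(t^2 / (t^2 + df)), as reported in Table 4."""
    t, p = stats.ttest_rel(df[f"typed_{cat}"], df[f"hand_{cat}"])
    dof = len(df) - 1
    return t, dof, p, float(np.sqrt(t**2 / (t**2 + dof)))

def gain_score_regression(df: pd.DataFrame):
    """Regress the typing gain score (typed total minus handwritten total)
    on dummy-coded L1, preference, and proficiency predictors."""
    gain = df["typed_total"] - df["hand_total"]
    predictors = pd.DataFrame({
        "l1_chinese": (df["l1"] == "Chinese").astype(int),
        "l1_arabic": (df["l1"] == "Arabic").astype(int),
        "prefers_typing": df["prefers_typing"],            # 1 = prefers typing, 0 = handwriting
        "intermediate": (df["level"] == "intermediate").astype(int),
        "high": (df["level"] == "high").astype(int),
    })
    return sm.OLS(gain, sm.add_constant(predictors)).fit()

# Example use (hypothetical data frame "scores"):
# model = gain_score_regression(scores)
# print(model.params, model.pvalues)
```

In a sketch of this kind, the coefficient on the high-proficiency dummy would correspond to the roughly seven-point typing gain reported for advanced students in Table 6.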
Table 6: Regression Factors and Their Effect

Factor          B      Std. Error    Beta     Sig.
L1 Chinese      0.89       3.15      -0.06    0.78
L1 Arabic       4.13       3.74       0.25    0.23
Preference      0.08       0.37       0.03    0.84
Intermediate    1.44       1.67       0.13    0.40
High            7.28       2.33       0.45    0.00*
Note: * = significance at the .05 level

To recap, L1 Chinese, L1 Arabic, essay preference, intermediate level, and high level were entered into a stepwise multiple regression analysis to predict students' essay-score increase in the typed condition compared to the handwritten condition. The resulting coefficients are shown in Table 6. As can be seen, none of the predictors was significant except for being a high-level student. Table 6 shows that there is no significant association between the increase in scores from typing and being in the intermediate group, B = 1.44, SE = 1.67, p = .40. However, for the high-level students there is a significant difference between the two test conditions and a significant gain from typing, B = 7.28, SE = 2.33, p = 0.00. Students with a high level of proficiency gained around 7 points on the typed essay compared to the handwritten one.

In order to further understand these differences and to get a clearer picture of them, I performed a correlation on the scores by level as well as overall to see whether the correlations were high or low. These results can be seen in Table 7.

Table 7: Correlation of Scores

                 Low    Intermediate    High    Total
Correlation      0.67       0.35        0.35    0.79*
Significance     0.00       0.07        0.39    0.00

As would be predicted, the overall correlation is high; however, when broken down by level the correlations are somewhat different. For the low-level students the correlation remains fairly high at .67. So, for these students the writing medium makes very little difference in terms of how the students are ordered by rank. For both the intermediate and high students, however, the correlation is fairly low (.35 for each group). This shows that for both of these groups of students the writing mediums are not comparable. The low correlations indicate that test takers are ranked differently depending on the writing medium: at the two upper levels, some students receive higher scores when typing, while others receive higher scores when handwriting. The low correlations also indicate that the two writing modes may be measuring different writing constructs. (I will revisit this notion in the discussion section.)

Figure 2: Test Scores vs. Proficiency
[Line graph of mean test scores (y-axis, approximately 65 to 90) by proficiency level (Low, Intermediate, High) for the typed and handwritten conditions.]

Figure 2 shows a visual representation of the mean scores of students in the two different test conditions. As was found in the regression, students' scores were not significantly different until they were at the advanced level of proficiency; that is, at the higher level of proficiency, the students' scores on the handwritten test ended up being dramatically (and significantly) lower than when they typed.

Research Question 4

With the fourth research question I investigated how students felt about typing essays compared with how they felt about handwriting them. The means and standard deviations for the students' preferences are reported in Table 8 below.
Table 8: Student Perceptions: Typing versus Handwriting

                      Typed               Handwritten
                  Mean       SD         Mean       SD
Low               5.67*     0.991       5.22*     1.25
Intermediate      6.19*     2.11        4.67*     1.92
High              6.88*     1.84        3.88*     1.98
Note: * = p = .000; measured on a scale from 1 to 8

Table 8 shows that students overall preferred to type essays rather than to handwrite them. The size of this difference becomes greater as the proficiency of the students goes up: lower-proficiency students have only a slight preference for typing, while high-proficiency students have a more dramatic preference for typing. Table 9 shows the percentage of students who thought their handwritten essay would receive the higher score and the percentage who thought their typed essay would.

Table 9: Student Perceptions: Which Essay Will Be Scored Higher (%)

                 Low    Intermediate    High
Handwritten       40         39          43
Typed             57         61          57

As is demonstrated in Table 9, more than half of the participants believed that their typed essays would be scored higher than their handwritten essays. This is true of all three proficiency levels.

CHAPTER 5: DISCUSSION AND CONCLUSION

In chapter five I examine the results presented in chapter 4 with reference to each of the research questions. I then continue with the general and pedagogical implications of this study. Finally, I conclude with a discussion of the limitations and suggestions for future research.

With the increased use of computers in the United States and around the world, it has become increasingly important to understand the effect that computer-based assessment has on writing tests. In education programs, computer processing has become the standard way for students to compose their writing assignments, and because of this, handwritten essays have become nearly obsolete in mainstream academic courses on college and university campuses. Although this is the case, many proficiency and placement tests, even those offered at universities, still employ paper-and-pencil writing tasks to assess students' academic writing proficiency. This is because many test developers believe that having students handwrite essays eliminates a possible bias in favor of students who are used to typing on English keyboards over students who are not. The difference between the way proficiency tests assess writing and the way writing is assessed in university classes has created the need to understand whether students of varying proficiency in the language being assessed are able to write equally well with and without a computer.

Categorical Changes

In this study I had 61 English-language learners compose two academic English essays: one that they typed using the computer-processing program Microsoft Word (with all of Word's processing features available to the writers), and one that they handwrote (with no dictionary or outside help). In response to the first research question, which was "Do the results of a placement test differ between a paper-and-pencil-based test and a computer-processed test?", I looked at whether there were significant differences in the overall test scores and in the analytic rating categories between the two different conditions. Paired-samples t tests showed that there is an overall improvement in the typed condition compared to the handwritten condition. These results agree with Lee (2004) and Lam and Pennington (1995), who also found an increase in scores on typed essays.
As Lee (2004) found, the differences in this study seem to occur mostly in relation to the organization category (as defined on the rubric), which includes the sequencing of the essay, use of main ideas and supporting ideas, and cohesion devices. This category was the only positive categorical change found in the typing condition over the handwritten condition. This finding is also supported by Whithause, Harrison and Midyette’s (2008) study where raters commented on the poor organization in handwritten essays compared to typed essays. In sum, the higher scores on organization for the typed essays resulted in (or contributed most to) the overall higher total score on typed versus handwritten essays. This difference could also be due to the fact that, overall, students write more in the typed condition than in the handwritten condition. Because they are writing more they may also be writing faster, and that could lead to this score drop in some of the categories, particularly language use. This improvement in organization in the typed essays could be argued to be the result of a few different factors. First, it is easier to move text around on a computer than it is on a handwritten essay. The ability to easily move text using a computer processor may influence a student’s willingness to change his or her mind and actually move the 32! ! text. Shaw (2005) suggested this was the case when he wrote that the ability for a student to use word-processing tools might aid in the development of their essay. The second factor contributing to this increase could be the neatness of the text on a computer: it might be easier for students to visualize the organization of the essay. Also, the fact that more words fit on one typed page than on one handwritten page may also contribute to test takers’ ability to see the organization of the essay better. This computer-aided, bird’seye view of the essay might enable students to be able to make corrections to it more easily. Essay organization in writing is a difficult task especially for second language learners and typically requires one to write and move and rewrite before finding an acceptable organization (Whithause, Harrison & Midyette). It seems fair to argue that the computer helps to solve this problem. Three of the categories (content, language use, and vocabulary), had moderate, but significant, negative effects in the typed-essay condition. In other words, when the students handwrote their essays, they got moderately higher scores in the categories of content, language use, and vocabulary. This could possibly be due to rater bias as discussed by Fowels, Franum, and Ramsey (1994) and Shaw (2005). In both of these studies the researchers suggested that raters feel more sympathy for students who are handwriting essays over typing; that is, the raters feel that test takers put forth more effort in handwriting. Thus, it may be the case that that the raters were allowing for more errors in the handwritten essays than were allowing in the typed essays. The raters in this study were tasked with rating both the handwritten and typed essays (though the rating was counterbalanced; one rater rated all typed essays first, while the other rated all handwritten essays first), but these score differences may reflect a general trend of the 33! ! raters to, when possible, use a higher range of the rating scale when rating the handwritten essays. This will be discussed more in the limitations section. 
This is also consistent with the findings of several researchers who noted in their studies that test takers seemed to pay less attention to content in their word-processed essays and instead made changes at the word and sentence level (Bridwell, Sirc, & Brooke, 1985; Bridwell-Bowles, Johnson, & Brehe, 1987; Collier, 1983).

Out of the five categories on the rating rubric, only the mechanics category was not significantly different between the two writing mediums. Looking at Table 2, one can see that the two groups' scores on mechanics are very similar. This finding may suggest that a computer cannot really help a student with the technical aspects of essay mechanics: the student either knows where punctuation belongs, how to create different sentence structures, and how to spell and capitalize, or does not. This finding is slightly contrary to Whithaus, Harrison, and Midyette's (2008) results. They wrote that mechanical errors seemed to jump off the page in the typed essays and were not as noticeable in handwritten essays; thus, the mechanics category was thought to be scored lower when the essays were typed in their study. This finding also contradicts the claim by Whithaus, Harrison, and Midyette (2008) that students seemed to lose the ability to proofread when essays were typed. In their study, raters commented that the mechanics of students' essays were worse on the typed essays compared to the handwritten ones. However, in this study, I found that the mechanics scores were similar between the two writing mediums, and thus this claim does not hold when considering the present study. Another interpretation of the results could be that the students' mechanics were in fact worse in the handwritten condition, but the raters were biased toward giving the handwritten essays a higher score on mechanics. This too will be discussed more in the limitations section.

The Effects of Proficiency, L1, and Preference

Using simple regression, I looked at whether English language proficiency, L1 background, and test takers' preference for handwriting or typing essay-test exams were associated with the overall scores obtained on the essay tests. In particular, for each individual, I calculated his or her gain score for typing over handwriting: that is, I subtracted each test taker's score on the handwritten essay from his or her score on the typed essay (and did the same for each of the categories on the analytic rubric) to see whether proficiency, L1, or preference predicted gains in typing.

The test takers' native language and essay preference did not have a significant effect on test-score gains. L1 Chinese test takers did not do better than test takers from any other L1 background when typing essays, and neither did L1 Arabic speakers. I did not investigate the effect of the other language backgrounds on gains in typing over handwriting because of the low number of participants with those backgrounds, but overall, L1 background did not predict the ability to type rather than handwrite. Likewise, test takers' preference for typing was not associated with higher scores on the typed versus handwritten essay tests, which I believe is an important finding of this research. Past research has shown that when students are allowed to pick, before writing, whether to handwrite or type, lower proficiency students do better when handwriting and higher proficiency students do better when typing (Wolfe, Bolton, Feltovich, & Niday, 1996).
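To make the gain-score computation described above concrete, the sketch below shows one way it could be coded. This is an illustration only, not the analysis actually run for this study: the file name and column names (typed_total, handwritten_total, proficiency, L1, preference) are hypothetical stand-ins, and preference is treated here as a numeric rating for simplicity.

    # Illustrative sketch only (not the study's analysis script). Gain scores
    # (typed minus handwritten) are regressed, one predictor at a time, on
    # proficiency group, L1 background, and stated medium preference.
    # The file name and column names are hypothetical.
    import pandas as pd
    import statsmodels.formula.api as smf

    scores = pd.read_csv("essay_scores.csv")  # one row per test taker

    # Positive gain means the typed essay received the higher score.
    scores["gain"] = scores["typed_total"] - scores["handwritten_total"]

    # Simple regressions: C() treats proficiency and L1 as categorical predictors.
    for predictor in ["C(proficiency)", "C(L1)", "preference"]:
        model = smf.ols(f"gain ~ {predictor}", data=scores).fit()
        print(f"--- gain ~ {predictor} ---")
        print(model.summary())

The same gain score could be computed for each rubric category by repeating the subtraction on the category subscores.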
As discussed in the literature review, the finding by Wolfe et al. (1996) is interesting but problematic, in that having students choose their writing medium results in a study with non-randomized group assignments. Wolfe et al. were not able to attribute the differences in the test scores within the groups to (a) the test-taking condition or (b) the choice that was made by the test takers. In this study I found that a student's preference for one writing medium over the other does not predict his or her ability to type or handwrite an essay, and even the lower level students did better on the typed test. This finding may suggest that students cannot accurately pick the writing medium in which they will perform best. This could be important for testing programs that offer test takers the option of handwriting or typing an essay test. If students cannot accurately judge whether they are better at typing or handwriting, then offering them an option on a writing test could be a problem: they may choose, for comfort reasons, the option that results in lower test scores. It could be that a student feels they will write better by hand when this is not the case, or they may believe that raters prefer typed essays, even if this is not true. Such decisions could lead to a student not scoring as well on a test as he or she otherwise might.

Finally, in this study I found that the proficiency of a student does have an effect on the scores the student receives on typed essays compared to handwritten essays. For the low and intermediate students there was no significant difference between scores on the typed essay and the handwritten essay; in both cases, the students performed slightly better on the typed essays. However, advanced students received a significantly higher score on the typed test than on the handwritten test. They also preferred typing over handwriting, giving it an average score of 6.88 out of 8 on the Likert scale, whereas they gave handwriting a score of 3.88 out of 8. Advanced students scored an average of seven points higher on the typed essay than on the handwritten essay. This is a significant and large score difference, one that could cause a student to be placed into an English language class merely because that student was required to handwrite rather than type the placement-test essay.

There are several possible reasons for this difference in scores for advanced students. The first is that advanced students, those with bachelor's degrees, may be used to composing academic essays on a computer. Many universities around the world require students to compose essays on computers, and these students may have much more experience typing than handwriting. Also, because of general college and university typing requirements, students may have stopped handwriting assignments long ago, and thus might be very uncomfortable handwriting an essay. Second, students at this level, who may be quite comfortable with composing essays on computers, may also know how to use the functions available in word processors such as cut, copy, and paste. They may take advantage of spelling and grammar checking features. They may rely on these functions for organizing and planning their essays and for checking their syntax, spelling, and grammar. Taking these functions away could make it much more difficult for such students to compose essays. Writing by hand has been said to make it difficult to move and change things in an essay (Lee, 2004).
Research has shown that when students have to handwrite essays, they have to plan the organization before writing the essay. If students are used to composing an essay on a computer and not worrying about the essay's organization until after the essay is drafted, then it may be difficult for them to pre-plan when handwriting. Not knowing how to shift organizational strategies depending on the writing medium could result in organization-score differences and, as this study suggests, in a poorly organized handwritten essay. Powers, Fowles, Farnum, and Ramsey (1994) and Russell and Tao (2004) noted that organizational edits in handwritten essays do not have an effect on raters, but studies have also found that students think they will (Lee, 2004). Thus, when handwriting, students might be reluctant to make organizational changes after the entire essay is composed, whereas when computer processing, writers may be more apt to make organizational changes after the initial draft.

The correlations that I performed further support these findings. The correlations showed that for students in the intermediate and high groups, the writing medium does make a difference: for many of the upper-proficiency-level students, their writing score changed depending on the writing medium they used. The low correlations within the upper-level groups (between their typed and handwritten essay scores) indicate that, even though on average students receive the same scores, individuals within the groups perform differently depending on the writing medium. This is further evidence that typing may be better for these students than handwriting, because typing matches the academic mode of writing being assessed. For the low-proficiency students, there is no difference in scores and the scores correlate, so it does not matter which writing medium they use; they will score about the same either way.

Student Perceptions

Regarding the third research question, none of the three proficiency groups (low, intermediate, or advanced) preferred handwriting over typing. Of course, this is a generalization, as individual differences among students were not considered in the current study. On average, students at all levels have a preference for typing over handwriting. However, at the low level, this preference is slight, whereas at the advanced level, it is much larger. As was expected, the higher the level of the student, the larger the preference for typing over handwriting.

When asked which essay would be scored higher, students' responses showed that, for all three levels, over half of the participants believed that the typed essay would be scored higher than the handwritten essay. This is most likely due to the fact that, as Whithaus, Harrison, and Midyette (2008) found, students tend to believe that typed essays are more legible than handwritten essays and thus will receive a higher score than their handwritten counterparts. This sheds light on what may happen when programs allow students to choose to handwrite or type on academic essay tests: some test takers may choose based on their true preference (as the test developers intend), while others may choose based on what they think raters expect or will like better (not as the test developers intend). Thus, in some cases, the choice may introduce underestimations of student performance.
In the worst-case scenarios, students may perform worse than expected (receive lower scores than they should) because they chose one medium over the other, with each student making that choice for different (apparently random) reasons.

Implications

Concerning research on score differences between handwritten and typed essays, this study has several implications. Proficiency seems to play the biggest role in determining whether typing an essay can help or harm students. For the low and intermediate levels there is no significant difference in the final essay scores. However, for the most advanced students, handwriting actually has a harmful effect. This means that a student who is very advanced and forced to handwrite an essay may end up with a lower score than he or she would if allowed to type the essay. The score could be so much lower that the student may, in certain places, have to take English language classes because of it and could lose time, scholarships, and admittance to a university.

It seems that switching from a handwritten proficiency test to a typed test would help students more than hurt them. This is true even at the lower levels. This study did not look at novices or true beginners of the language being assessed; however, it is important to keep in mind that those levels typically have separate writing tests, because a general test of academic essay writing typically cannot tease apart students at the lowest levels of language proficiency. Indeed, when considering very low-level language learners, giving them a test to measure their ability to perform academically in the language would seem absurd; they should instead be given a general English-writing test, not an academic-English-essay writing test.

Also of importance are the results found on the preferences of the students. English language programs everywhere tend to have some problems with students believing they were unfairly tested and placed into the incorrect level or class. In this study I found that the majority of students believed that a typed essay test would place them correctly. Thus, it may be beneficial for such programs to switch to a typed test and thereby remove handwriting as a target of students' blame for any perceived misplacement.

The results of this paper point to the benefits of having test takers type their essays for academic-essay tests. This finding is especially helpful because having students type in such testing situations would better match the actual, real-world tasks of academic writing. Bachman and Palmer (2010) and others (Douglas, 2000; Lewkowicz, 2000) stressed that tests must represent, as well as possible, how the skills being tested are utilized in the real world. In academic settings all over the world, individuals type their written work and computer-process their academic essays.

Limitations

Although the data in this study are very informative, there are still a few limitations concerning both the population sampled and the study design. The first limitation was the final number of the most advanced students in the study. Because this is where the largest difference was found, it would be useful to have more students at this level in order to better understand the effect of the test condition and the test-taking behaviors of these students. The second limitation concerns the participants themselves.
Because the program in which I conducted this study was heavily populated by Chinese- and Arabic-speaking students, it was difficult to recruit students of varying backgrounds and L1s. Not only was there a lack of cultural diversity, there was also a lack of age diversity: most students were around 18 or 19 years old. While this is not likely to have had a large impact on the results, it would still be interesting to look for effects of both age and native language.

Another limitation could be the effect of spell checkers on the essays of the students who are familiar with computers. Because Microsoft Word was used, there was no way this could be controlled, and it could offer a slight advantage to those students over the ones who are not as familiar with composing essays on computers. I believe that this would not actually affect the overall scores, but in hindsight I could have better controlled for this factor by asking questions that tapped into students' knowledge of MS Word computer-processing tools.

Finally, I did not interview the raters about the way they rated the essays. This is a limitation because such interviews would have offered many insights into the differences in the way the raters rated the essays and exactly what they thought about while rating each type of essay. In a future study this could be done through think-alouds with each of the raters.

Directions for Future Research

There is additional research that could be done to help further understand this area. First, it could be helpful to expand this study to include the lowest-level students in the program. Adding this population would show whether it would be better to offer a dual-option test or whether it would be appropriate to switch completely to a typed format. Second, it would be helpful to have a larger number of the highest-level students to get a clearer picture of what is happening when these students type compared to when they handwrite. This would provide further support for the current study and help to further inform university testing centers about which type of test is best to use. Third, it would also be interesting to look at students with native languages other than Arabic and Chinese to see whether their scores differ depending on the language background they come from. Data from such populations may help to inform whether an optional typing and computer-processing class should be offered to language learners in programs that promote academic-language and academic-skills development.

A fourth possibility would be to look more closely at rater differences in rating typed compared to handwritten essays. While Powers, Fowles, Farnum, and Ramsey (1994) and Russell and Tao (2004) wrote that raters tend to rate handwritten essays higher than their typed counterparts, this may not be the case in a real rating situation: in both of these studies the raters were trained to rate the specific essays they were given. Lee (2004) discussed the possibility that transcribed essays are usually scored lower than their handwritten counterparts. It would be interesting to look at how raters actually perceive these different types of essays, possibly by using think-alouds to understand further what raters are thinking and to see how different the ratings actually are when essays are typed compared to handwritten.
Another possibility for future study would be to look at the specific differences that occur for students when they are typing compared to when they are handwriting. It would be interesting to see the changes that they make while they are typing on a computer as well as the changes they make when handwriting an essay. This would show further what the qualitative differences are for students and would allow researchers to understand more about any differences between the handwritten and typed conditions.

It would also be interesting, in a future study, to apply an objective measure of mechanics. This could be done by coding, grading, and counting the mechanical errors in the essays in order to see how well the raters were doing. It could also help researchers to know more about rater bias and where it comes from.

Finally, it would be interesting to look more closely at students' preferences and the motivations behind any choices they make when they have the ability to decide whether to type or handwrite academic essays. This would allow teachers and researchers to better understand how students feel about these tests and how they view a typed test compared to a handwritten test. Understanding the students better may help programs not only to explain to students why tests are done the way they are, but it could also help pedagogically: if students are making assumptions that are not true, teachers could help them to better understand how they should approach not just tests, but all academic writing tasks.

APPENDICES

Appendix A: Essay Prompts

TEST A

English Language Test
Timed Writing Exam

Name ___________________________ Test Number ________________________

Write as much as you can, as well as you can, in an original, 35-minute composition on the topics below.

Some people go to college directly after high school, while others take a job after high school and attend college a few years later. Which do you think is better, and why? Be sure to support your ideas with specific explanations and details.

TEST B

English Language Test
Timed Writing Exam

Name ___________________________ Test Number ________________________

Write as much as you can, as well as you can, in an original, 35-minute composition on the topics below.

Some people get married directly after high school or in their early 20s, while others get married later, in their 30s, 40s, or even later. Which do you think is better (early marriage or late marriage), and why? Be sure to support your ideas with specific explanations and details.

Appendix B: Background Questionnaire

BACKGROUND QUESTIONNAIRE
ELT Essay Writing Project

PLEASE FILL OUT THE FOLLOWING BACKGROUND INFORMATION. PLEASE PRINT CLEARLY.

1. Name:
   a. First name: ____________________________________________
   b. Last name: ____________________________________________
   c. Middle initial: _______
2. Age: _____
3. Gender:  Male   Female
4. Phone number: ( ______ ) __________ - __________________
5. Email address: _________________________________________
7. Native language (first fluent language, also known as your "mother tongue"): __________________________
   a. How did you learn English?
   ________________________________________________________________________
   ________________________________________________________________________
   b. How old were you when you started learning English? ___________________
8. How long have you studied at the ELC? ____________________
9. Did you attend University in your home country?
_____________________
10. What is your current ELC level? _____________________

Appendix C: Exit Questionnaire

Please answer the following questions to the best of your ability based on your test-taking experience.

1. How much did you like typing your essay on a computer?
   I didn't like it   1   2   3   4   5   6   7   8   I liked it very much
   Please explain.

2. How much did you like handwriting your essay on paper?
   I didn't like it   1   2   3   4   5   6   7   8   I liked it very much
   Please explain.

3. How much did you like answering the question about marriage?
   I didn't like it   1   2   3   4   5   6   7   8   I liked it very much
   Please explain.

4. How much did you like answering the question about college?
   I didn't like it   1   2   3   4   5   6   7   8   I liked it very much
   Please explain.

5. How familiar are you with computers?
   Not familiar at all   1   2   3   4   5   6   7   8   Very familiar

6. Which essay do you feel you did a better job on?
   a. The one I handwrote on paper
   b. The one I typed on a computer
   Please explain.

7. Which essay do you think will be scored higher?
   a. The one I handwrote on paper
   b. The one I typed on a computer
   Please explain.

8. Do you feel that you were placed into the correct level at the ELC when you first came?
   Yes   No
   Please explain.

Appendix D:

Table 10: Rubric

Clear Competence for Academic Study

Content (30-27):
• Main ideas and support are clear, precise, and relevant
• Thorough development of thesis
• Addresses the prompt

Organization (15-13):
• Succinct, logical sequencing
• Clear differentiation between main ideas and support
• Excellent internal cohesion through sophisticated cohesive devices

Vocabulary (20-17):
• Sophisticated range
• Precise word/idiom choice and usage, word form mastery
• Appropriate register

Language Use (30-27):
• Mastery of simple and complex constructions
• Virtually no global errors
• Few minor grammatical errors
• Meaning is clear and precise

Mechanics (5):
• Demonstrates mastery of conventions
• Few errors of spelling, punctuation, capitalization, or paragraphing
• Meaning is clear

Developing to Sufficient Competence for Academic Study (shaded area and above meet and exceed MSU minimum requirements)

Content (26-25):
• Main ideas and support are generally clear and relevant
• Sufficient development of thesis
• Addresses the prompt

Organization (12-11):
• Generally clear organizational structure
• Main ideas stand out
• Somewhat limited or superficial internal cohesion; possibly repetitious or awkward use of cohesive devices, over-reliance on simplistic transitions; somewhat choppy

Vocabulary (16-15):
• Good range of higher level vocabulary
• Generally effective word/idiom choice and usage, despite the occasional error in word choice and word form
• Meaning is generally clear and requires no reader compensation

Language Use (26-25):
• Strong and consistent control of simple constructions
• Generally effective control of complex constructions
• Few global errors
• Occasional local errors
• Meaning is generally clear and requires no reader compensation

Mechanics (4):
• Demonstrates strong control of conventions
• Occasional errors of spelling, punctuation, capitalization, or paragraphing
• Meaning is clear
Content (24-22):
• Main ideas are generally clear
• Support ideas are mostly clear and relevant
• Generally adequate development of thesis, but support may be somewhat limited, superficial, or repetitive at times
• Addresses the prompt

Vocabulary (14-13):
• Adequate range of higher level vocabulary
• Occasional errors of word/idiom form/choice
• Meaning is generally not obscured or may require only slight reader compensation

Language Use (24-22):
• Strong control of simple constructions
• Inconsistent control of complex constructions
• Global and local errors not infrequent
• Meaning is generally not obscured or may require only slight reader compensation

Suggests Insufficient Competence for Academic Study

Content (21-19):
• Main ideas generally clear
• Supporting ideas may be somewhat obscured
• Development is generally limited, superficial, or repetitive
• Related to the prompt, but may be slightly off-topic
• Limited sample

Organization (10-8):
• Somewhat unclear organizational structure
• Ideas seem disconnected
• Very limited or ineffective use of cohesive devices
• Lacks logical sequencing
• Does not demonstrate significant organizational features

Vocabulary (12-10):
• Limited range (i.e., repetition of a small number of commonly used words, rare use of words from the AWL)
• Frequent or distracting errors of word/idiom form/choice
• Meaning confused or obscured and requires significant reader compensation

Language Use (21-19):
• Inconsistent control of simple constructions
• Lack of control or void of a variety of complex constructions
• Frequent global and local errors
• Meaning may be somewhat obscured but not unintelligible; requires some reader compensation

Mechanics (3):
• Demonstrates inconsistent control of conventions
• Frequent or distracting errors of spelling, punctuation, capitalization, or paragraphing
• Meaning may be confused or obscured

Content (18-17):
• Main ideas and/or supporting ideas somewhat obscured
• Development is very limited, superficial, or repetitive
• Relationship to the prompt may be vague but discernable

Language Use (18-17):
• Weak control of simple constructions
• Generally ineffective complex constructions, or repetition of only a few formulaic complex constructions
• Frequent global and local errors
• Meaning is often obscured; requires significant reader compensation

Clear Lack of Competence for Academic Study

Content (16-13):
• Main ideas and/or supporting ideas generally unclear and/or obscured/confusing
• Minimal development of thesis
• May be off-topic
• AND/OR not enough to evaluate

Organization (7-6):
• Organizational structure very unclear and/or confusing
• AND/OR not enough to evaluate

Vocabulary (9-7):
• Very limited range; repetition of a small number of words
• Frequent errors of word/idiom form/choice
• Meaning may be unintelligible
• AND/OR not enough to evaluate

Language Use (16-13):
• No control over basic sentence construction
• Dominated by global and local errors
• Meaning is often unintelligible
• AND/OR not enough to evaluate

Mechanics (2):
• Demonstrates lack of control of conventions
• Dominated by errors of spelling, punctuation, capitalization, and/or paragraphing
• Meaning is confused or obscured
• AND/OR not enough to evaluate

REFERENCES

Bachman, L. F. (1990). Fundamental considerations in language testing. Oxford: Oxford University Press.

Bachman, L. F., & Palmer, A. S. (2010). Language assessment in practice (2nd ed.). Oxford: Oxford University Press.

Benesch, S. (1987). Word processing in English as a second language: A case study of three non-native college students. (Available ERIC: ED281383.)
Bridwell-Bowles, L., Johnson, P., & Brehe, S. (1987). Composing and computers: Case studies of experienced writers. In A. Matsuhashi (Ed.), Writing in real time: Modeling composing processes (pp. 81–107). Norwood, NJ: Ablex.

Bridwell, L. S., Sirc, G., & Brooke, R. (1985). Revising and computing: Case studies of student writers. In S. Freedman (Ed.), The acquisition of written language: Revision and response. Norwood, NJ: Ablex.

Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). New York: Academic Press.

Cohen, J. (1992). A power primer. Psychological Bulletin, 112(1), 155–159.

Collier, R. (1983). The word processor and revision strategies. College Composition and Communication, 34, 149–155.

Daiute, C. (1985). Writing & computers. Addison-Wesley.

Douglas, D. (2000). Assessing languages for specific purposes. Cambridge: Cambridge University Press.

Gebril, A. (2009). Score generalizability of academic writing tasks: Does one test method fit it all? Language Testing, 26, 507–531.

Gebril, A., & Plakans, L. (2009). Investigating source use, discourse features, and process in integrated writing tests. Spaan Fellow Working Papers in Second or Foreign Language Assessment, 7, 47–84.

Haas, C. (1989). How the writing medium shapes the writing process: Effects of word processing on planning. (Available ERIC: EJ388596.)

Hamp-Lyons, L., & Kroll, B. (1996). Issues in ESL writing assessment: An overview. College ESL, 6(1), 52–72.

Harrington, S. (2000). The influence of word processing on English placement test. Computers and Composition, 17, 197–210.

Hayes, J. R. (1996). A new framework for understanding cognition and affect in writing. In C. M. Levy & S. Ransdell (Eds.), The science of writing (pp. 1–27). Mahwah, NJ: Erlbaum.

King, F. J., Rohani, F., Sanfilippo, C., & White, N. (2008). Effects of handwritten versus computer-written modes of communication on the quality of student essays. CALA Report, 208. Available at http://www.cala.fsu.edu/files/writing_modes.pdf

Knoch, U. (2009). Diagnostic assessment of writing: A comparison of two rating scales. Language Testing, 26(2), 275–304.

Lam, F. S., & Pennington, M. C. (1995). The computer vs. the pen: A comparative study of word processing in a Hong Kong secondary classroom. Computer-Assisted Language Learning, 7, 75–92.

Lee, H. K. (2004). A comparative study of ESL writers' performance in a paper-based and a computer-delivered writing test. Assessing Writing, 9, 4–26.

Lewkowicz, J. A. (2000). Authenticity in language testing: Some outstanding questions. Language Testing, 17, 43–64.

Li, J. (2005). The mediation of technology in ESL writing and its implications for writing assessment. Assessing Writing, 11, 5–21.

Plakans, L. (2010). Independent vs. integrated writing tasks: A comparison of task representation. TESOL Quarterly, 44, 185–194.

Powers, D. E., Fowles, M. E., Farnum, M., & Ramsey, P. (1994). Will they think less of my handwritten essay if others word process theirs? Effects on essay scores of intermingling handwritten and word-processed essays. Journal of Educational Measurement, 31(3), 220–233.

Russell, M., & Tao, W. (2004). The influence of computer-print on rater scores. Practical Assessment, Research and Evaluation, 9(10), 1–14.

Schwartz, H., Fitzpatrick, C., & Huot, B. (1994). The computer medium in writing for discovery. Computers and Composition, 11, 137–149.

Schoonen, R. (2005). Generalizability of writing scores: An application of structural equation modeling. Language Testing, 22(1), 1–30.
Shaw, S. (2005). Evaluating the impact of word processed text on writing quality and rater behaviour. Research Notes, 22, 13–19.

Susser, B. (1994). Process approaches in ESL/EFL writing instruction. Journal of Second Language Writing, 3, 31–47.

Weigle, S. C. (2002). Assessing writing. Cambridge: Cambridge University Press.

Weir, C. (2005). Language testing and validation: An evidence-based approach. Basingstoke: Palgrave Macmillan.

Whithaus, C., Harrison, S. B., & Midyette, J. (2008). Keyboarding compared with handwriting on a high-stakes writing assessment: Student choice of composing medium, raters' perceptions, and text quality. Assessing Writing, 13, 4–25.

Wolfe, E. W., Bolton, S., Feltovich, B., & Niday, D. M. (1996). The influence of student experience with word processors on the quality of essays written for a direct writing assessment. Assessing Writing.

Wolfe, E. W., & Manalo, J. R. (2004). Composition medium comparability in a direct writing assessment of non-native English speakers. Language Learning & Technology, 8(1), 53–65.