THESI': ZOCI This is to certify that the dissertation entitled AN INTERFACE FOR HUMAN-COMPUTER INTERACTION BASED ON FACE FEATURE TRACKING IN 2D presented by VERA BAKIC has been accepted towards fulfillment of the requirements for PhD degree in Computer Science & Engineering Major professor Date 23/91/277 2000 MS U is an Affirmative Action/Equal Opportunity Institution 0-12771 LIBRARY Michigan State University PLACE IN RETURN BOX to remove this checkout from your record. TO AVOID FINES return on or before date due. MAY BERECALLED with earlier due date if requested. DATE DUE DATE DUE DATE DUE 03,910 1’1 5 23033 ix lNTERHC BASED 0 AN INTERFACE FOR HUMAN-COMPUTER INTERACTION BASED ON FACE FEATURE TRACKING IN 2D By Vera B akic’ A DISSERTATION Submitted to Michigan State University in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY Department of Computer Science and Engineering 2000 it ixtzirirr Ix. . ‘ ‘ 'r|_ M . . ‘ L‘ \\»'4‘ Ni ‘.4 . _' . A ' o . i‘ . ' V i“ .A ._ ‘ ‘ ".. o. . It I\ A v “ ' .... 4|. ‘ . ‘l “ I h ‘ A a. . l . ' ‘ '0 ‘ “N“NI‘G b . ' ‘ A”. ‘ 'u. 0‘ ‘ . ' ‘ ' r O'Vq. ‘ ' ‘I “h, .\ l:'\, . _ L ". _ ‘ ‘ ‘r.> ‘ \"s ,’~- ’ .' ‘I‘ J l‘ . ' A t‘ .\ V7. ‘ \ . . " . i v " 4. '. ‘ ‘4 u“ .‘. . .‘ '. ._ m. ‘ ‘. 4'4} ' «“V. d 1‘, ' .“I. . ‘ ' . . ‘~I ‘ i i ‘T ‘ v~, .. “ L i. _ . ~ I I‘M “l.l 0 '- t . . I . C ll .. § . \ ‘. \,V’|> ‘v F .4" II,» \ ~‘fln Q‘. A ABSTRACT AN INTERFACE FOR HUMAN-COMPUTER INTERACTION BASED ON FACE FEATURE TRACKING IN 2D By Vera Bakic’ In this dissertation, a new approach to control a computer using head movements, gaze direction and face expressions is discussed. A workstation user is observed in a non-intrusive fashion. Algorithms to find a user’s face and where the user is looking at the workstation display are described. The user can issue a command to the computer by making a certain facial expression. The following modules are described: (2') face detection that is based on a skin-color model and motion, (it) face feature location that is based on the knowledge of the face geometry, (III) gaze location detection that iS based on an Artificial Neural Network mapping of face feature location to display location and a mapping of mouse movement to display movement, and (to) selection that is done through face expressions. In a computer implementation, real—time rates are achieved, and between 10 and 30 frames per second can be processed. Essentially, a handless Human-Computer Interaction (HCI) is provided that can have various applications. Disabled people who cannot use their hands can greatly ' II ,. '0 v. y 1 V .lili ..‘ ‘A ,. u. “"' ‘T' f"‘\.’, 4 : Lu. u. \nll in. . I ’r 1-: .. ‘._’.” In .u. ll . ”I “l .. v I v x" " P \o “’ t" in . l. A ‘ z W " "'4 I'I 0. "’ I ‘ b.- _ I “fly i‘ 1“ u, 0 v‘. . 9| ‘ : l . '. Y . ‘K .IA. i» ‘1‘ A; . H '. " ' v e, 1 ‘1‘ _' . ‘2'- AI. .I i. ‘Au...’ """-wf ‘ ' "I a 4. ‘V ‘ yll.'.:‘.. \“ ~-..A __ f“ 1. I." ‘ h K “ ‘.u-' \"era Bakic benefit from such an HCI. Other users can use the interface as a replacement for a conventional mouse. Finally, the interface can be adapted to monitor where the user is looking in time, and the results could be used for various analysis, e.g., in cognitive science for visual perception analysis, in marketing for display exploration analysis, etc. The results Of a study that evaluated the usability Of the head-eye input. interface are presented in the final Chapter. 
These results show that the users were able to adjust to the new interface and perform a range of tasks common for current application programs. Among other tasks, the users successfully controlled Netscape: they browsed several WWW pages with an average of 5 erroneous selections and 8:35 minutes of time. © Copyright 2000 by Vera Bakié All Rights Reserved TO My Family M . ' . r _' X ‘ ‘ ~ ~\\ ‘, . . “ ..‘ ‘ ‘ 7 A ,_ .- I‘D l '3‘ ‘. . . ”n ' . ‘ .‘.‘ a. J . _ In“ V b. 7 ,_ ' . ‘ ‘r P‘II.‘ ‘ \I ~ «s.- “‘>.‘ ‘ v y ‘ A. l. .‘ ' r.‘ ~ n, Y ‘ .1. . A! ’ ‘ .. .uq“ u. ‘1 R~.‘r. 4 v ‘ . ‘ A . ‘ . \«O- 'A‘.‘ -“‘ N. l . “ - . a H. 7 ~' l' \7‘ 1. v-nh .1“ HA . -. ..‘ l ._ h, . . . ‘I-v ' s41:."\ _-H\ J \ t“ h F l I b |‘ . r“- F: s... :‘ ‘b—I ‘V..' A A4 ‘ ‘.. . . u l‘ 'l V H s., ‘|' ,‘ :j ‘ ‘14! . A L-‘. ‘3' . I." .' T l I ‘I . \‘k‘ “J; :T n ‘. u s .. .. r; a u . ' Iv -l~.|> r. _‘ \ "‘l l ._ .‘ I‘- a l' , ‘l . I .‘AAI ‘l . \ ~ ‘ . ACKNOWLEDGMENTS This thesis would not be possible without the help and support of many people: Dr. George Stockman offered excellent guidance, advice and support throughout my PhD research. I am thankful to him for the inspiration to pursue the degree in the computer vision field. Dr. Frank Biocca from Telecommunication Department served in my guidance committee and gave many valuable advices for my work. Dr. Anil Jain, Dr. John Weng and Dr. Sridhar Mahadevan served in my guidance committee and made many valuable suggestions and comments. Dr. Charles Owen helped with his comments. Dr. John Henderson from the Psychology Department helped with his expertise in the eye-tracking research. Dr. Richard Enbody was my advisor during the Masters degree program and I thank him for guidance and support. I would like to thank Dr. Alvin Smucker from the Plant and Soil Science Department and Dr. Theresa Bernardo from the School Of Veterinary Medicine for their support during various research projects. vi I ‘l ,I- 9 ' l. I m‘ t. I i,.‘st.»- 5“". .ul 5 I ‘V' '_'. v V "' l ‘ 4 ‘ .‘ l4 Anat— :;-.-~ “" . o l . , .0 .a l "A "' Viv m". ..: rm - .,__-,.,.a. . ,, . - ' l “11,". o O. l . ‘I ‘ u “t. “It. \- ‘- _o I . V ‘ , "J HY I‘lr ,0 '__ _ ‘3 . .I It - -lru~ : u u . - o . .. p _' T} ‘J ' . " .u“ “0| a . on at Ac 7. V". V ‘1v ‘ . u, . . 'V 4 - .4. \ .g. .uJA .h - ‘ l . f 1. ' "“ ' V ' v" .. s- A...” .I - _,.. " 'r ‘r I - ._ ; l v - .1, w .‘ . .‘ . \l 1Q. “i, H‘ ‘ I I) I - _ ' v. ,‘ ‘ ~_ "‘1 ' “ ‘r 9...; I . - IQ . v H I l . . . ... _ ‘WA..'. A fit , A. 9. ”..'.' . . .' ", —.. -I‘ . ‘ ,v‘ a ”‘ SKA. . I. q. \ v 4‘ ‘. I would like to thank all the other professors at the CSE department for excelent teaching and providing research facilities. I would like to thank all the CSE staff, Linda Moore in particular, for all their help. I would like to thank CSE undergraduate students Kristina Johnson and Brian Langner for their help during CSE Open House demos; fellow students from the PRIP lab for all their help: Aditya Vailaya, Salil Prabhaka, Vt’ey Shuan Huang, Arun Ross, Paul Albee, Scott Connell, Nicolae Duta, Friederike Griess, Karissa I\~’Iiller, Dan Gutchess, Erin McGarrity, and many others. Jay Newell and Jerry Roll from the Telecommunication Department helped with the “eyes-on-screen” experiments. Special thanks to all the people who participated in the experiments. Without their help and cooperation this work would not be possible. 
Finally, I would like to thank my family for their continuing support and help, my mother in particular for helping out in taking care of my daughter while I was finishing the dissertation. vii Jl OF TABLES LEI OF MUN I Introduction .1 ‘ ' ‘ 7v v, 04 0 I ~ ' \- It nahkot a... ~- I: ,. " rl ‘ .. . v 9 I r ,p" \ . ,4 so A. A. Ill ». A - .u. TV [HI-n r i v. 0.. u...uuu- ' ... .. s”. . I to». \.';..A I". ‘5‘ \l '. .utt 4 u.. l ‘1 "I . r‘v 1' v? v ‘ .-. . .., "5 CH". ' A l ‘9 1 . ., 'r ' .7 7". . ' .~-La...&u . 'h ' 3' Background 3 7" 1‘.- Y _ '. a. ., f“ Inv.. ..J ‘,_v. " 'L.‘ f . "AA _\ 5] r ‘ Q ~ .5. P]... I ‘a I‘ ~.. ~. ;‘ L”... t; r‘. ‘ . ,, I k!” Q A. p ‘ .. "aiu,’ ‘.l‘ '~ " . .' 5.. Qt“. ,‘ , ‘ \ .‘ . 3‘ r H t I1 I I u .. ~ If, ’ r'r .' 3' l“‘ i .‘l ‘v ' .b- 4 ft "(L -l . . 4 I 7 h v. ‘ -., .I g. y I \"3 ." Ix AG. . L". L. TABLE OF CONTENTS LIST OF TABLES xi LIST OF FIGURES xiii 1 Introduction 1 1.1 Applications of Computer Vision ...................... 2 1.2 Problem Statement and Motivation ..................... 2 1.2.1 Human-Computer Interaction ....................... 3 1.2.2 Cognitive Aspects ............................. 5 1.2.3 Computing Challenges ........................... 7 1.3 Proposed Gaze Tracking System Organization ............... 10 1.4 Organization of the Dissertation ....................... 10 2 Background and Related Work 14 2.1 Face Localization ............................... 14 2.1.1 Clustering, PCA and Rule Based Approaches .............. 15 2.1.2 Artificial Neural Network Based Approach ................ 16 2.1.3 Skin Color Based Approach ........................ 18 2.2 Facial Feature Finding ............................ 19 2.3 Facial Expression Discrimination ...................... 22 2.4 Gaze Direction Detection ........................... 24 2.4.1 Mathematical Solution ........................... 24 2.4.2 Pupil Reflections .............................. 25 2.4.3 Limbus Tracking .............................. 27 2.4.4 Artificial Neural Network Estimation ................... 27 2.4.5 Electro-Oculography ............................ 27 2.5 Kalman Filter ................................. 28 2-6 HCI-Related Issues .............................. 30 2.6.1 Basic 2D Input Devices .......................... 30 26.2 Interaction Tasks .............................. 35 26.3 Mental Models in Human-Computer Interaction ............. 36 2.6.4 Fitts’ Law .................................. 38 2.6.5 Evaluation of Eye-Tracker as Input Device ................ 40 2.6.6 Eye-Controlled Systems .......................... 41 2.6.7 Problems with Eye-Tracking Systems ................... 44 26-8 Head-Controlled Systems ......................... 46 2-7 Visual Perception Studies .......................... 47 viii i “‘ 7,, {FVI\\ ' I 131*? v ..,~ 9 : , a ‘ ' . r I fistt-M-ill'fli. 3 late and facial l‘ ' P' "'1 V 'fl‘t“ " - u .. e‘ u ( ‘ “I ' . \L‘; l ‘ll'lb J. .‘ 31.. v . . I . . n I‘ :14 I“ I“ I 'l v '- ' ‘ 1‘ .I n o l n. . ‘ V ." nt'l.t"l'. '.IL . I Alan} 6 ~\ Ill AA \ I I II‘ F. !~ 0 o . , kt‘i '. L‘.\ .'a .u- I“ 9 7‘ “‘ ' l, v - L- " . .u ‘ J. Jab oa...g .- n '~ ‘I l 'v q u 5 ‘ D ‘ y I I . " ‘ “ ‘ a s In at ‘ \""‘n 1"- ~ 1...... \L. V i lacking the Fe; '- . a ' ' V ’ \I o y. n l , ‘- Lulu.“ _. .\ I -.. . ,. - ».,.. ll . r 1. .‘ - , I ( . ..u.~ , . u u ‘ F “I . :‘ ' ' ‘ It. I . 4. ‘ -. . \ \ ‘ . ».4 \ - .m. .k: .‘.I n P ' .J H 1‘ I.' .T h I ‘ ' .A,‘9 .6. .. Aao ' I ' - . ' - .. . ‘1 O‘ i . . " ~L3.I‘;_‘ .I‘ ‘ .- h‘ H' "d d " ‘kAAN A‘ I Gaze Direction "o..“-H . a“ ‘ '9. 
' Hm.“ “-1.12“ It ., . 4 A III IV”. ' " um. _. N l'i rv .5. i . ~J ‘lr'r ‘vk .. ' vw. ~l“ 1'“ ' N .. \w- .M' ‘ :, . V A.._m~ L p ., _ 'n - I ~“. ‘ . ‘Iw ' ‘ nus _ .,' g ‘ L'“ a. "1 in .' r ‘ ‘n‘. \ I, L. ... 1., .. f. ,_v-. ’v. \I ‘u -I_' ._‘ '-.- f"\ , \l ‘ \T‘ I I ‘p '. ‘r J. \"n,.' :‘h .. T‘“--:.\ :I.‘ :23 co "8? a o :1 I. “ . ‘r-a-o , \r . ~ p I . ' . ‘ Q. \ ~ ‘ at I ' 5‘ a. \Lgo. v ' V . ‘ 'I_"‘ I. ‘ . . N ' I ‘5 ELM... ' I . ' ‘I ‘-.-A -I. ,1. {_ . . i a u v . 4', I’D? “ ‘. .._‘ "(1“ ‘l 0' _ . "F 'v v .. . "-1 Ll‘fi :v T . .. 1“ . . I, v C . D‘,' t" 4 .I \u, 4 'L. “[-.4‘ . I ..‘ ‘..__ -.-.4"f.-,‘ A“ o;‘\ 'l u -, .5 »~. \ ‘4 2.7.1 Types of Eye Movement .......................... 48 2.7.2 State—of-the—art Eye Movement Tracking Technology .......... 50 3 Face and Facial Feature Detection 51 3.1 Face Detection ................................ 52 3.1.1 Skin Color Model in RGB Color Space and F ace Localization ..... 52 3.1.2 Skin Color Model in HSI Color Space ................... 55 3.1.3 Problems with Skin Color Model ..................... 56 3.1.4 Problems with Dark Skin Color ...................... 58 3.2 Improvement of Dark Skin Color Detection ................. 59 3.3 Eyes, Eyebrows and Nose Location ..................... 62 3.3.1 Detecting Eyes ............................... 62 3.3.2 Matching Eyes with Eyebrows and Nose ................. 65 3.4 Accuracy of Feature Detection ........................ 69 3.5 Summary ................................... 72 4 Tracking the Features 74 4.1 Tracking System State Diagram ....................... 75 4.2 Using Motion Detection to Improve Tracking Results ........... 78 4.3 Results Using Kalman Filter ......................... 81 4.3.1 How far from the camera can the user be? ................ 85 4.4 Analysis of Movement Data ......................... 86 4.5 Summary ................................... 88 5 Gaze Direction Detection 89 5.1 Introduction and Terminology ........................ 89 5.2 Gaze Direction Determination Using ANN ................. 91 5.3 Gaze Direction Determination Using Movement Vector Scaling ...... 94 5.4 Smoothing of Gaze Point ........................... 97 5.5 Results ..................................... 98 5.5.1 Fixations Determination .......................... 98 5.5.2 Attention Measurement .......................... 103 5.5.3 Cursor Movement .............................. 112 5.6 Summary ................................... 115 6 HCI Based on the Head-Eye Input 116 6.1 Selection Mechanism ............................. 117 6.1.1 Selection by Head Motion ......................... 118 6.1.2 Selection by Facial Expression ....................... 121 6.1.3 Other Ways to Make a Selection ..................... 123 6.2 GUI Design Issues .............................. 124 6.2.1 Level of Feedback .............................. 124 6.2.2 Button Sizes ................................ 126 6'3 Advantages and Disadvantages of the Head-Eye Input Interface ..... 130 6-4 Summary ................................... 133 ix T imitation of ill! Th G I..~ Hi '13 O.‘ ‘\ .. . \-"ql P4!“ .I; . . -i El'ai'titxnl. l‘: . . '4 3: ~. . 3:3 3m“ ‘33 ‘Iii 3i: TI 32' ii '.'_ 3 '3. G' I_‘ T ' " v m In. "I: I’. Q 0.: i uln A I u ‘ .8, ‘ ‘ if 2M” f. . .I “.1qu F ‘ ' L ft'l‘e ii r. IPPENDICES flanks of the Biask 11 C om; flask 12 Com] [111313 Com] l lair 15 C 0m; 7 Evaluation of the Head-Eye Input Interface 7.1 The Goals of the Study .......................... 7.2 Subject Pool ............................... 
7.3 Evaluation Procedure ........................... 7.4 Results ................................... 7.4.1 Task T1: Moving the Cursor Along a Path on the Display ..... 7.4.2 Task T2: Guided Button Selection .................. 7.4.3 Task T3: Guided Dragging of Icons .................. 7.4.4 Task T5: Application ControlmNet-surfing and E—mail Writing 7.4.5 Summary of Questionnaires ...................... 7.5 Discussion of T1, T2, T3, T5 Results .................. 7.5.1 Learning curve ............................. 7.6 Summary ................................. 8 Conclusion and Future Work 8.1 Research Contributions .......................... 8.2 Future Work ............................... APPENDICES A Results of the Gaze Monitoring Experiment B Task T1 Complete Results C Task T2 Complete Results D Task T3 Complete Results E Task T5 Complete Results BIBLIOGRAPHY 134 135 136 137 143 144 147 156 164 172 174 177 181 183 184 186 188 189 200 221 243 266 269 . .. [7... . V . I I ‘ hu.-I"I II“ . \ :."I ..1 ..,. i H... ,_, . .1 l I ‘l p 4» um I ”_ ' 1 I I t“ ‘D I' hU'I-l I“. A A. ' i; .. 9", . ..~ " {N l.\‘... l - . 1' .6 ,. I ..’ ‘ AA 0‘. .J tal‘ 8... ‘- r e, w ‘ ' . ,. . “u.“H ... I "\5« I '-‘ _ \ P ' .‘_ .4. 'J- ' . D . '“"~V":. I . .o u 34“ w. \‘A ‘ H e.‘ c "- \’ . . . |.,'V.. I ' . ' A ‘Ia...’.‘j. I." “ .‘ i""‘< l' Mu-“ IKA. -. ' !., ~ y. a"... h“ A A .‘A ‘4‘ ‘ .. l I n 3" ' a‘. ‘s ' 1 Us“) . . “I, . ‘ ’- ‘L‘ 3.1 i. J‘II'VG.‘ y u '3’“.- ‘9‘{¢:0._' M'A ‘ . N\ ‘ 4., I. ‘1; L-"/‘.. Q‘ ‘i." ‘ r“... _‘ ' La u o o “. ‘v 6| I ‘ c 5 J G ‘ l I . s V - . .«L 1') ’V ’ d. ?--‘,' “.3: f- ‘. ‘ 3"“ l r -. . h ~ ..‘ l r _.1 l'a . .7 2.1 3.1 4.1 4.2 4.3 5.1 5.2 6.1 7.1 7.2 7.3 7.4 7.5 7.6 LIST OF TABLES Resolution and size characteristics of common display devices ...... Accuracy of the feature location. Percentage of found matches and average Euclidean distance of program-found features from hand—labeled loca- tion. Comparative results for normal environment vs.controlled (plain green) background are given ........................ Accuracy of the motion detection and dark face detection given as the Euclidean distance between feature points and hand-labeled locations. Euclidean distance is given in pixel units, and the number of matches is in frame units. ............................. Accuracy of the tracked feature location .................. Execution times for six runs (two subjects). The program was run for 1000 frames of video, on an SGI Indy 2 workstation with the input image size 320 x 240 ................................ Percentage of agreement in viewing positions and Kappa statistics for all subjects .................................. Number of observations, their mean and standard deviation in seconds for manual coding, and automatic coding .................. Features and requirements for the head-eye input, eye-tracking based in- put, mouse, trackball and touch-screen ................. Subject pool: details of the variations in facial complexities ........ Number of erroneous selections at the previous target location: “y” entries mean that there were more than three selections at the previous target. location, “11” entries mean that there were less than or equal to three selections at the previous target ...................... Percentage of correct selections and the number of erroneous selections per each correct target selection: data from the 8 x 8 grid is collapsed into a 4 x 4 grid. ............................. Task T5: Assigned weights for each sub-task ................ 
Task T5: average number of errors, elapsed time and percentage of the task completed .............................. Distribution of errors across the sub-tasks ................. xi 71 110 111 131 137 155 155 168 . .4 .- l ,. I‘ \ufiir! Aliii 43131-- .4iq 7i? I "I “1"“3‘IIH f1 7‘ ' T . . .- . ‘ .Y‘ 1‘ I ._ ~ I _ ,‘ . \w‘ ”-7 ~ I. I. y .. .J'Ti'l IM’J I31 .‘. I‘, .1 mi-‘ .. I" " 33.? I1; 33’“? \\ \3\ " ' I . r v ‘- -~ "h , ' -1 .551 I}. 11““ 'I | O A1 8.1 C.1 C.2 D.1 D.2 E1 E2 Number and durations of fixations measured by the gaze-determination algorithm ................................. 190 Task T1: average squared error for each subject .............. 201 Task T2: button selection accuracy statistics ................ 222 Task T2: button selection timing statistics ................. 223 Task T3: button selection accuracy statistics ................ 245 Task T3: button selection timing statistics ................. 246 Sizes of typical buttons in the Netscape browser and mail windows, and a typical news WWW page ......................... 267 Task T5: number of errors and timing statistics .............. 268 xii u. n";“ .. L ‘ p ,w-~u.h 'ju" 1 43h.“ k J, .n J . I ‘ ' ‘ v r r \ J13. a. . ‘ Q P.Qp " 3"..2 gnu-y It! _. 0 a N i 7 'V '1 - . .I‘ r r g ‘ a... nu ”Us- | .n I‘ II . . .' , ' \ 5 ‘I In; .‘3. .. " " Y" . ’._ I .h' .I‘ . ‘h ‘ -“ l u...” I | ’.' U '. .,' ~.. I. . . ‘h ‘ 0.. _" . b ... '4 u ,\ 'g‘ ' .. 3‘ ‘u and. y. . l r " m" Q- . . “dd...“ A ' .O‘ I Q l .. l , \ f. 1‘; ‘V . Q lv 9. ‘ u.‘ \v, . ‘ .,r '. , “uh“ ‘v. 1.1 1.2 2.1 2.2 2.3 2.4 3.1 3.2 3.3 3.4 3.5 3.6 3.7 3.8 3.9 LIST or FIGURES Purkinje pupil tracker, an illustration (Courtesy of John M. Hen- derson, Michigan State University Eyetracking Laboratory, http://eyelab.msu.edu/) ........................ Gaze tracking system block diagram .................... The illustration of the system described in Ohmura et al. [58] ...... The four Purkinje images are formed as reflections of incoming light rays on the boundaries of the lens and cornea ................ Fitts’ law: analysis of the movement of user’s hand (from [16], page 52) . Example of eye movement during face learning and recognition. (Courtesy of John M. Henderson, Michigan State University Eyetracking Labo- ratory, http : //eye1ab . msu . edu/) ................... Skin color clusters: the horizontal axis is Rnom and the vertical axis is Gum". Out of six clusters, three represent Skin color. Cluster la- beled Face is the primary skin color, and clusters labeled Shadow_1 and Shadow_2 are areas of shaved beard and shadows around eyes. . . Isolated face regions on sample images ................... Skin color clusters in HSI color space: the horizontal axis is hue and the vertical axis is saturation. Skin color is represented with three clusters. Cluster labeled Face is the primary Skin color, and clusters labeled Shadow_1 and Shadow-2 are the areas of shaved beard and shadows around eyes ................................. Comparison of RGB and HSI models for skin color classification ..... Erroneous face regions due to similar background and face colors. Localization of dark skin color subjects: two failed and one successful] classification ................................ Red, green and blue intensity histograms for dark and light skin tones . . Color-altered dark images, and resulting skin-color classification ..... Red, green and blue components of two images: gradual thresholding in search for eye blobs ............................ 
3-10 Gradual thresholding in search for eye pupils ................ 3-11 Matching eye blobs with nose ........................ 3-12 Results of eyes-nose detection for various roll, pan and tilt angles. The white rectangle shows face boundaries, and the white dots show the locations of the eyes, eyebrows and tip of the nose. .......... 3-13 Sample images used for accuracy of feature detection calculation ..... xiii 49 53 54 56 57 58 59 61 62 64 65 66 70 x ‘ \ ’. '4 " \sl'". ' 3'» u. i ”315..“- I-I ‘ ‘ ' I”. H mummfl ll. ... ‘I ‘Llut \|‘IOI. ‘u - , ‘ or . linl'ti Iii “' i .v'..\ . a l "".t""'.'. '3 "5-"';ir g ‘I “Ah-lb . n _ o f ' ‘ "I.'"v‘ "I ” 3'4“ u I ' VI I F ' ‘ | t" ‘I l‘ ' I‘ "y I; .I 33? I" " {I l-l. I o v . ' ' ‘ “LIILA ‘II |. A .I I a» .I ~ .I “(I .A-V It I- - 4 -'. ’ [~10 ‘! ‘ KI4LAD ‘\ ~~sAI u I '1 ’- - v o . .I \‘n I". II..IJ 3- . ~WAn‘yg ‘ . , ._| . , . I". \ | “I .‘ $‘fi- CI . . . .‘1 ' Dy. " . do“: A.‘.|..... It A - l "K V' - M-iMILvII.‘ u . . h I 'J \ “\IQ.‘ .. q ' O . .l . v ‘71\ . I. .l.‘ ‘. “Ig'q‘. .‘ . l I. H P. .. .‘4. o \ 'I.' .\ ‘5 tun-l . _ .. . .. r. ,l‘ ' .. 4.| ‘ m I.'.L ' 7“ u. 6‘ ‘ 4 . . . 4... ' ._ ‘ v47 T. I I 'Jut’.’ “(Ln ‘1“ . . f. 'l‘;.'.‘ £' ‘ rut“. \‘A(|‘ 1 Y‘ I",.’ v “W '.I c. ‘ . I“ 09 y... . . a Y "4 H .I h... J u‘ I.- fr 1. ' 99.. ‘: ‘ r- ‘ V . A ...,.s.u.h ”A ' O., A . ‘Q. I ‘ll .“ ’ . PI \ Y I vuL. ‘v "o-.‘ C. I t- . _' .b-s- | .. ‘k‘ V A. - i" n ‘ '. i'TH "V 1 ' . Jx. 3“,] ‘.|‘. 9' ‘ 3 \. l ‘ P .. ... .1... ,‘c “Q 4., 2‘ ~. I . _ -_ ‘ u. 5““ '1. Iv r)‘, 'i ."“ t,‘ . .' U...'| ‘_‘\‘II"1 a r P4 w . .. ~.H~,‘“"‘: 4r . unr‘ I). -". ‘1 7‘. '3'; ”2.3; ‘_ 444- A‘:“\. ;_ . \b— Hi I! q 4 d e" 4. ,1 . "v n ‘k“b.v ‘ v . u ‘ "I ._ ‘I . . “H. .4 . ' “a. L- " V M -.‘ : ., IA :“ " it, . " ‘er , . 4 4., a.“ \ c “ l‘ . . \Y‘. . I ' ,- T‘Z,‘ I. ‘ O ‘1 .4. r I >Ph‘"l 3' u‘ | “ 1.4 'I \' " .ifi‘ 6’1" ' y . . ,. I \' -, \‘ _, . _, _ . “53‘4”. :Q I .i 7. “L A - 'I. .' ‘. d.. 2 - . 43“. T: AI. ‘. ("V t i c I. 8..“ 4 L» . 54. a y\‘ 7‘ . k .l \‘ f\ “L ‘ I. I? 3‘ If 4.1 Tracking System State Diagram ....................... 4.2 Comparison of the X and Y coordinate tracking without (top) and with (bottom) the use of Kalman filter for smoothing and prediction. 4.3 Sample time-space statistics for two movie files: X and Y coordinates of feature points in time, and X vs.Y image coordinate .......... 5.1 Relations between the eyes and nose used as the input to the neural net- work to determine the gaze direction. Dotted lines show the relations we use as the ANN input, dashed lines show the relations that had high weight in the resulting ANN ..................... 5.2 Sample input data for ANN trained to determine gaze directionDashed lines show the target grid points, rectangles represent face boundaries and triangles represent the corresponding eyes and nose locations. 5.3 Comparison of gaze tracking results using the neural network only and using the scaling of movements in the picture to display movements. . 5.4 Comparison of gaze point determined by movement scaling, Kalman filter estimation, and weighted averaging ................... 5.5 F ixations measurement experiment setting and viewing angles ...... 5.6 Fixation determination algorithm ...................... 5.7 Gaze path and fixations for a sample image. White line indicates the gaze path. Gray dots are fixation locations, the dot size indicates fixation duration. Black line connects consecutive fixation locations. ..... 5.8 Attention measurement experiment setting and viewing angles ...... 
5.9 Attention measurement calculation algorithm ............... 5.10 Gaze coordinate and viewing positions (TV vs. computer) over time for three subjects 800, 801, $04 ....................... 5.11 Gaze and eye coordinate changes magnitudes histograms for two subjects 5.12 Sample results of cursor control using gaze tracking ............ 6.1 Selection by head motion—state diagram for “yes” and “no” motion de- tection ................................... 6.2 Predefined order of viewing the cards .................... 6.3 Paths to selected locations and YES/NO response paths ......... 6.4 Sample mouth expression .......................... 6.5 Mouth expressions for a dark-skin person .................. 6-6 Mouth states transition diagram ....................... 6.7 Snapshot of the face tracking demo GUI. Red dot shows where on the screen the user is looking ......................... 6-8 Netscape browser and mail windows with selected buttons highlighted . . 6.9 Netscape button sizes relative to the display size .............. 7.1 Task T 1: average squared error ....................... 7-2 Task T1: cursor path for the best and worst squared error results in the first session ................................ 7-3 Task T2: button selection accuracy and selection times .......... xiv 76 82 87 92 93 99 100 102 103 104 106 107 109 114 119 120 120 122 122 123 125 128 129 145 147 I -'-- I" ‘ .l -.l I I; T6351... ‘t‘t H . .p _I .. ~_ 16:th 5.371 . Ix. {rm 3 £111 6 ‘l ., T _l_ T1) "‘ OysI r ‘ ' ‘ .. ll .L‘L‘h ‘.~ ‘ I . Ivfll' WWI ""‘l'. ', 1&5 1.). Eu... ' "Iv-fly? '1‘ I“ ‘U-al- I ‘ “ .. v v ‘V‘ ‘ . . 11:51.1 w . .- 7'“; 3 "H . 3 .21. -. ‘- . 3|: . lirld. - ."r (i ‘ 5711113. l>:~'.'.‘ o In - I Tq 1:) .' . 1 p - ~ A ,\ no A L l') n :1 I“ ‘ Y ' ' I '* ‘ ‘ "' .r Dal. 11.1. r.’ ; Ic' ' . u . . I '.'w~.‘ - ‘r... ‘ . DA , ~ u‘ ..u......:. .‘ ‘ 1. I . . . ‘ 7". «A Jar? mini l . ,A ._ n l . 1 ' '. '2‘ "H v I 17. Jar pa,“ .1.“ A l N n.1 1'," r. I 11 IJ u 7' J1. “ .‘h‘ ‘ H F‘ .. ,‘ . ‘¢ v .- "' . I t I ‘itr JG... 1.... . I '..— 1:. J l '.“r y 1 l 4 ~ Uun. dry; . l I ' ‘ v . O . . A _| 7, v 4 A- J l'.‘ dd L4 6.1 ‘ ‘ I ' 0 {7' ’ r . . A.» vh" ”la.“ dr.“ ‘ I ' : 7” . ., ...‘ ‘Jatr I, 7 y 1 fur. 6...”. l . ' v'fi'V‘rO'. _' V a.‘ iJ‘Ir‘ I}; “"3on du‘. * i . _ -b - m ll. .- :. .. ' m. ‘ .. .Ql. l; \iL' .. U . 1" ‘r . Il' .T', I '1‘ .‘\I II, “l“ 1“ .“J. ' P, ~ ‘ T " r... 1, 4’. :‘N yup , ““ ‘5. ‘4 ’I .. . ‘ ~ -- . .1. ‘7'. . 8- ‘ 1 MC}. 1‘ \‘A' , 1“ . I a T ' ‘ \ .. «at. 11. ‘J‘e‘.’ - . « o.‘ '96.’ T1 I . 1“.“ A :‘ir‘l ' " a a . 17.- . .M‘ I1 ‘ . A‘“ ‘10 . ‘ ‘,'.'l :I-~. ‘ J ,1, 9“,? . _ .‘L Li. 1‘» .. l"-> -' a l ”n. lq‘l' T'I "'1 AM i ‘ >'J./ ,. ‘ v .’ f“ ' "- "i ' ‘kd 1‘. \‘r P ‘- v : I "Id Q‘JT] I ‘Jl. ‘1. "no ._..I v 4- s".- v a *mavT‘ ‘51 ll \“7 v... .4 'VA . ‘ UU¢1"' TQ -..L . ‘~ 11.x“ .. ."5 D ‘1 ' .‘ A4"T «hi ' ‘4‘ l 4:; "‘ .‘"~ ‘ n . .c; T1 0.}, ‘1. " 1'41} ' .‘ ‘\ _ J I l. l T‘ ". I" . “A . ‘ A A“ \‘i‘ .1,‘ 7.6 7.7 7.8 7.9 7.10 7.11 7.12 7.13 A.1 A.2 A.3 A4 A5 A6 Task T2: selection times vs. button distance by Fitts’ law ........ 148 Task T2: selection times vs.distance by Fitts’ law, and cursor paths for trial 2 for a subject whose performance increased in the second session 151 Task T2: Distribution of the erroneous selections for each target button . 153 Task T3: number of correct selections by attempt and their timing, and number of dragging steps ......................... 157 Task T3: selection times vs. distance by Fitts’ law ............ 
160 Task T3: selection times vs.distance by F itts’ law and cursor paths for trial 2 for a subject whose performance increased in the second session 162 Task T3: Distribution of the erroneous selections for each target locations pair .................................... 165 Task T3: Distribution of the correct selections for each target locations pair166 Spatial distribution of the erroneous selections for each target locations pair: darker lines represent more errors for that target locations pair. 167 Learning levels for 18 subjects ........................ 180 Gaze path and fixation points, subj-2000 and subj—2001, images 0, 1, 191 2 . Gaze path and fixation points, subj-2002 and subj-2003, images 0, 1, 2 . 192 Gaze path and fixation points, subj-2004 and subj-2005, images 0, 1, 2 . 193 Gaze path and fixation points, subj-2006 and subj—2007, images 0, 1, 2 . 194 Gaze path and fixation points, subj-2008 and subj-2009, images 0, 1, 2 . 195 Gaze path and fixation points, subj-2010 and subj-2011, images 0, 1, 2 . 196 A.7 Gaze path and fixation points, subj-2012 and subj-2013, images 0, 1, 2 . 197 A8 Gaze path and fixation points, subj-2015 and subj-2016, images 0, 1, 2 . 198 A9 Gaze path and fixation points, subj-2017 and subj-2018, rnages O, 1, 2 . . 199 B.1 Task T1, subject 00, cursor paths for all three curves ........... 202 8.2 Task T1, subject 01, cursor paths for all three curves ........... 203 3.3 Task T1, subject 02, cursor paths for all three curves ........... 204 BA Task T1, subject 03, cursor paths for all three curves ........... 205 B5 Task T1, subject 04, cursor paths for all three curves ........... 206 8.6 Task T1, subject 05, cursor paths for all three curves ........... 207 8.7 Task T1, subject 06, cursor paths for all three curves ........... 208 3.8 Task T1, subject 07, cursor paths for all three curves ........... 209 3.9 Task T1, subject 08, cursor paths for all three curves ........... 210 13.10 Task T1, subject 09, cursor paths for all three curves ........... 211 B.11 Task T1, subject 10, cursor paths for all three curves ........... 212 B.12 Task T1, subject 11, cursor paths for all three curves ........... 213 3.13 Task T1, subject 12, cursor paths for all three curves ........... 214 3.14 Task T1, subject 13, cursor paths for all three curves ........... 215 315 Task T1, subject 15, cursor paths for all three curves ........... 216 316 Task T1, subject 16, cursor paths for all three curves ........... 217 317 Task T1, subject 17, cursor paths for all three curves ........... 218 I318 Task T1, subject 18, cursor paths for all three curves ........... 219 319 Task T1, subject 20, cursor paths for all three curves ........... 220 XV I p. .l. a n1. £512.51: Hill. a: ‘1' ,a | ”0? 1'- -T“, su'.’ 7" a u; .- I A 7, It ‘ p .00 Uln'll .l'l .u 9'1 l' 1_‘~.. u -'- f’dl C.1 Task T2, subject 00, F itts’ law time-distance plots and cursor paths for each button ................................ C.2 Task T2, subject 01, Fitts’ law time-distance plots and cursor paths for each button ................................ C.3 Task T2, subject 02, Fitts’ law time-distance plots and cursor paths for each button ................................ C.4 Task T2, subject 03, Fitts’ law time-distance plots and cursor paths for each button ................................ C.5 Task T2, subject 04, Fitts’ law time-distance plots and cursor paths for each button ................................ 
C.6 Task T2, subject 05, Fitts’ law time-distance plots and cursor paths for each button ................................ C.7 Task T2, subject 06, Fitts’ law time-distance plots and cursor paths for each button ................................ C.8 Task T2, subject 07, Fitts’ law time—distance plots and cursor paths for each button ................................ C.9 Task T2, subject 08, Fitts’ law time-distance plots and cursor paths for each button ................................ C.10 Task T2, subject 09, Fitts’ law time-distance plots and cursor paths for each button ................................ C.11 Task T2, subject 10, Fitts’ law time—distance plots and cursor paths for each button ................................ C.12 Task T2, subject 11, Fitts’ law time-distance plots and cursor paths for each button ................................ C.13 Task T2, subject 12, Fitts’ law time-distance plots and cursor paths for each button ................................ C.14 Task T2, subject 13, Fitts’ law time-distance plots and cursor paths for each button ................................ C.15 Task T2, subject 15, Fitts’ law time-distance plots and cursor paths for each button ................................ C.16 Task T2, subject 16, Fitts’ law time-distance plots and cursor paths for each button ................................ C.17 Task T2, subject 17, Fitts’ law time-distance plots and cursor paths for each button . ................................ C.18 Task T2, subject 18, Fitts’ law time—distance plots and cursor paths for each button ................................ C.19 Task T2, subject 20, F itts’ law time-distance plots and cursor paths for each button ................................ D.1 Task T3, subject 00, Fitts’ law time-distance plots and cursor paths for each target ................................. D.2 Task T3, subject 01, Fitts’ law time-distance plots and cursor paths for each target ................................. D-3 Task T3, subject 02, Fitts’ law time—distance plots and cursor paths for each target ................................. xvi 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 247 248 249 41. .. \ | (1‘ t. i’ .»p . . .. ‘ 4‘ I t. ..\.I.J . ~L ... ‘u I!“ P .\ . u “23‘ A'J. - . V n u an .\. o . . ‘01:) i'. s I IA Jr) .V... V..-h «x *1. ‘ l <- D.4 Task T3, subject 03, Fitts’ law time-distance plots and cursor paths for each target ................................. D.5 Task T3, subject 04, Fitts’ law time-distance plots and cursor paths for each target ................................. D.6 Task T3, subject 05, Fitts’ law time-distance plots and cursor paths for each target ................................. D.7 Task T3, subject 06, Fitts’ law time-distance plots and cursor paths for each target ................................. D.8 Task T3, subject 07, Fitts’ law time-distance plots and cursor paths for each target ................................. D.9 Task T3, subject 08, Fitts’ law time-distance plots and cursor paths for each target ................................. D.10 Task T3, subject 09, Fitts’ law time-distance plots and cursor paths for each target ................................. D.11 Task T3, subject 10, Fitts’ law time-distance plots and cursor paths for each target ................................. D.12 Task T3, subject 11, Fitts’ law time-distance plots and cursor paths for each target ................................. 
D.13 Task T3, subject 12, Fitts’ law time-distance plots and cursor paths for each target ................................. D.14 Task T3, subject 13, Fitts’ law time-distance plots and cursor paths for each target ................................. D.15 Task T3, subject 15, Fitts’ law time-distance plots and cursor paths for each target ................................. D.16 Task T3, subject 16, Fitts’ law time-distance plots and cursor paths for each target ................................. D.17 Task T3, subject 17, Fitts’ law time-distance plots and cursor paths for each target ................................. D.18 Task T3, subject 18, Fitts’ law time—distance plots and cursor paths for each target ................................. D.19 Task T3, subject 20, Fitts’ law time-distance plots and cursor paths for each target ................................. xvii 250 251 252 258 259 260 261 262 263 264 265 Chapter 1 roduct u.“ l "'I.\ . '1 '1 ..‘.r _‘ F314,)“. “:1 “9“; AV" . -; 9‘ V Ill 1' ‘1 . «A no . k4.‘ ‘ I. l V "I «J "-.'-’I -_r..., v 'v ... A ,M‘AA t...” 'lu‘t '1‘ l K. 4,. I , n .9... ,. A. L‘AH‘WA‘ --._ . h- ~ ‘ ‘ _ ,‘vo, . 0} ~ ¢u . ’. . 7 Q ' t fr ' _ k. N»- y . ‘ u, _I 1 w... A .~ .y ' r. k)‘. F\ TI “'7“: h t 0‘ ‘_ ~ I} A V v._ ‘ ‘Aluk‘ o V'u.‘ ‘5‘," . '- . 4‘: ., ' a.” I? ' “.4“? I. ”I... "s Chapter 1 Introduction In this thesis, a computer vision based interface for Human-Computer Interaction (HCI) is presented. The interface is based on the tracking of a human face and facial features in 2D and offers handless, non-intrusive and inexpensive HCI in a. moderately Controlled environment. We present the computer vision algorithms used to locate and track a human face and facial features, we propose a gaze determination scheme Used to control the cursor and HCI based on the interface, and, finally, we present the results of an evaluation study of the usability and performance of the interface. In the proposed interface, the human face is tracked and based on the face motion the Cursor is moved. We do not determine the cursor position based on the absolute head position, but rather the user is driving the cursor position with head movements. AS With all novel interfaces, the use of our interface needs to be learned since most users do not think of their head and eyes as an input interface, and the users need to l . . ear“ to perform controlled head movements to achleve the desrred cursor movements. . App it; 1 in. s-—-'" '» L I. urn, U "ll!" 1, . » ~u'otlu “Ara “H "’t! l ‘ 4 .1 , '- \l v‘\ \\ w u». .luu A l l n " r". o , v ' I‘ I ‘\ ' -. ...~ ..I ., M ”I V. I ‘A D ...,' V J P.vd . I. fin- “*I‘ K_ A)" A v- i 'LMLI‘I ‘ o I ~.' ‘ H 'C -. "yv ‘_. ‘ . “~ : v “e“- C Q N, - ’0 t. I”; k V- ‘a ‘\ .._ fl “ -\ “3r ‘~ ["7 ‘ G ,I. -"~4 '? \a u 1.1 Applications of Computer Vision Computers are becoming part of our daily life, and interaction with them has become a common and natural task for people. The field of Human-Computer Interaction (HCI) studies issues that have emerged with the use of computers, and how to make the interaction as seamless as possible. Mainly, HCI deals with how to make better mechanical input devices, how to organize displays, how to organize information and dialogs with the user, and similar issues. Some effort has been made in using natural means of communication (e.g., vision and /or speech) to enhance HCI. This was used mainly for interfaces for handicapped users, and these input devices were cumbersome and hard to use. 
Recently, more attention has been devoted to the use of vision in the interaction With computers. The goal is to be able to understand the user better and to offer a more friendly and natural working environment. The use of vision offers the ability to See the user’s face, understand the user’s mood, enable the user to look somewhere and issue a command using a facial expression, etc. 1.2 Problem Statement and Motivation Problem Statement: The goal of this research has been to develop a vision-based HCI that observes the user and understands the user’s gaze and facial expressions as a means of communication with the computer. The computer vision side of the Work involves the general capability to detect and track a human face as it moves In a 3D workspace. The use of such a system as an HCI would be twofold: (i) to V . D V ,Iv'j‘ 1' l‘ ‘ V ILJIA‘I Ihi ‘ L | 1 a I’flrl'h-HWI '3' i. “QL‘IrdH “‘1 ‘ A“ . . .. ._,. 3', ”flu. ' l '1 3' "V“. 'k‘ ‘l' ,‘I 1.2;...“ ml ~~ -- ' I -. ,n v,‘ 0 r r " ‘ ‘ J. .hd A rt Y‘l -... . '1 ‘ Y .V w’.‘ ‘Yl A‘.‘..rA.\ d‘ I-.I "‘W'h -v y. - ~ \ A, ‘ l .1...u‘.1..uit A. . , ‘ b.» , 3 ,\\,v'r,«' “ H.“ -.uu.. “-8 ‘V-v n, ‘1 i P I y .‘ , ‘1. jar 1} 5,1 ." a ‘ \ ‘ I \ I “i“ ' ' c m. ,(‘r‘ )f: I u- ‘ \ \‘vI-V' I o‘ V . -. I“. .L \ I) L ‘1 ’I , . .g ,,_ L'.‘ \ «a, a ‘ . -¥L.I I‘ “~ 4G; i‘ . ”Rn“. . \ I' I F" ‘u;‘ v s . \‘u I i.‘ b V'" “L v-. , ‘ . “.1“. ‘v , ”H K . .0 b' 1.. V " b'r ' ‘..F v . provide an input interface, and (ii) to enable an observation of user’s movements in a 3D workspace and an evaluation of how humans explore computer displays or virtual environments. Imagine the following scenario: user enters a room, walks to a computer display, the computer recognizes who the user is and automatically opens the user’s workspace, the user looks at an icon, signals with a facial expression to open it and the desired program starts up. This scenario has all the actions from today’s state-of—the-art systems, with a twist that there is no username/ password typing, and no mouse pointing and clicking. All this would be done through the use of computer vision systems. The key point in such systems is that they should be non-intrusive and not require a special set-up for the user. 1 - 2 . 1 Human-Computer Interaction In the early days of computing, computers were “magical” machines, “don’t touch” Systems. They were operated by a few trained people in white coats and kept in a StriCt laboratory environment. There was not much room for any friendly interaction With the machines, and these issues were not on the agenda at all. Even though people Were developing artificial intelligence programs, these were not used to enhance the lutefaction with the machine, but rather to create a machine that will reason and understand scenes as humans do. a" . R u . Hr :El’ll‘tt‘i'lx it" 7""1r‘I'V‘ “ NV." I'uu .n. ‘ I r Ia- .. null «l. "4; ’ ',".‘ ll! :’ i «v v' ', oY‘n '1‘ ' s. \‘ . ,m‘r . u: l ‘ l ,. p ‘ ‘ ’ i '7. ‘i' A . I A ‘ L mu; . - A.» ..‘ a. A V ' N’s . I i‘.‘ v” _ 3;: a, .l? .'......... v». 9 1;. » .L‘ t: a . . -r‘ _A ‘h~u ¢Aau A A. l A v‘y-rv.“ ,‘ l 0‘ I. . ." h“, ‘l-l I.‘ 5.0 A b . V“ r. ' v \ 1 r. r >~ AM M” L ‘ . l" - l'i “' " 1. v I i ll‘ ..NJ‘ ,‘- A... t K. .' _‘ y. , " 7'7" ,-' i...» l w"' 1 “V“... ..‘ ‘ . Wyn _ ' v '7 v ‘ A 'I “'f‘ .. ,“F \H. ‘ t r '. , ’ .1,“ ‘ P y l‘ T“ ‘ H ‘ n~ ; _ . . £ ‘ 3pr i _n- w; . A“ -’AA. ‘l l ‘ ‘. - ‘ RV‘,‘ G, ‘r.. ‘4 ;¢_‘. A n; ”'1 t .. ( “are"wv » I ' ‘§‘ “ I r ' l I l '. . ‘i H." . ‘u w ‘V'v‘0. J an ‘. ‘71» '. “av" \( 'V -' v " ”.41" ._A J' . 
1 , ‘4“ 4'L I . a. Ipf,.' ' 1.;“l The increased use of personal computers (PCs) brought more personal interaction and the machines were more accessible to the individual user. Since the users were not Only trained experts, but also non-experts and even little childern, the interfaces had to be made simpler and easier to use. Still, the users had to spend time in learning how to use certain input devices. People had a lot of problems in coordinating mouse usage at the beginning. Moreover, in a typical computer, two devices are needed for input of data and the results are displayed on the third device: keyboard and mouse for input, monitor for display. This leads to a divided attention of the user, and a constant need to switch from one to another. Ubiquitous computing [88, 87] tries to make computers unnoticeable and part of the environment. However, the input is still done through writing on various pads. Computers are smaller, and the communication with host computers is unnoticeable. Still, there is no effort to observe the user using computer vision techniques. Apart from ubiquitous computing, new input interfaces have been developed. The use of voice recognition for input [85] is now commonplace. Gesture recognition is also used as an input interface [64, 29, 20]. Finally, head and eye movements are used as a replacement for the conventional mouse [48, 91, 75, 79, 2, 100, 73, 46, 81, 49, 6’ 3, 4, 1]. Many such systems were developed to be used primarily by handicapped users [40, 49, 6, 3, 4, 1]. The advantage of handless HCI for such users is obvious. kiost of those systems require special set-up and / or special hardware, which can be both cumbersome and expensive. The use of infrared cameras is typical. However, there are health concerns with their usage, such as whether infrared light beamed at 4 c'»;v’ '}' “dunkih “H" ‘ n I V . w . 4* ~r' r‘ p ' ‘ [ -1 AM‘ “A .A-A I l ‘ 7. n o o . v r ' r . ..... y .u .u . ' l N Y‘ tn 0 7 F L ‘r ‘ .. ~ 4 .1. PVV I. , «,.. ““‘l ‘.\ .- . x [ v'. ..H ‘~.;‘ - - .. . \ ,1 . 9—? _ . . P". 7 I A‘. the eyes damages eye tissue. The use of various helmets might be rather unpleasant. Mostly, the user’s movements are constrained to a very limited space. Recently, more emphasis has been placed on developing a handless HCI for a general population [75, 79, 2]. This work is directed toward developing a more user- friendly interface that would involve interaction with computers that is very different from the interaction we know today, and very similar to the scenario at the beginning of the Section. Extensive use of computer vision algorithms would lead to healthier, less cumbersome and cheaper systems that could have many practical uses. This kind of interface would offer more freedom to the user—there would be no need to con- stantly move one hand from keyboard to the mouse, thus all the attention is devoted to the keyboard. Additionally, we would exploit the user’s natural behaviour—the user looks where he or she wants to work and that should be naturally captured by an HCI system. All the above systems rely heavily on the use of various computer vision algo- r ithms for analysis of faces. Individual applications like face recognition [82, 83, 78], gESture recognition [26, 92], face-spotting in images [69, 59, 42, 96], gaze detection and HCI [58, 10, 8], are all used to create the new HCI. 1 ~ 2 .2 Cognitive Aspects In Cognitive science the field of visual perception has been studied extensively. 
Vi— sual perception is considered to have a key role in our cognition. By studying eye [Image is presented in colon] Figure 1.1: Purkinje pupil tracker, an illustration (Courtesy of John M. Henderson, Michigan State University Eyetracking Laboratory, http : i/eyelab.msu.edu/) movements, we can conclude which parts of a scene are viewed first, to which parts We devote more attention, what we notice and what we do not notice [38, 37]. To measure a subject’s eye movement, a “classic” Purkinje pupil tracker is used. This device is very accurate and allows excellent measurements to be taken. The P1" Oblem with it is that the measuring device is intrusive and very hard to use, and the user does not behave naturally. During measurements, the user’s head is fixed within a. Wire frame and the user needs a bite bar to reduce head movements (see Figure 1.1 for illustration). Calibration of the system is needed before each measurement starts, and Often also during the measurement session. A great advantage would be to have a completely non-intrusive eye tracker that would allow measurements to be taken while the user is naturally working in a 3D 6 £33,179. llu‘ ln'ml'ti' lush-v . l : n . .~ 33 .hl‘ \ml’.’ in.“ If A” Lu. J“ ‘ l ‘ ' . . 1" v ‘p' 3". Ali {R 141'”) “I‘M ‘. . l “r " 197w: 12.3 Computing ( ' v I' U‘ “y ‘,_ Inviuvy' v . i .|.. -n v M.“ ~.\ “1‘“!H‘“ ’l' ... I . ' n l\r‘ ‘ ’:'.. y . Y . «A ". l . . . _ “ “A“ H.‘ ‘ L ”O ‘ 'P t' I! V J ' P v «.p ; l> ' _1 . A r.,\.. :1. A“‘!“ II‘ in ‘ i _ q _ ‘ ‘P "4".“ vy l Iv . . l H 4 I .4 .. .U.|" (l‘. l u.~ n . >yo. ‘J‘N «x:, N :I-‘WPrQ,¢‘- : ' 5"“w|”h U a '; -4, ' ‘ .2“ n..l,: ,. \. . I at ' lzsvlp , a pp... 5" workspace. The head-eye input interface described in this dissertation would facili- tate this. More importantly, the interface could be “hidden” from the user and the user would behave naturally, without the burden of knowing that an experiment is underway. 1.2.3 Computing Challenges From the computing point of view, the task of detecting and tracking a human face has many challenges. The aim is not to analyze a general scene, but many aspects of that problem need to be covered. Many forms and variations in human faces need to be covered. Finally, all the algorithms need to work fast to keep pace with the user’s movements. Segmentation of a face: Distinguishing a face in an arbitrary scene is such an easy task for humans. For a computer vision algorithm, though, this is not trivial. Issues such as skin color, face orientation, and scale play a very important role in complicating the process of face segmentation. Algorithms for face recognition usually aISSume fixed scale and orientation, and they deal with the problem of recognizing the face from a grayscale image only [82, 62]. Algorithms for face spotting in an arbitrary image try to deal with the above problems, and to locate a face in a single image. Some algorithms use grayscale input images [69], while others work with color images [96, 59, 42]. Problems that persist are finding dark faces and covering all the POSSible face orientations. In all cases, the execution time of these algorithms is not 7 V 1 . , . 5;" to {viii-Hill" ‘ iii \ - , I ~ ‘ ii : [if] \\]\ ll] ‘1. ‘ “A" .Ui in. . . . - . q . , . v r-- .n _ "1""? Mail in" [ii ' u ' A ‘ ' ' .... n t 1,, , “Jr 1 ‘ . [glue .3?!” i‘ “ .... u. r in ‘ .1, , H 25»... i ‘u u' 5\ l. “:1. r l . _I I H ‘n' v t 4 o ‘ iizl¢l4 "g‘ MI, .. r - . i i . 0.*~ Iv ‘Qv ‘vLH r~- v ‘ .. ,, . .».u-.‘ l. Fiii.1.;i.~1'i. ‘\ |:.' 
‘r-v‘ 9~‘:-~y4‘ ‘: , ‘ '3‘ b ‘ ‘ '7 " ' ._:.t. .L. arm: ., .t .1 l “we 0 ‘- _ ‘ P n“ ’I l iy v - V “1’" mm “ALL. tildi v. . ' v I». ’ ‘ p, _ \ ~ . “blink im‘n ll ]5 is". ;r -...‘ l - [l 1 °‘.V‘v_v .. ‘ I... ~31“ ‘ruip'ld f‘\ 9 I f 0 I ‘A. A o -_ “WK .1 iv. . ]‘ ..' ‘1 ’i' . ““KW‘ olLi“ ‘I nrl‘ ‘ «‘4‘». g‘..“ ‘1 T :‘V‘ “U' ' r A ‘ x ‘ u ‘ 7. O . ‘“ d‘H‘u“ .\ l‘ a ‘__.r . . n. ... .1 f t. q . “A.“E EV“— ii ‘10 V. u' N- .‘ fi‘ M ‘ '- 3“".~ Iri’i- ,. - hi...‘ dwi'u .'_' ILL" . -. “4.3-‘1'”? pk; _] “‘7 Li 5]. [t h .‘ Uf.v"rr‘ 'i 41‘1"“ i. '7‘ | *‘Iv ‘ . , I 9. :2“ r“- ‘ ' Bikn i'rQ: 7'? ‘Q‘ A.‘ ‘ t v .H v. 7* KY 3... LrA/‘ »; .‘ ‘44.. .~ [‘7 , . a“: ‘U {”1 ' I~G‘["‘ . , ‘l(;.“ v. I - ‘ L‘l Np." «an; i. 5"] r.” '9 i “d.. 4' 9 aDI' ‘ '4“"‘ 41H (I - ' “We, t n‘,‘..,v,.‘. ‘ ’Hk_ even close to real-time (e.g., 30 processed frames per second), thus making them not suitable for analysis of live video. To overcome this problem for analysis of live video streams, the most common technique used is segmentation using a skin color model [59, 42, 96, 8]. A significant assumption here is that there is a face in the input image. Finding selected face features: Locating the eyes, nose, and mouth in a face is relatively straightforward to achieve once the face has been extracted. We can use templates for desired features (e.g., average eye, nose and mouth), perform brute—force template matching and obtain our feature locations. The major problem with this approach is that it is very time-consuming, it would assume fixed—scale input images, and fixed templates that might not cover all the variance in the appearances of the features. Additionally, a change in lighting conditions could result in poor results. By using a slightly different approach—not looking for each feature in isolation, but finding these features together since they have fixed natural relationships would help greatly. In this work, the geometry of the face and various heuristics are exploited to overcome the slow template-matching approach. Determining user’s gaze: When we see a face, we can easily determine the person’s gaze. Even infants can distinguish whether someone is looking them in the eyes or not. For computer Vision algorithms, many assumptions need to be fulfilled in order to Calculate the gaze. Pupil trackers based on infrared lighting use the Purkinje reflections to calculate the gaze (Figure 2.2). As was discussed in Section 1.2.2, they need extensive calibration. Another possibility is to calculate the gaze from face features’ positions. Algorithms like Pose-From-Three-Points (P3P) [58, 10] proposed 8 + wf‘jIITb‘. Till.‘ ("W V I _ l. .501]! [~‘ 'iV’FT‘iliJ (”in ‘1‘”? " U ‘ i . t [ ‘ H ~ 0 v r, . s mugs ,t, ...,. .i‘ A. A‘ u‘ . . . . . n \.... 7"371 l ‘9v1'1V I v ..' h ‘l' “Uh Au ~~.A.J\AAA A |l D . . . “713: Us] I“ ‘6 4! n ' .41.». y ' p .....;;,.5, “t" lJX;1i"“ Hi]: to t I" ‘ I ‘ 7‘ i ‘7 h V ., l‘ I l J [ ‘ ’ sl..,'f....,dl\\.|,? ”I“, . ‘ ‘ 7rr~~~ v- ~- -gmiu Adi.‘.\1".... ' H... L *1 fi‘ 0‘ l") . 7v .0 - »L. m hi, [if I"“‘ . ’- . ~. ‘ '1 at L;."“ -..,. "l~'1“y\‘.-[‘]'d ‘. 9“],‘.r‘ [.1 ‘~A.A _ 4‘“ ‘1.- l ”‘3.“ wail": TI. .. , A AA.J~ ‘."I ..‘I If ' h E a ‘\ 5‘” L. A‘” .‘t L .V i o ‘ ' ‘~ if' r. ~ . w. 1‘.“ f’\\ L.» a mathematical solution, however, we need to know the user’s dimensions and camera characteristics. This could lead to problems with the portability of the system, and the need to ask different users to enter their dimensions prior to the use of the system. 
A solution to this problem is to use a non-linear approach based on an Artificial Neural Network in conjunction with an approach used in the conventional mouse. Recognizing face erpressions: A wide variety of facial expressions is present in humans. We express our feelings through them and they are key attributes in human- to-human interaction. For a computer system to recognize all possible expressions would be impossible since it is hard to enumerate all possibilities and variations. Thus, in an artificial system, we need to concentrate on a few expressions and work only with them. The recognition can be based on neural networks [7], Markov models [59], or nearest-neighbours classification. In this work, an approach derived from our facial features finding method is used, as will be described in Chapter 7. Fast execution: An important issue in designing an input interface is ability to have fast updates. The algorithms described above need to analyze a video stream, at a rate of at least 15 frames per second. That leaves about 66 milliseconds on average for processing of one image. Necessary operations are face segmentation and facial feature finding. A segmentation algorithm is time consuming due to the need to Classify each pixel and to use a connected components algorithm for the grouping of pixels. Feature finding using template matching would be too expensive for the real- time needs. A solution is to use either smaller images, thus reducing the necessary Work, but also the accuracy of feature detection, or to use the most time consuming algorithm sparsely. 1.3 Preposee ‘ '\‘ . ' ' v v . "v .. -L ‘ » "w ‘- l- ”I .‘jl.i(‘n......- u! “I ' . . ' ..2, [1m .. ""wyr. ' ; - '.F...J..r.tz.i;'.1, l . ‘ ~ ‘ ’ V I .. » A "T 1' ‘ ' Ii “(L‘It‘ ‘11“ [h ~ x. P | . . . . ‘ ¢,."4F ‘0 v.“ . “‘ . bun-5‘" . ‘. . . . ' H ' 'P' n o ‘ v ‘ . ‘ u. n. .J’ .ulf 1“ .‘ U. Q 2 ' v A‘ ‘ ‘. '1 .. e we a"... I » iil'..'.,[ . N in l'l-('\ .J‘ 1'”: i‘lrl -‘ . ,A - I “A“ "' i. . 5m”. I“. ‘i- H \ ‘5 l 1., in... .~ ~ ‘;_f‘] 'v m; 1.3 Proposed Gaze Tracking System Organization In the remaining Chapters, the methods for face tracking, facial feature location, gaze determination, HCI building and evaluation will be presented. They are all parts of or based on the gaze tracking system we developed. Figure 1.2 depicts the block diagram of the system. The input is the color image stream from a video camera. Then, the face is located, and face coordinates are used to extract the face sub-image and further analyze it and locate facial features. Currently located features are eyes, eyebrows and nose. In addition, based on the eyes and nose location, we determine whether the mouth is opened or closed. Finally, we transform the facial features location and their movement into the gaze point coordinates on the screen. Gaze point, as we will use this term, is defined in Chapter 5. The gaze coordinates can be used in several ways: determining fixation points on the screen and their duration, measuring user attention to the display, controlling the cursor location and using the mouth state to send selection signals (clicks) to the underlying windows interface. 1.4 Organization of the Dissertation The remainder of this dissertation is organized as follows. Chapter 2 reviews background information and related work. The work on face segmentation and facial feature location is reviewed. The Kalman filter is introduced and its applications are briefly discussed. Several approaches to gaze direction de- tection are discussed. 
The Purkinje pupil tracker is introduced and its application 10 CO'C' 33:2 TRACKING SYS’ —.Q '31 Video camera color image stream —-----------—-—----—----—-—-—-—--------—--—q : GAZE TRACKING SYSTEM Face Iocahzafion face boundaries Facial features locaflon I I I I I I I l l I I I I I I | I I I coordinates of eyes, : eyebrows and nose Mouth state I I I I I I I I I I I I I I I I I I I l I I determination Gaze point determination mouth state: opened or closed gaze point: (x,y) coord. normalized to O...1 V Attention measurement Cursor control & object selection Fixation determination fixation locations time intervals of the screen cursor (x,y) coords. on the screen and attention to the screen and selection commands their duration and away of the screen (eg. like mouse clicks) Figure 1.2: Gaze tracking system block diagram 11 '~ 'rr dim-um .].-: "Vi 1‘ i " ' 1 - i vrr‘i ‘l “\"'I twill“; did; in“ -.. 1' I‘m 4 . .-A h- } l _ , .. 'i‘- f ‘nl‘ 5. L...,»i..l'ii. cm“ 1 9.: at: di~l"‘.\\l“l p- la m .1 ' ' , I,. ‘7 4 ~rI. . lif‘fllu‘ HI ') l‘f“l 5" H ‘ ‘1 \ I , . a | ".‘Nv' O - i: .I‘ .Jv..;..'d‘.4."§. 1" (IIA‘HH .,., [-n 'Y‘I‘ ,. n . , , ; - \r‘. ‘ .J‘. . hAaaroALlj\ i] d} li‘ i ' ' \ "W“VV'I f; ‘ '... . ¥;~T...Gu‘.ln [11"‘5 \ h g"! s . 'il l.;,. . f IJ mu [hull It. ‘I.'._‘.‘,,_ .,.._ i ‘..h ....' V H ¥'-' ' ‘ s r W»! «h .1 . 5 r .. '>_ ”i t .f ‘ x. .J‘ [“I :9. " ..'I n. . ..-1 , :‘ II ‘b \g-a l . "' . “VJ. :[p\_ll ,\5. ' . . . A ‘ '1. I“ 9-,. . A r‘ ‘ “at. t’ 4 l ' f " 1i;t\l"“.g.\ l ‘ A.1_ .4 r.; . L [N‘h- a.“ '.‘ 'Rhio. ' “A .I.A’J,,:\ ]v “" i.’ . .w s; " ' ' -. .1; us .. D v. . H 94"} 9 ] ‘ .. q .Q; r- . t ., x .H [Jr V), Ni“- .. ‘ L . .i' o i q‘ f‘- “. “fialg :'v.. V _ ‘J \Fr'f‘ 7' A A‘ ‘. l" .4333}? T l‘ .I J , U.“\l-' s , \ ."fTJ‘" h“§.('“}.b\‘ ' 4: ‘ ”192m ' V - [1“]: I 12‘ I l 't.‘ ' . ZJ‘ ' ‘G' , . m I“ . 5] v to gaze direction detection are discussed. Several commercial eye-mouse systems are reviewed and discussed. Finally, HCI-related issues in the creation of user interfaces are introduced, and findings of several studies that compare conventional and eye mouse are discussed. Chapter 3 describes a skin color model for face detection and a method for locating facial features. Its advantages and drawbacks are discussed. A number of successful face segmentations is presented, as well as a number of unsuccessful and problematic segmentation cases. A method for detecting a dark face and enhancing the colors so that the skin color model can be used is presented. Finally, some suggestions on how to overcome the drawbacks are made. Once the face is located, a method for locating eyes, eyebrows and nose on a face is presented. The method is based on the knowledge of the geometry of the face and uses several heuristics. The parameters for evaluation of eyes-nose matches are introduced. Finally, evaluations of the accuracy of the feature location are presented. Chapter 4 describes how the features are tracked in time and discusses the accuracy and timing results. The system state diagram is presented and it is discussed when certain parts of the overall algorithm are executed. The usage of a motion detection algorithm is presented and it is discussed how it can be used to overcome some DFOblems in the tracking. It is described how the Kalman filter is used to smooth out the tracking. Several results of successful smoothing using the filter are presented. Chapter 5 presents a method for detecting user’s gaze direction based on the COordinates of eyes and nose in the image. 
A neural network used to map user’s facial feature location to screen coordinates is described, as well as a method used for its 12 I ' ' ' . l 1: v v [11; _ 'Y."‘.'L illf null-\[iwhx "ain‘t-'3‘ A L. I ] ~. i9< 'f | 9! ‘~'Ir.’“§il..5 U ‘l..'"" . l r in gn. ‘v‘. '1‘ l '0‘ “if in.“ iii. A“, nlii ‘ n c I] ‘ - in »n ‘r' ] r , ., , ‘> ] . a‘ \ Qr‘rlcl Punt.“ ~ li‘ .' ‘, [m [a J P 7'] 74"; 1' O ‘t-l1,.'.f_',l"\t,“\l‘l , ‘ s ' r . "‘ " 0 o , vw 'l r. ._ . .A 4....f‘lfljf‘ [5 1]] [1‘] m. I a. r . .. - : I . . . . 4 ‘ “my-ff . \‘L5i ‘il“' -‘ ‘. . 'V‘Nv . "F ‘ “4 y" g “1‘ W' 7"“, . . - ‘N‘. ‘ it \ F]. "1!" ”Apr: "r ' W ‘ I i :J- A““‘a(:‘f‘. lI‘s b 55:97?! T F )2)- > fl 0'» r. . "A "e 1.,“ ib‘ ‘[ 4' P', " t" V’Q_. ”*- l’w . e ‘ “~ L5H} I i, r .u. ‘u‘ k v. . i "r I .. ,,. ' ‘ ‘1‘“? ("v ..I ‘ ' - .‘JH. _ . c g A ' u;' ‘1 PW"? . ..A A‘ff {117;‘;.,. h . .3. If] \ ' C training. The mapping of motion of features in the image to the motion of the screen cursor is described. The joint use of the above two mappings is discussed, as well as the results of smoothing of the output, which provides a framework for further development. The gaze tracking is applied to fixation determination and attention measurement. The results of preliminary studies are presented. Chapter 6 presents development of an HCI based on the head-eye input interface and gaze tracking. Ways to make a. handless selection are discussed along with some issues related to GUI design. Finally, the evaluation procedure for a head-eye input based interface is preposed. Chapter 7 discusses several applications of the system described in this dissertation and presents the results of an evaluation and usability study of our proposed head- eye input interface. Tasks such as moving the cursor along a defined path, button selection, dragging of icons, and the use of general applications (e.g., Netscape and mail) were tested. The results for two user sessions are compared and some concluding remarks on the GUI design and interface usability and learnability are made. Chapter 8 concludes the dissertation with a summary of the major contributions and future directions towards the improvement of the preposed interface. 13 Chapter 2 Backgroul‘ ' I ‘ n'hv" V . -r ”-7 " " " ‘ J. u.-i~ Lllu,- “A. \‘1*‘ “ . ' F P ,‘-;-.y« o v- ' ' [I] “r $.le \ ‘. _I‘a ’ I . I ~ ”LN-p “at“? or v' ‘ “in-Kw. D. .d‘. .04 P)-bi‘. “ A , 2-, r. ‘ h . , . {L1 «at. 1C" ' (1:.‘i .. _ ' ' ‘ . "M l I ' ‘ . ]‘ ‘ “ . y. '- . a . u... ‘R I; h (‘5 .w - ’ ‘. 5v . v . . TI Ff)“ ' ‘ ‘ . V‘l" ‘ V y r ‘ . Ltwfdiii “ ‘ U ‘u “v.4. ° ".4 :‘ A..- k”, .3! f’\ ‘n ' T. ‘ » ‘ COX (1" ‘I , .\.. ‘ i‘ \ h, '1.“ W‘ a] 4 I \J _u" (’4 l'. ‘ ~0q..= LEN-Jr ‘4“ :D‘;9;‘ ‘4 l i ’N ' V i .L ' I. drd. ‘- la 1;".- "1‘." :J'tv- . * u.) '{V‘ ‘ , U. Jeff Chapter 2 Background and Related Work In this Chapter, various background materials are discussed. The first three Sections review the topics of face localization in images, locating facial features, and discrimi- nation of facial expressions. In Section 2.4, gaze direction determination is discussed. Section 2.5 introduces the Kalman filter and reviews the tepic of tracking of objects, in particular, faces and facial features. Human-Computer Interaction (HCI) issues relevant to this work are presented in Section 2.6. Finally, in Section 2.7, visual perception research is discussed. 2.1 Face Localization Finding faces in an arbitrary image has been studied extensively. 
The problem is very difficult due to the complexity of the human face, within-class variations in faces, and possible rotations and size of the face. To cover all races, skin colors and shapes in a single approach is very hard. In this Section we review three paths that can be 14 p‘: ..| .n o n! ,1. i l du>uufdg. I g I i ' " ’ a ‘ w 3;an 3.10;" at I. dim l... 2.1.1 C lusterinv. ¢~r~ cw . . ‘. ~\n ,q,uo,‘r..f. . .. «u. ,._ . l l .,... .,..[].. . ~ .“u. ‘L itch“ . ”I “'1. . 1 Y ‘ "vw .. ‘ . ' -1 ‘ v , and}? basin, 5“ ,"‘ ‘ . H \l‘."' ~11 1 [ll . ‘ —- .‘...'T Lin" 1 “. I I I , "y‘r? -. ‘ .2 . I' «fit .f’if‘“ [Lt it 1“ o. v ‘ Q "raj”; I :'\I. ‘1" .. .- “fix“? K.‘ 1' .II 1?. -.._‘ . . -'.“\ ,r. , ] ‘~ “ *d‘h r!;~'.r 1“ ‘i-wlg _ ~,_ m, . ' ..,nn,a;r P1..,_' ' ‘ ..\ ' . ’r ,3 ‘ .,, . ""H, art 1'. ‘1' -- ' I A.l‘A ] - ‘1 ‘df" ‘ l- ];N““' i k «WV '1’ (3,,” ‘n 4 I. ~.cQ‘>‘rV T)! I.' . \ . I .H 'i-(rr. ~.. H], _ “u3r€: :1»:r.7’ 1 “1 ,yv fr In ., ] \‘e. s. h- |‘-_ ' V . “[7] G e] , ‘ ‘As lei“ (3“? .. I‘m-«f a] \* ; ‘ . “in 3’ ‘a a ] r v- . ‘ . i“ ‘, . 5... s.“ on! .‘k M . “" lf‘TY-r, . .‘Akjtfl .A\ u g . x. N ~Id .] ,.. - ‘ ut le. ‘-’11"c t “~~ I) O .0 GEHv , . A“, QPF’I -]4 i?‘ a LN: / Hi I . ‘1... a’Ja._ .‘ L1,] ;" ..,r] H‘. ' ', r14..‘( 9,} :' L» i \_.P - «~sz ‘4 I I J V lka‘ .P ' followed: clustering, PCA-, and rules-based approaches, artificial neural network- based approach and skin color-based approach. 2.1.1 Clustering, PCA and Rule Based Approaches In this Section we describe several face location systems based on “traditional” pattern recognition and AI techniques like clustering, principal components analysis (PCA), and rule-based systems. Sung and Peggio [77] designed a face location system based on clustering. In the first level, they determine whether a face exists in each subwindow by classifying the subwindows into one of six face or six non-face clusters. They use two distance metrics to each cluster: “partial” distance between test pattern and the cluster’s 75 most significant eigenvectors, and Euclidean distance between the test pattern and its projection in the 75-dimensional subspace. In the second level, a neural network is used to classify points using the two distances to each of the clusters. They used 10 times more negative than positive face examples, collected in a “bootstrap” manner, thus reducing the explicit number of negative examples. Sakai et al. [72] proposed a system based on line extraction and matching lines to predefined templates. They had templates for the face, eyes, nose and mouth, and allowed the scaling of templates to be able to locate faces of unknown sizes. Govindaraju et al. [33] proposed a method for face location based on face boundary detection and semantic networks. From an edge-image they locate curves using the generalized Hough transform. Curves that are likely to be face sides, hair line, or 15 ;,'.,. Fm, an) ElliIIPHi ‘1‘ ,,‘..I ml. -’,3 ‘ "1‘ ,- - .0 ‘ h" ennui d >uu....: .. 'l . . , .. , {£12 f'ftili “flair fun .I'. .§“T* 7 " t“. .fli, I “1,1,. . ".InAh J... . "', r rTTI'rf.I.'a.lll"> i..' u: ' 0 \.'If '21:} HHj' .«v I" h us...m .4 Ant ‘1‘ H .3 h I]? r““"‘ f. \ Rt ‘. .' ' . ‘. H, 4. .a... Mimi. {,1 ’Fi'Mu‘MVr .;. z .. ...-...ul..l_)ul \\lui"".l,\ , 'wdii‘flnftlr 7‘ — . ' 4‘. ,: ) .~ 31] Hf [fl ' ' I‘}\I ' ‘ ”‘4 ”is . i I ~u I ~. . Lrif‘f‘ ti] 7“.” f?“ X".\- "K H - \ ‘ ‘ , ‘1'“; . ‘ " "‘l'if‘r ‘ ,I ‘-.‘.." ",3 “r“ “J ' h. .| ‘. .zid.‘ \lll'YP _ ' ‘ I "A! .; ”‘- 4 mil I]; . ‘ a y‘i]§ TV; “ , T "' ‘.I. - O_ .t. "f”. ‘.' A “:l \ ‘e. « ~v-v. 
Inf /]-‘,~V;T; Hui (I Y , A. a. (A [1 U1... r ‘. rJ du'i p: 1‘ . “carp; 1. .. . I" - J , ‘9'. . I: (“K‘ ‘7“ §.] ? 1 4.1:] T,“ . i ‘ ' 'I.;|" . d :'l;\\ “‘1 . N-Cfd' r v‘1(o\,.. m I 1 L‘ \"|‘ i‘ d. chin line are grouped using weighted voting. In a follow-up work, Govindaraju et al. [34] applied a similar method. They found the edge—image and used thinning to obtain candidate curvature lines. The lines were grouped in a sparse graph, where the weight of the connections between the nodes represented the degree of similarity between features. The face was found as the biggest clique in the graph. Yang and Huang [93] manually coded rules and feature detectors for face detection. In the first phase, simple rules are applied to the high-resolution windows in order to find candidate regions. In the second phase, more complex rules are used in the lower-resolution windows. Finally, in the third phase, edge-based features are used to classify a full-resolution window as face or non-face. 2.1.2 Artificial Neural Network Based Approach The introduction of Rosenblatt’s perceptron [68] in late 1950’s resulted in a lot of research that tried to mimic human visual perception through Artificial Neural Net- works (ANN). However, Minsky and Papert’s proof [52], in the late 1960’s, that per- ceptrons could solve only limited problems, resulted in a pause in the use of ANNs. In the mid 1980’s, the backprOpagation training algorithm [71] revived the use of ANNS in Pattern recognition and artificial intelligence (AI) problems. Burel and Carel [15] designed a 2-layer neural network that classified input as face or background. The input to the network was a 15 x 20 window, and the output was a ClaSSification result. Subwindows of an input image were presented at various scales 16 t'.L' ., 'fi“ » w‘p Ill uo—o 11:30.7de KM" T f 1 ~- u azzzr 1m ' -‘ I . ”I {F Yl‘ I .A \t t. h v . 1- ‘ _ . mahlf; a'i d I n 7 \ . In t f , ‘ ‘ [0..1.“ ' t ‘1; v . ,' “Ill H‘ v < .[. aal.‘ 1“! (I l "" r , l. p . .J ".- ,2? 7‘ in .‘ t u L. ‘ .. a ‘ rir r ‘ '5» (3. { ‘. ‘ “3 J1 l.‘ luxl' .. 3.13;, . .tn '7- 444 _‘ d‘ ‘1’“! : ‘ b . "1"”; "1"». . ... 51‘4 . ' ‘ «Mum. V. v. Q. ‘>~ -\‘_‘v. ““34 . r.“ Giiglhfa? '1'. h. ‘ ‘ u, A. lvu. A“ x 'p, P ‘s' fl”... '9 Ik ra. . r d”‘\‘, I» . ,‘ r‘ 75.7. _ ‘ v 'n. ‘3“ “in, . «1‘..}“ Li a r it .l‘ ‘ ,._ ‘ in: .L; r ~ l4 , l‘“ “If“:‘T' v 9‘ :‘ ‘o', , .3 . 4.... a” :3 ‘n v'. t {UF’ ._ ‘ lir,‘ ‘ A I .._ ‘ _. V‘ If? 9.. r ‘ “l..‘ri.’ ‘ i to recognize faces of different sizes. Training and testing data were obtained from the video recording of a conference room. Vaillant et al. [84] designed a system that used two layers of networks. The first layer produces a response for face images, while the second layer reacts to centered face images only, thus locating the face more precisely. One possible problem with the system is that it was trained mainly with positive face examples, and only a small number of non-face images was used. Important results were obtained by Rowley et al. [69] at CMU. They achieved 80—90% accuracy in face location using ANNS. Their network was trained to classify a 20 X 20 window as face or non-face. Each 20 x 20 subwindow is presented to the network, at the original scale and at reduced scales. A key point in the training of the network is to add non-face examples as the training progresses [77]. This reduces the size of the training data set, which can grow very quickly if negative examples are added at will. The network is trained to accept a certain degree of head rotation. 
The execution time of the program is not suitable for real-time applications since it is about 24 seconds per 320 x 240 input image. In their follow-up work, Rowley et al. [70] designed a rotation-invariant face detection system. The first stage of their system is a router network that detects whether the input window represents a rotated face. The router network assumes that the candidate window is a face, and whether the inPUt window is a face or not is resolved at a later stage of processing. If the face is rotated, the network detects the rotation angle, and the window is de-rotated so that the face is in the upright position, and the network from [69] is used to classify it as face 0T non-face. The accuracy of the new system is 79.6%, and the processing time 17 [7' at in m 17H ,_ ,. c—AU L‘ . 18.5: r. 12h 111;]: {Hz li‘. ., f "N. {'n‘) of (1.. .1 l . :.1 ”,i , .'.‘|1‘:.Indn Ill 'Jithl u I]! ihl. 2.1.3 Skin Colt - .u‘ . . ‘ >,h fr rvy [I1], ".1 ‘v .. t “42““ t]. _ k.1 .‘i ‘\ A , ,' ’l ‘- ‘. .m 3‘, .,, ,, ..... ".4. . A. 1.” (A... ' D .. . _, l_ ‘ T 7r 4f 5. '1‘ ' -~. ma. rutait'dfi. ll. i’ I y 5.. .5 ~ «.«h. In aw --<‘W v Q‘ig4\‘r.' -l.' ' u.-» ' 4 4 hi" Yuri my ' I “tonal [‘ I: l" 1.. ,.. (1 5]“! t I i ..- . I." v a). a“? I' 1. (I? ”I, . 'I,‘ - ‘ I «tram... ll" '44. A. ,"r. h J? I: ~] ‘ W-..,. 1,.. , "‘ A I \ : ;; . _.‘ No. ' \l ‘ .. .~\ - 7 I "‘ * 1H 'Jll“ f " .14" '~."\ ‘7" ' 7‘, .a “v -M 11, iV-rv ‘ . w ... f - 4]).“_ :6 ., .. 3“ .- ‘ .‘ r _ -\ 3 K? «1:64 ll]. ‘ . i “P ‘ :‘l f I T‘— u,’ 1?". -.i.' r . . 5 M : ,.] v‘z~ \rdl‘E’v .7 a“; \‘ ‘ e- 1. . 3 .‘ .‘ Ljr I" , .Q‘ r IA for an 160 x 120 input image is about 6 seconds, measured on an SGI O2 workstation with an 174 MHz RIOOOO processor. Osuna et al. [60] trained a support vector machine (SVM) in a similar fashion as Rowley, and achieved comparable results. Their system provides new way of training polynomial, neural network or Radial Basis classifiers in a much faster manner. 2.1.3 Skin Color Based Approach The main problem with approaches described in Sections 2.1.1 and 2.1.2 is the execu— tion time. All the algorithms need to inspect many subimages in order to determine the face locations. In addition, these algorithms all work with grayscale images as the input. In the case of color images of faces, one important piece of information is added—the skin color. In the late 1990’s, tracking of faces in video streams is done mainly using a skin color model. In 1996, Yang and Waibel [96] presented a system for face tracking based on a skin color model. Human skin can be clustered into easily discriminable clusters, that can be modeled by one or more Gaussian distributions. A wide variety of skin colors easily fits in one cluster. The method for tracking is based on the classification of pixelS, grouping them into connected components or regions, and determining which 1regions are likely to be face. These results are further used by Stiefelhagen et al. [75] for a gaze tracking system, Oliver et al. [59] for tracking and facial expressions, Jebara 8t al. [42] for tracking and modeling, and Bala et al. [9] for face and eye tracking, to 18 reziu‘ri [115] a M." I" '3 'g" ‘. Ll” Ill lad...» 1., ‘ .[rn ‘u ~ln ‘ ‘ ,I ll \[ml TLD U « I " ‘ ..' I“ cl 9‘ 1:».2wilxbl3- 3 1' red. The no . l v'. ."r '1‘ v '. ‘. .i['l.f."I:L;(11rlr.'. .' V t l L". Hewitt \ , . .‘o --~;yyn1 (.L' V? ., ’ ~~ mu . u tfitd‘. .a....l1.. ,.,_l, . .' ‘ ' ‘ arr: with rich; al.: . I . . ._ . 1 . y " ”r '4 ‘ F ‘ \1 . y o , '_ ... _l¢AL‘.d‘ \IJ 2‘ 9 1w Al: O. ""flr-I ‘. 
'~ “ ”T ‘|.‘ “mac my a: ' ‘ A. O :31; m‘ .. .. * my ht} Id/llfi'w ' A-|‘ .“ ”.5. ~"ll,",‘["v:_ . “‘ J'" 3' "l ‘Q 15'}? , > I‘lldr.'\ 29.,» h" 5"“. I ‘ " ‘.>- , a. , .quf1\hl,n I G|“ mention just a few. In all these systems, real-time rates are achieved using low-end computers, in many cases PCs. Most skin color models are based on either RGB or HSI chrominance models. In the case of RGB, the plot of normalized red and green components (Rnorm = —R+g+B vs. Guam = 72%) is used, and in the HSI model, hue and saturation components are used. The model can be adapted to a particular skin, and Yang and Waibel [94] discussed this issue. A potential problem with a skin color model is that dark skin is not well covered by the model. This was noted in Terrillon and Akamatsu [80], who did a study of different chrominance models applied to the skin color clustering problem. The problem with dark skin is that low values of R, G and B lead to large fluctuations in their relative values, which results in significant noise. Another problem with all skin color models is that background pixels might be of color similar to the face color. For example, the walls could be beige, which is similar to Caucasian skin, or the subject could be wearng clothes that are of skin-like color. In such cases, a face detection algorithm may not be able to discriminate easily the face and the adjacent regions, and more processing may be needed to analyze the regions. 2.2 Facial Feature Finding Locating facial features in the face images has been studied extensively. A wide vari- ety 0f methods have been applied, and all of them have advantages and disadvantages 19 ‘ ‘ ~ or. _', ‘ \ cm {be m u'. m , . L"; l;[..‘ 'Q'VI . l. l ‘HM l ...,... a 1‘ r .1.;..‘il..l..L. | -uurTfl, y‘vlv'.l...l L1; Cm; Ll‘l'lul .II\ u. .u [“2" ‘v- .1» . ' .-“|n.>[)t.‘ nil t .,‘ Lama .y' , . 51’ "3 ‘f 1 O-wv L1,][a'hiont I’lau‘uu". m :3}. i y .. h [v "[“*""d (1.. ‘. U '. .. ' 1 W V; l . , ., I! .| A ,so. .‘ [Ekf\.“ ;‘ f.‘ . r - l.| 1" ‘ . .1 .‘.f_]‘ m“V‘* 'v A.. o T 9 ‘ “‘5‘ CW TIMI; .~:. . J . l "W“ -‘ _» . >r,J _ 4.”td‘.li““‘Vv " . u...‘,“ g ‘9 .29. .g .L. ‘ 7“ ‘3. a ,4 . In; (1.11 ‘ L‘. ~. v 4.5 > ‘u;«' f’ A 1“} n'w .‘ C '1“ " ;., ”‘9 1‘ , H Z’-“'Nr. : a y. I): (mainly the execution times were rather high for some real-time applications). The algorithms like template matching, applying an eigen-feature approach, or match- ing using geometrical constraints, could be applied on typical still images. More application-specific approaches are based on detecting infrared reflections of the eyes, or detecting blinking in video stream, or in modeling mouth color in video stream data. Yuille et al. [99] find eyes and nose in a face image using deformable templates. They defined an eye template as a combination of a dark circle for iris, two parabolic sections of the eyelids for the boundary, and white regions of eyes. The preprocessing stage of their system includes morphological filters to find peaks and valleys in the image. They then deform the templates to minimize the energy function. Pentland et al. [62] proposed a method for face recognition based on finding facial features and comparing them to the database. They defined eigeneyes, eigennose and eigenmouth and were able to successfully locate eyes, nose and mouth using template matching with eigen-features. The locations of eyes were easily found, while nose and mouth locations had higher variation. 
Overall, the erroneous matches were defined as ones more than 5 pixels away from the true location, and the recognition rate was 94% with 6% false alarms rate. Talmi and Liu [79] used a PCA-based approach to find eyes. They created eigeneyes, and did subwindow search for the eyes. Additionally, they used stereo matching to find the precise eye position in 3D space, so that they could focus a Camera to zoom in on the eye. Kwon and da Vitoria Lobo [50] use snakelets to locate eyes, nose and mouth. 20 . v . \{w an 11 ‘ . 01‘ I? a [pH 0! i‘.. gt "'3 ffl'flgt‘fl 5' [31.) an ere-refit- l. , . 12w 71:? al. ll' :6. I my ~13 7" u RAH}: y. 'hr’ I Vv A .J. ' c ‘4‘ . ... I all‘ . 'A u 7" ._;-« Ii. .1 \ ‘an “ “ . | .‘ " - " V~§ wuihkfi (1 ‘I‘i‘l a ['li, . . l'x' V ‘ , "lflf‘l ”I :2] ;-.n . ‘ ’V , 4..“ Lack 1'? 7. “* LA. Y, I ] v. P-n. V‘ . ‘1"-U.(L 'T‘v h, ~-c~l.:1|‘ ~, .,‘A F \ . ‘ I‘VE." , , J“; a f‘ Al l 1 7;»- . V“! -.?‘v‘ “an: Xi.» ‘ a. 31:1 .filql .,V c ‘o ,- , :4 ,‘P ‘ [Au-“'1 8' n .‘l “ A WI “‘6. [3.1 ‘ .‘,.‘[_J. g .. ‘ n I \" - )3, :‘I § 1 ‘7' t 'v ,‘ . l H“: r.‘, '1 ’L. 8.; K \ 5‘ \ . ‘4. '11‘4. , \ ‘E O? f x‘» I» Stiefelhagen et al. [75] use geometrical constraints to locate eyes, nostrils and mouth corners. Starting from a region that is likely to be a face, they separate it. into an eye-region and a mouth—region. Assuming a near-frontal view of the face, they use an iterative thresholding in the eye region to find eye pupils; horizontal integral projection in the mouth region to find mouth location; and finally, iterative thresholding for nostrils. Bala et al. [9] find the eyes in a video stream based on blinking. Humans in- voluntarily blink to moisten the eyes, and the authors analyze luminance differences of successive images, which indicate motion. The eyes are located by tracking a luminance-adapted block matching technique. Oliver et al. [59] model mouth color similarly to the skin color modeling, and find and track the mouth based on a classification into mouth color and connectivity. Jebara et al. [42] find eyes in a face image by computing dark symmetry transfor- mation, computed from a phase and edge map by wave propagation. Mouth detection is done from a dark axial symmetry map, where the longest limb is selected as mouth. A coarse estimate of nose’s vertical location is done by searching for the strongest vertical gradient in an intensity image bounded by the eyes and mouth. Ballard and Stockman [10] used two methods to find eyes and nose. The first method uses deformable templates for the eye, similarly to ones described in [99]. To Speed Up the computation, simple ellipsoidal templates are used. The second method finds pupils and nose from light reflection. In their experiment, light is projected toWards the user’s face, and pupils can be easily found by finding bright reflections 21 I 11' . V ‘ .«vyt '[lnNinjltli‘L‘ 23.2: ‘1‘ ‘ v ‘ ' ,[, I. 1‘"; “8.531 Inf in" ‘a‘ J Llwi'il't‘2ll fl '1' I J A u 2.3 Facial ] - . _ .‘ ‘ £.>‘M'"’[V ‘, C v I I‘ ~~~ .uuulu. .61 ."1 \v 1 “'-W' my. I \ a, , ,7 \M» A I1 1 x” A soA "é?" '4' oil] . h" iv: [‘1 ~‘~¢n1 " IA‘I' . ‘. ' t « ' ....»-. .rrrzi. 11.1 _ "a ' I” l.‘ 1 ~~m...'3h Ir: 1' A ug‘ ' u' H? . 3‘ NC him- ,.-' J“ “;A . ls b ‘ h I.» ‘jc‘\’4r‘ ()7. l -4 - AA k,‘ t . 'p ”-51 8 . A .1 .;r ,. '. ' ‘ L‘ “Ii V f v., I. _ “.v . ”'“d (1"? \ AL ‘ (1" using thresholding. A problem with their system was that the lighting conditions were harsh for the user. Hutchinson et al. [40] and Morimoto et al. 
[54] used an infrared light source to illuminate the eye. Using a simple thresholding, in images obtained from an IR camera they can easily find a pupil and light reflection off the cornea and retina. 2.3 Facial Expression Discrimination Determining facial expressions could be done either from a still image, or from video stream. A number of different algorithms have been proposed for both cases. In the case of still images, algorithms like shape analysis, PCA-based classification of salient regions, and neural networks-based classification are used. In the case of video stream analysis, motion vectors are computed, and expressions are determined based on them. In fact, in the later case, typically the change in facial expression is determined (e.g., change from neutral to smile), while in the former case, the static image is analyzed and classified into an expression category. Yacoob and Davis [92] use optical flow computation to identify the direction of rigid and non-rigid motion of the face muscles when changing expressions. Their mOdel is based on linguistic and psychological considerations. They distinguish six expressions: disgust, sadness, happiness, fear, anger and surprise. The success rate raI‘lged from 80% for sadness to 94% for surprise. Kimura and Yachida [47] used the PCA analysis to classify face expression and its degree. Potential nets were calculated from the motion vectors of the face and four 22 1 “W“ ‘ KN “Mr W 'l,.mt]ssl- ‘ p :r: dfiffr u! i . , . I 2} cm: ELM c v . 4-~-i-.. , . . lJJ‘."Jm(‘Ai ‘K 1.3' \\ I‘.‘ .J i... . I ' .Y .. . ,vy v .2}. “41. ul: .i..:.. 2 1.. .41 . \ ~- 4h... r . i ' .r‘v. ”'7'?!) ‘ ' . . , T ‘kfllLertnf-f‘ It. . un’:" ‘ , u ”3.12“” d '1 w - I. “ . ‘0‘ * . "our‘l l ,, L]. a ‘ 4‘. ~. 44C“ ‘V‘G‘T-nl 1;, . {1‘1“ "rd ‘ ',“ ‘ ANA ‘ ‘ ‘Al. .lr’v-~.f ‘ ‘\‘.V\’ .w 1 ‘>._\. . ‘ ‘ [-. t \a u. d. (2.” .u..A JD “‘Q\ "(1" Y‘.‘. ‘ V ‘ 4 I “- f 1'» G. "r ‘u r l. ‘ ’ u .4 b“(- I‘ II .1 YV‘P' ' 7. cl“. “GLHZ r. :‘o. 5 v... . . ‘47: I’;~,, ‘ ‘ “HLQ‘J PL .1 ‘ ‘F‘bl :5 ‘v ' J. W“‘[ ‘ Ix .. " in (If, ”g ”l. ‘v. ‘ AJI' ‘ . e. T t .it‘ “‘1‘ ’4 .5- .”fi'k, , ., I .‘_ ‘ M. . ’Qgi‘Jr r expressions were determined: neutral, anger, happiness and surprise. They showed that the degree of an expression can be estimated as the person changes from neutral to the other three expressions. An evaluation of their system was done on a single individual who was used for training also. The performance of the system was not as good when an unknown test person used it. Oliver et al. [59] used mouth shape to discriminate four expressions: neutral, open mouth, smile, and open mouth smile. Their primary feature is the mouth shape that is characterized by the XY eigenvalues of the mouth region and the XY position of four extrema points. They used these features in a Hidden Markov Model (Hh‘lM) for final classification, and achieved 96% accuracy. Bakié and Miller [7] used an artificial neural network to discriminate the same expressions as Oliver. The overall accuracy achieved was 84%. The open mouth expression was easier for discrimination than neutral, smile and open smile, and the network had the most confusion between the latter two expressions. Colmenarez et al. [21] proposed a classification based on several features. Six expressions were considered: neutral, sadness, happiness, surprise, disgust and anger. They define regions for facial features and use the position of extreme points (e.g., eye or mouth corners) within the regions for classification, as well as the images of regions. 
The features they use are the end points of the eyebrows, eye and mouth Corners and the tip of the nose. For each user they design a separate facial expression Classification based on the PCA, and using their face recognition algorithm, the user’s model is retrieved and used. Using the feature points only, the error rates are 19—46%, and using both the feature points and feature images, the error rates are 6—20%. 23 .’ “AW—’ ‘41 W”... ' a. '1' . ‘4'. f 1".0 . "w ‘ A ‘, v [‘ylixv‘tl‘ir‘ v ?~ 5 . T.“ D. . .L“ .- ‘rI “Ad'c. (. ‘J 1" «1,. _, ‘N N v L ‘d I ' u ‘3‘}; . 1‘ J [15" W Image Plane Coordinate systems and perspective transformation and three feature points on face. Figure 2.1: The illustration of the system described in Ohmura et al. [58] 2.4 Gaze Direction Detection In this Section, several methods for gaze estimation are presented: based on mathe- matical equations, based on the detection of pupil reflections, based on limbus track- ing, based on teaching an artificial neural network to map eye image to gaze direction, and based on electro-oculography. 2.4.1 Mathematical Solution In 1988 Ohmura et al. [58] presented an algorithm for calculation of gaze direction from the image coordinates of three points, distances of the three points on the face, and camera focal length. The gaze was defined as perpendicular to the plane that Passes through the three face points. They used three cosmetic marks near eyes 24 ' .. I]. i If“ 'J‘fy), ‘\ till -I at - IIv to l“... o 0‘ ‘ h ‘5‘ [AI 1 .A] .. i i I'll iv ‘ i a him.“ A uu'A u ‘ u . , 9' s ,l' v v ‘ - W" . p] .u.‘ l1 ,A- A u i #5 "~ 1 y ‘ .fq— t ' V » ( If” A \ nl , . l ' v Q a lio" o “Luna v]; .H 'r:. ‘L ' l I '-.L F i" - ‘ .G., ...; " D V . ' ‘1 V H ‘ ,. Y ‘ ”-4; ‘ . b. i" ' 11 V v , . , ‘L' d[]u [ .‘V'v ‘ I‘ I ‘tl ‘11 'l‘ I" \N‘. 5,.‘4 , u“ “in .1 J ,L‘d .j‘ '- I 4', .; 1.. N 41" F. ' ‘_4\3’ “4',; and nose, which were tracked in real-time using specialized hardware. Figure 2.4.1 illustrates their system. Ballard and Stockman ’[10] used the algorithm from [58] to calculate user’s gaze. They applied their results to the HCI task of menu selection. Gee and Cipolla [31] presented a solution based on the knowledge of relative distances of facial features, and assumption that the imaging process is weak per- spective [67]. Using simple calculations based on the geometry of the face and imag- ing process, and knowing the distances between eyes, between eye line and mouth, nose and mouth, and nose depth, they can estimate the face normal. Two methods were proposed—one based on the 3D geometry, and another based on the skew— symmetry [55] of the facial plane. They calculated the accuracy of their model using a generated face model, which was rotated in all possible directions. They showed that the calculated normal is within 6" of the true facial normal. Stiefelhagen et al. [75] posed the gaze determination problem as the pose esti- mation problem, and used iterative POSIT algorithm developed by DeMenthon and Davis [24]. In addition to head-pose estimation, they estimate eye-gaze using an Artificial Neural Network in the same manner as described in [11]. 2.4.2 Pupil Reflections Hlltchinson et al. [40] designed an eye-controlled HCI for disabled persons. They illuminated the user’s eye with infrared light, and the gaze was inferred from the rElative position of the bright-eye (the reflection off the retina) and the glint (the 25 —’ T‘ lrfiD' '1‘. T NU ---i ; K“ r. i“... i. ’. x‘ :.V .‘qu , ’ ‘f'l-‘u II . 
__.‘ Cornea —————————-————————-—_—___—_.—_—._— Purkinje images: lst 2nd 3rd 4th Figure 2.2: The four Purkinje images are formed as reflections of incoming light rays on the boundaries of the lens and cornea reflection from the cornea, the first Purkinje image). Figure 2.2 illustrates the four Purkinje images. The system needs to be calibrated before each session. For the calibration to be valid, the user’s head should be still throughout the session. In [48] they extended the system to allow some head movements. Talmi and Liu [79] obtained a. zoomed eye image by focusing a camera to the cur- rent eye position in 3D space. They found the pupil center and the bright reflection from a light source, and used their relative position to determine user’s gaze. The sys- tem needs to be calibrated before each session, and is sensitive to head motion. Using Stereo eye tracking, they were able to compensate for some head motion. However, the time lag needed to re-calibrate and re-focus the eye camera upon head motion Was one second, which may be problematic in a real-time application. 26 y- .4 ~, ’p. p». 1‘ ”\f ‘ . u ‘l . h. J I u n 3 mg ‘n_ ‘u _r ‘A \ 2.4.3 Limbus Tracking The limbus tracking method tracks the boundary between the dark iris and white sclera (the white of the eye). By measuring the proportion of sclera left and right of the iris, we could determine horizontal eye motion and transform that into gaze. This method, however, is not suitable for vertical motion detection. Similar to the systems described in Section 2.4.2, this system would require calibration before each session. Applied Science Laboratories [1] has developed a commercial eye-tracker based on this method, that requires the user to wear glasses with a tracking equipment. 2.4.4 Artificial Neural Network Estimation Baluja and Pomerleau [11] trained an Artificial Neural Network to estimate gaze direction. The input to the ANN was an image of an eye, and the output was a grid of 50x50 units organized as X and Y coordinates of the gaze. The network was trained to have Gaussian output representation, similarly to what was used in the ALVIN N System [63]. For each user a separate ANN has been trained. Achieved accuracy was 1.50. However, it is not clear whether significant head motion was allowed, and what the testing procedure was. 2.4.5 Electro-Oculography Kaufman et al. [44] presented a gaze tracking system based on electro—oculography. They used neurophysiological methods for eye movement measurement, which place Electrodes close to the eyes, and measure the potentials. These signals could then be 27 r O y\ 1 ' ["1 t on. fln.‘4 .2. . "' Iv- n In ...‘..;3. ii. I “’1‘ 34.3.3. . V- ‘ 'r ' ‘4' l d - ' . . ‘9‘.a..\..‘ -r“ .‘u -~.' a I I <4 . ‘-. {V \ ]‘h ‘Iqu-‘ ~. ...w. ‘ \ a. r “I" ‘1 L, [RV ‘.}"t I“. “A. l‘h‘ ‘ '."Al;i used as an indication of user’s gaze. Their system needs to be calibrated before each session, and head movement is not tolerated by the system. 2.5 Kalman Filter The problem of parameter estimation can easily be solved if we know all the past data points. In many practical applications, saving all the data points is not feasible. In 1960, Kalman [43] pr0posed a recursive solution to the discrete—data linear filtering problem. The Kalman filter is a set of mathematical equations that provide an efficient computational (recursive) solution of the least-squares method. An introduction to the Kalman filter is available in [89]. In this Section, just a brief description of the Kalman filter equations is presented. 
The goal of the process is to estimate the state :1: 6 ER" of a discrete-time controlled process. The variable a: to be estimated is described by the linear stochastic difference equation: xk+1 = Akxk + Bku,c + wk. The measurement 2 E W" is used to estimate the value of x: 2,, = Hkx]c + vk. The matrix A relates the state x at time step It to the state at step k + 1, Matrix B relates the control input 11 E 32’ to the state x, matrix H relates the state x to the measurement 2. Random variables w and v represent the process and measurement HOise, respectively. They are assumed to be independent of each other, white and With normal probability distribution. 28 11“: ’7‘ 'i'lV‘ "4 'I ..iu.. ... '. ' n - Lord, ,1 .4.— C The estimation algorithm has two steps: 0 Measurement Update, or “Correct” step, updates the values of Kalman gain matrix K, estimate x with measurement 2, and error covariance matrix P. 0 Time Update, or “Predict” step, projects ahead the state x and error covariance matrix P. The steps are combined as follows: 1. Initial estimation of x and P; 2. Take measurement 2 from the world, and perform Measurement Update step; 3. Perform Time Update step and use the state estimate x as needed; 4. Go to step 2. The equations for the update steps are as follows: Time update: X,:_H = Aka + Bkuk 2+1 = AkPkAf + Qk Measurement update: 2k : i; + K,,(z,c — Hkxg) Kk = P;H{(HkP;Hf + erl Pk : (I — Kka)P; With the following definitions: XE: xk Pita Pk Kk R]: Q]: 2;; - Hki; a priori, a posteriori state estimate a priori, a posteriori error covariance estimate Kalman gain or blending factor, minimize the a posteriori error covariance measurement error covariance matrix process noise measurement innovation or the residual 29 v'; || CRT ‘ LCD | ELD resolution 1280 x 1024 up to 70 addressable or 1024 x 800 800 x 1000 dots per inch size 15 to 21 inches up to 14 inches 6 by 8 inches up to diameter diameter 12 by 16 inches Table 2.1: Resolution and size characteristics of common display devices 2.6 HCI-Related Issues This Section discusses some Human-Computer Interaction (HCI) issues that are used to evaluate input devices. First, basic input device terminology and mental models in the HCI are presented. Fitts’ law, used to express the time to move the pointer from one location to another, is presented. A study of eye tracker as an input de- vice is presented, and, finally, several existing eye- and head-controlled interfaces are presented. 2.6.1 Basic 2D Input Devices In this section, basics of the input device terminology and interaction tasks are pre- sented. For details, see [35, 28]. The most common display technologies used nowadays are Cathode-Ray Tube (CRT) displays, Liquid-Crystal Displays (LCD) and electroluminescent displays (ELD). Table 2.1 shows basic resolution and size characteristics of the above three input devices. Conventional input devices that are used in conjunction with the above display deVices are touch screen, light pens, graphic tablets, mouse, trackballs, and joysticks. 30 Or“ 9‘! it to. - V n y ‘I\ l" - u‘i . {Y ,Vl'i W‘ 1““: -‘ .4... “If“ 7,. «II'J ‘At . I71. I 1"» Q)!" v A . r: \ U" ‘» ". i. ‘< Design details are described very briefly for each of them, and the emphasis is on the resolution and usability of each device. Touch Screen produces an input signal as a response to a touch or movement. of the finger on the display. Touch screens can be based on several technologies: conductive, capacitive, cross-wire, acoustic or infrared. 
The resolution can range from 25 x 40 for infrared, through 256 x 256 for capacitive, to 1000 x 1000 to 4000 x 4000 for conductive discrete touch points. Studies showed that cross- wire technology had the best tradeoff between display resolution, touch screen resolution and user preference. The issue with touch screens is that the user needs to press the target area, and a user’s fingers are not too good for high- resolution target selection. Thus, typical selection area sizes are in centimeters (e.g., 1 x 2cm with spacing of 1cm). The advantage of touch screen device is that the input device is output device at the same time, thus the hand-eye coordination is natural. The disadvantage of the device is that the resolution is limited by the finger size, regardless of possible device resolution, thus making it unsuitable for small object picking. This can be somewhat resolved by using a stylus, however, the hand movements are not natural in that case, and the hand may obscure the diSplay. Another disadvantage is the parallax problem: the display and the touch-sensitive screen are separated, thus the user places the finger slightly above the target on the display. This can be resolved by asking user to place fingers perpendicular to the screen, however, then the naturalness of the device is decreased. 31 llglll P911 1: par M iii film I p.115: 1' 'h“ 0 o \L‘ Lu'l Light Pen is a stylus that generates position information when pointed to the dis— play screen. The pen senses the electron beam of the CRT and based on timing of the refresh signal, the position of the pen can be calculated. The highest possible resolution obtained is 1 / 4 of a pixel on a 1000 x 1000 display. Similar to the touch screens, the usability of the light pen is more dependent on the user characteristics and pointing capabilities than on the technology itself. Graphic Tablets have a flat panel placed on a table near the display. N’Iovement of a finger or a stylus provides position information. The tablets are based on a matrix-encode, voltage-gradient, acoustic, electro-acoustic or touch-sensitive technology. The cursor control can be absolute (cursor position on the tablet is returned), or relative (the cursor movement and previous cursor position are used to calculate the new cursor position). Another issue is what the dis- play/ control gain level is. One study showed that for a 12.5in display, a gain of 0.8 to 1.0 worked the best. Tablets have a resolution problem similar to that of the touch screens, and while the display is not obscured by the hand movement and there is no parallax problem, hand-eye coordination is needed. Mouse is a small hand-held box that measures its movement on the pad surface, and the movement is translated into the graphical cursor movement. Mice have one to three buttons that are used for various selection tasks. Mouse movement can be detected mechanically or optically. In the former case, no special pad is required, while in the later case, a special pad must be used and mouse movements are restricted to it. Similar to graphic tablets, display/ control 32 , ., "r .. . Blvd ‘1. h: ,1 LJA 1' VI , r.” . ‘i l," 1 L. Li UL 11.; ‘§A , I i '7 ' ~ . l . .L‘ . . L L'L“ .gh T». . ‘4‘” Ar, \A._ "4 \ ",., M. «1.~ ; 7.5-Y _ .‘ , a. (1“l‘ _i 5%”. i', v- ‘ . “ '1 . 1“" .9- gain can be adjusted to translate mouse movements into display movements. The gain can be larger for rapid movements than for slower movements, thus enabling fast and accurate positioning. 
The mouse can provide only relative mode and hand-eye coordination is required. Up to a display pixel accuracy can be achieved with the mouse, and it can be easily used to point to small targets. However, some difficulty might arise in selecting them since the selection button is on the mouse and pressing it might cause mouse movement. Trackball is a ball in a fixed housing that can be rotated in any direction. The rotation is translated into cursor movement. The underlying technology is based on optical or shaft encoders. Similar to the tablets and mice, display/control gain must be specified and the trackball works in relative mode only. Joystick is a lever mounted vertically in a fixed base. Potentiometers sense the movement of the lever and the output signal can be transformed into cursor movements. A force or isometric joystic has a rigid lever, and the amount of force applied results in cursor movements. Similar to the mice, the display/control gain factor needs to be appropriately adjusted. The above devices are used for various tasks based on their specifications. Touch screens are best suited for menu selection tasks for data already displayed on the screen and are widely used in information kiosks. Light pens are used for menu selection and locating and moving symbols on the display. The graphic tablets are best used for drawing purposes and can be used to select menu items. The mice and trackballs are best suited for pointing and selection tasks. They could be used for 33 . I ,.... la _v ' 17:11.31. lw-l’ 1'v'tvnv n 'Y Hauuj I» 5‘ 3 It" '1 I . : .v I '. ' ‘H‘ tr. (1"! . All: v . A . U.r ‘ kl. .7“, 7r “ OBI Il '4. "v- L!- K . d",1‘r i.,L gr ‘1 if. bv—u] D‘ 'V‘ v drawing, however, the tablets are more useful in such cases. The joystics are used for continuous tracking task and pointing that does not require high precision. The above devices are all used as locator devices, and they can be absolute or relative, direct or indirect, and discrete or continuous. 0 Absolute devices have a frame of reference, and the origin and all cursor positions are calculated with respect to the reference frame. Relative devices report only the movement of the device, and the display position changes with respect to the previous position. Thus, with the relative devices, the user can specify an arbitrarily large change in position, while with the absolute devices, the range of the position change is restricted. Another advantage of relative devices is that the cursor could be positioned anywhere on the screen from an application program. The touch screen and light pen are absolute devices. Graphic tablets can be configured as both absolute and relative device. Mouse, trackball and joystick are all relative devices. 0 With direct devices, the user points directly with the finger or stylus, while with indirect devices the user moves a device that is not on the screen, which results in screen cursor motion. For indirect devices, hand-eye coordination is needed, and that typically requires some time to learn. The touch screen and light pen are direct devices. The graphic tablets, mouse, trackball and joystick are all indirect devices. 34 .-o- - HUI l- 4. ,.;. cur». if : l ' i 0‘ l~ 1.? .. g o- . ..o.~ Till-i lii u , " A “an.“ -' v. '5‘.»&I I. -' c (if . "w 5’" ii a 1 TL». TI’K‘ Brim “d l". 4‘ 5N. . j Cl! 1‘" l I . , .' {H'Dx‘rq 0 Continuous devices generate smooth cursor motion as a result of smooth hand motion, while discrete devices like cursor-control keys do not provide smooth cursor movement. 
In case of continuous devices, the speed of cursor positioning is defined by the control-to-display ratio, or C/ D ratio. The ratio defines the scaling of hand movement change to screen cursor movement change. It can be configured such that for rapid motion the ratio is large while for slow motion the ratio is small, thus allowing accurate positioning. Examples of continuous devices are the graphic tablets, joystick and mouse. 2.6.2 Interaction Tasks The two basic interaction tasks are positioning and selection tasks. A positioning task is specifying an (r, y) position to the application program. The interaction technique involved is moving a display cursor to the desired location and pushing a button. The selection task is choosing an element from a choice set (commands, attribute values, object classes or object instances). This task can be performed by pointing at a visual representation of a set element or pressing a function key for a set element. Often used interaction tasks for comparison of input devices are target acquisition, menu and text selection, text entering and editing, and continuous tracking. The target acquisition involves positioning of a cursor at or inside the target position and pressing a button to indicate selection. Experiments often have varying target area (e.g., 0.13 to 2.14 cm2), and target distance from the initial cursor position (e.g., 2, 4, 6, 8, and 16 cm). The menu selection task involves selection of a target menu 35 16123.“. a t: ‘ ‘ 'h»'1>4 'r;lf\ IL‘A.AJ| .3.» t l V . SCI ; 'v.‘ . .. H “' “"JL‘? l") ”A . “ ' \ v , . d. C“. . 1». ‘ 4: H 1" U i.‘ l.‘ \W" i" ‘ "‘-.‘_‘ .1. ‘ I’v‘r . . k. '1-_ t r‘ .k. V l' ‘1- kl ‘ 64 A‘- '1‘. ‘ {Tam : -. item in a variable length menu. Varying items are the number of menu choices and their spatial and logical organization. The text entering and editing task is typically done via keyboard, but other forms of input have been explored (e.g., using light pen, mouse, etc.) The continuous tracking task consists of the user following a cursor position with an input device. 2.6.3 Mental Models in Human-Computer Interaction What does a user know about a certain HCI system? How can a developer model user’s knowledge? These are very important questions from the standpoint of a developer, and it is very hard to give exact answers. If a software developer can model a user’s reasoning about the system, the system’s usability and efficiency will be better. From the user’s point of view, learning to use the system would be easier. In this Section, Carroll and Reitman Olson’s [18] Chapter in Hellander’s Handbook of HCI is reviewed, and basic terms and models from it are introduced. What the user knows about a software system includes (i) rules that prescribe actions to be applied (simple sequences), (ii) general methods that fit general situ- ations and (iii) a “mental model”—user’s perception of how the system works and how to use it (this should include the knowledge about system components, their interconnections, and ability to construct reasonable actions and explanations why the actions are appropriate). In the simple sequences representation, the user rote memorizes the sequences needed for certain tasks. For example, the user would memorize a command to be 36 o i, ‘ .351 [1r \ Vir‘ . L... "l 1 . '_ ' "m‘ Li .‘ W \ ‘ v ., ,‘7‘ .. lll|H 7 - .h , a, .1 - _. 4' "‘"-Vi '"LA. . n.. .\ .‘\ I r y“ i.‘ M. '«., .il' Mu“ “ \“ ‘ ‘Hil ‘b ‘ r "A .,T v .I,'\[‘ A ~‘-‘ H, f“v. A ‘k‘: l L' 3 u \ , <‘ ”I J a ‘l‘ l, . . .‘, .h \v.l‘ F. 
~ '1, C V c p 5 kn ,. typed, and might have no knowledge about the underlying system or general rules that can be applied. At the second level of representation, we can model the knowledge of methods that can be used. This is often done by modeling tasks and goals to be achieved. Card, Moran and Newell [16] proposed the GOMS model, which stands for Goals, Operators, Methods, and Selection rules. In this model, the user recognizes that a primary goal can be broken into set of subgoals, which can be further broken into subgoals... until a subgoal matches a method in the system. The user has some rules by which subgoals are created and methods chosen. A number of empirical studies showed that this model can be applied to a variety of tasks, sometimes without even changing time parameters. Another model of user’s knowledge of methods is based on command grammars: Command Language Grammar (CLG) [53], Backus Naur Form (BNF) [65, 66], and Task Action Grammars [61]. These grammars are sets of rules that can be used to perform actions in a system, and “sentences” that are acceptable in the grammar represent correct actions that the user can apply. The grammar rules have similar structure as GOMS subgoals, but show more compactly alternative ways to accom- plish a task. Finally, we can model the most complex level—“mental model”. There are several kinds of models: surrogates, metaphors, glass box machines and network models. Surrogates [98] is a conceptual analysis that mimics perfectly the target systems’s input and output. It is not assumed that the surrogate and the real system produce the output in the same way, thus real causal basis of input-output cannot be provided. 37 4 (l, .l u. ’7 "1: o L. 41,, p c. F1r‘ "‘ l‘“ ‘- sa-. 9 a. F m . ‘ a. J. _ l :n ' Y" ‘ "I? “M" .i. n. V .9 . 4 ' ‘ .u, .. H . .v[ [7‘ . ‘ n, \v- ”7"?! .,‘ i, A . v . ' ‘ ‘ .. U, ‘ u ’s T a lv,V’ ‘ f 'P u ‘4”;— fi i ”main 1 i ' M. ‘l , 7-. ,‘ , \ 4" 4. , ‘W l J k§ "r ”-1 ‘7» . u. ““¢ 1" ~\ h u, ' ‘Ak~ 4‘ n A‘. ‘1 N K r . ‘ . “V" l.- ..‘\ ‘-. \> S a 1- x 5.,“ , . 1‘ 1 \' '. ‘~." r. a“ . i. A metaphor model [19] compares directly the target system with a system already known to the user (e.g., text editor and typewriter). The user easily learns known functions, however, new functions are harder to comprehend and are a constant source of errors. Glass box models [25] are a mixture of surrogates and metaphors. They mimic the target system, and provide some semantic explanations for the internal components. They are more used in a prescriptive context rather than in a descriptive 0118. Network representations of the system have states the system can be in, and the actions that the user can take to change states [51]. Kieras and Polson [45] “Generalized Transition Network” (GTN) contains detailed description of what the system does. The states of the network represent visible states of the system (e.g., screen display), and the arcs represent commands and menu choices that can be taken from each state. In simple terms, GTN represents what can be expected of the system when the user takes some action, and that knowledge can be useful to the user while learning and recovering from errors. 2.6.4 Fitts’ Law How fast can a human position a cursor from point A to point B on a computer display? The time needed can be estimated by using Fitts’ law equation [16]. The model is based on a Model Human Processor. 
It consists of three processors: (2') perceptual system—sensors and associated buffer memories such as visual and auditory image storage; (ii) cognitive system—based on sensory information from 38 .w - ‘1 .. r i ; l- ...“ -.-) v—r.’ l .. ‘u. , . s“. w h "« Lui‘ v." . V ' .Que p, , 1' ‘v C“. ‘ I“ v A , u, l l. 5‘.‘ Av, 6',‘ g; ‘l. ‘. 7 -. ‘ I v 5'. ‘ " .‘gbc y'- NA 4 _ M n; t. e. .. ‘ \ < ------- > A = x0 X1 X7- B l l . TARGET l l 3mm D -< —————————————————————————————————— -> Figure 2.3: Fitts’ law: analysis of the movement of user’s hand (from [16], page 52) working memory and previously stored information from long term memory, a. decision is made on how to respond; ( iii ) motor system—carries out the response. The task of moving a pointer from point A to point B is illustrated in Figure 2.3. The points are D units apart, and the goal is to move the pointer within S/ 2 units from the target. In each cycle, the user’s perceptual processor observes the hand (time needed rp), cognitive processor decides on corrections (time TC), and motor processor performs the correction (time TM). The number of such cycles is 77., thus the total time needed is T,,, = n(7'p + TC + TM) T100, 2 1M log2(%) — I J . . I where I” — —L————ZTP+TC+T" [1n msec/bit] and 6 IS a constant. 1 loge ’ For low values of log2(%), the equation does not fit the data well, and Welford [90] proposed a correction that better fits the data: Tpos 2 1M10g2(% + %) The value of 1M is calculated empirically, and ranges [M = 27 ~ 122 msec/bit, and is usually about I M = 100 msec / bit. 39 Fitting data into Fitts’ law equation provides an easy comparison between dif- ferent input devices’ performance on the same task. For example, the data for the conventional mouse and for an eye-mouse could be compared, as will be discussed in the following Section. 2.6.5 Evaluation of Eye-Tracker as Input Device Ware and Mikaelian [86] studied eye tracking data and eye-movement usability as a computer input device. They used eye-tracking equipment based on infrared corneal reflection, and the system required the user’s head to be still during the experiment. In the first experiment, the user’s task was to select one of seven vertically arranged buttons on the screen. To indicate the location that the eye-tracker “thinks” the user is looking at, they showed the current cursor location, as well as outlined the button that was fixated. Three selection mechanisms were tested: 1) dwell time button—the object that is fixated for more than some interval gets selected; 2) screen button—a large rectangular area on the screen is set aside for a selection button, and when the user fixates the selection button, the object that was last looked at is selected; 3) hardware button—the user presses a physical button while fixating the item to be selected. Each trial proceeded as follows: the user would fixate the central screen point, and initiate a trial, then the target button was indicated and the user would try to select it. The selection times were all below 1 second, and the dwell button was the fastest selection mechanism, followed by the hardware and screen buttons. The 40 errors in the selection were 12%, 22% and 8.5% for the dwell, screen and hardware button, respectively. In the second experiment, the buttons were arranged into a 4 x 4 square matrix, and the user was driven to selection of targets by the computer. The sizes of screen buttons varied from 48mm to 7.2mm (at. 
the viewing distance of 90cm, the sizes were 30—0450), and the dwell time was set to 0.4 seconds. For smaller button sizes (045" and 0.750), selection times were significantly higher, while for 1.50—3" sizes, the selection time was at or below 1 second. The hardware button was faster for the selection. In terms of errors made, for smaller buttons, the errors were 20—50%, and for larger buttons, the errors were below 10%. 2.6.6 Eye-Controlled Systems Hutchinson et al. [40] designed an eye-controlled HCI for disabled persons—the quadriplegic population who retain some motor control of their eyes. The system is named Erica for eye-gaze-response interface computer aid. As mentioned in Sec- tions 2.2 and 2.4.2, they use an infrared light source to locate glint and bright eye, and based on their relative position and calibration they can determine the user’s gaze. The user’s head must remain stationary for the system to work properly. In their system, there is a 3 x 3 matrix with 9 menu buttons. To select an option, the user needs to stare at its menu box, and after some fixed time interval (2 or 3 seconds, but it can be altered) a tone sounds and the cursor appears in the menu box. If the user continues to stare at the same box, the tone will sound again and that option will 41 .I","~ iIY .1 an“ .AA- , Q lg.“ ‘ .J‘. a . .... . {1r 7 ,v ' .,l (.1 O [m \ ‘ A up] --. Le“ . 0" ‘ ti. ‘HY‘ T'L‘. D», ,. .1“ K [LC l.‘ as" " u. . be selected. A number of applications are available for Erica: control of appliances, communications programs such as word-processor and voice synthesizer, computer games, and text reading. In the case of the word processing program, only the upper two rows are used for control and the lower row is used for text display. To enter one page of text, it can take 85 minutes for an experienced user. In the case of the text reading program, the upper two rows are used for display and the lower row is used for controls. Frey et al. [30] improved Erica’s original word processor. They used a menu- tree to organize letters based on their frequency, and re-organized the menu options (letters) based on the probability of the next letter in a word. The average time to pick a character is 1 second, which leads to 80 minutes per page. White et al. [48] further improved the system by enabling spatially dynamic calibration, thus allowing for some head movement during the session. Buquet et al. [14] installed a slide viewer in a museum in Paris, that was based on eye-tracking and gaze as input. Their results were quite promising (on 153 test persons, the success rate was 83%). However, as noted in [32], the participants were not described, which leads to a conclusion that they were a highly motivated chosen target group. In a museum in Denmark, an eye-controlled exhibition system has been installed. Glenstrup and Eugell-Nielsen [32] attended an exhibition and tested the system exten- sively. The system, called EyeCatcher, introduces a new on-screen button—EyeCon. The button changes from open eye to closed eye to indicate to the user that it has been selected. The selection is made by fixating the button for a time period. The 42 \ «339:: new v v ' ' vr v yyv[,y’ .- " ' “Add“; ii iii. 4 ,.. 1' '. v ‘ “a “tip 11“" T .t..d1..::_ 1“. 9 "v ..-, , r- ,‘»‘. “Ad“: “‘43. 1 A, ' a. i', (pr. [V ? ‘ “A: “A“ . . .A ~ , , . .lJ‘ l‘ ; y , , . a; l.." q- V Q ‘r I ‘Q ‘1‘, - , Nl' 7?. J‘ LR. system needed a calibration phase, and offered text reading, looking at pictures and watching a film clip. 
The overall conclusion was that the interested users (e. g., adults) were able to use it, despite the boring calibration phase. Children were very impatient and were not able to use the system. Among other problems, they noted the need to keep the head still. Starker and Bolt [74] tried to design a non-command [57] eye-tracking based in- terface. The application is based on “The Little Prince” novel of Antoine de Saint Exupery. The user’s interest in objects is measured by the length of fixation and the interest levels “age” if they are not refreshed by fixations. The user will automatically obtain information about an object if the system “thinks” that user’s interest level for it is high. Jacob [41] conducted extensive research on eye-tracking based HCI. In his exper- iments, the Applied Science Laboratory [1] eye-tracking equipment based on infrared reflections was used. In such systems, the user’s head must remain stationary. Several improvements of the use of raw eye-tracking data were proposed. First, he discov- ered that calibration was not uniform across the screen, e.g., more imprecise at some locations. To correct this, the user could move the mouse pointer to the problematic area and stare it for a while while the system re-calibrates. Second, if the user fixates slightly off-target, the system accepts that as target fixation. To ensure that there is no mistake in the selection, the system will accept off-target fixation only if the fixated point is far away from other targets. Finally, the problem with raw eye-tracking data is that some noise could be present. The noise can come either from unconscious, jittery eye-movement, or blinking, or some system measurement error. To avoid the 43 . ,_ _ Usilét. data limi‘lll.v’.;‘ as: interval. llli‘ I: 4K), 1 7". a f 1\ din, Ib.‘ Aiifilt.)t1.1 ”x” ‘l L ' l liaufrzan it a; mini. llm IN- 1.,4 ;' .~ -. -. ' Y ewlldf) M “l .. 1'. l _ l ' LC Title 51* ‘1‘."5. l ‘ I‘ . . . 5RD \l?‘ 't ....i‘.mr ‘.. ‘ r!,‘ W! Y ‘1 RH'PIlill'. sewn" hr” I 4,“! l . “fl: ‘-lt~rluibi‘ll ill I 7.15:. A ’ “E-‘f’ are ”rail-i). (‘i .1 Ml. . e. .‘ .- \ "‘t ' Y‘all» .. «LQEL P n, .H I a: \\" - ' P Y .l~ t‘ r',: .’\ N .- . \ "fair ‘IJII. \. noise, data tokenization is used: instead of reporting all the tracking data, after 100 msec interval, the mean gaze position is reported. The resulting data is a string of tokens instead of stream of eye-tracker measurements. Kaufman et al. [44] designed an HCI based on eye-tracking using electro- oculography (described in Section 2.4.5). The selection mechanism is blinking or winking. They tested menu selection using 3 x 2 menu buttons, and the achieved accuracy was 73%. In the case of selecting corners only, the accuracy was 90%. LC Techologies, Inc. [4] offers commercial systems that use technology and offer programs similar to those of Hutchinson et al. [40]. Recently, several systems based on Artificial Neural Network gaze detection have been developed in research laboratories. The applications of such systems are not. commercial yet. Baluja and Pomerleau [11] reported that they could achieve 1.5 degree accuracy, compared to 0.75 degrees in the commercial trackers. However, they neither specify the methodology for evaluations, nor the number of subjects. Similarly, Stiefelhagen et al. [75] reported errors in rotation angle around :13, g, and z axis of 1—5 degrees. They use their system for panoramic image viewing. 
Additionally, they use their system [95] to locate the head and perform lip reading, and to locate speakers so that the signal coming from a microphone can be enhanced.

2.6.7 Problems with Eye-Tracking Systems

In this Section, several problems related to state-of-the-art eye-tracking and gaze detection systems are discussed.

Non-Intrusiveness: Jacob wrote in [41]: "The eye tracker is, strictly speaking, non-intrusive and does not touch the user in any way. Our setting is almost identical to that for a user of a conventional office computer. Nevertheless, we find it difficult to ignore the eye tracker. It is noisy; the dimmed room lighting is unusual; the dull red light, while not annoying, is a constant reminder of the equipment; and, most significantly, the action of the servo-controlled mirror, which results in the red light following the slightest motion of user's head gives one the eerie feeling of being watched." This observation illustrates the non-intrusiveness problem related to most eye-trackers. Even though the equipment does not touch the user, the user is often required to be completely still. Additionally, some question the health effects of the use of infrared lighting.

Calibration: In most eye-tracking systems, the calibration step is a must. The user is required to calibrate the system before each session, and this might result in an annoying situation if the system is to be widely used. Another problem with calibration is that if the user moves out of the focus of the tracking camera, the system needs to be re-calibrated. Thus, systems that would not require calibration before use, and that would enable free head motion, would be truly non-intrusive and user-friendly ones.

Midas Touch: The problem of the selection mechanism in eye-trackers is persistent. Often, the target is selected after a long fixation. However, how do we know that the user really wants the target to be selected? The user might just stare at a point without the intention to select an item. Since we cannot "turn off" our eyes when we do not want to control the system, we could enter a state in which wherever we look we activate some command, and the interface would have the Midas Touch problem. An eye-tracking system should be able to distinguish the user's intent to select an item from the state of mere observation. In our aim to create a better HCI, we are faced with the need to enable the user to seamlessly activate a command without having to perform some complicated and unnatural action.

2.6.8 Head-Controlled Systems

Heuvelmans et al. [39] designed a handless interface for disabled persons based on head movements. The system has a head-borne mouse replacement unit based on transducers. Ultrasonic sound waves are emitted by a control unit and are received by a head-mounted transducer that converts the sound information into electronic signals differing in phase, which are further used to calculate the position of the pointer on the screen. The selection signal is a "dwell button", that is, the user fixates a point for a predefined period of time. The system was used by two disabled persons, who used a typewriting application and achieved a typing speed of 60 characters/minute.
In their setup, the smallest button was 34 x 20 pixels, or 11.1 x 6.5 mm, and all the alphanumeric keys and the space bar are displayed in the lower portion of the screen. Additionally, the screen has several control buttons arranged in a top row, and in the middle of the screen is a text display area.

Beardsley [12] designed a system for the monitoring of drivers. The system is based on matching against synthetic images from a 3D model to determine an approximate head pose. In the initialization step, the 3D model of a head is projected onto the image and manually adjusted. The model is rotated and template images are generated, which are used for later comparison. To better discriminate whether the driver is looking up or down, the state of the eyelid is determined using a skin color model segmentation of the eye area. Based on the knowledge of the interior of the car, they can say approximately where the driver is looking (e.g., rear-view mirror or dashboard).

Bradski [13] used head motion as an input for applications like computer games and graphics. The system is based on the mean shift algorithm, and they proposed an extension of the algorithm called CAMSHIFT (Continuously Adaptive Mean Shift). Based on an HSV color histogram, the probability distribution of the head location is determined. Based on the zeroth, first and second moments, they determine the head orientation. The controls for the computer game are based on left/right, back/forth and up/down motion.

2.7 Visual Perception Studies

In studies on human visual perception, eye movement during scene exploration is very important. Cognitive scientists present various scenes to subjects and observe the pattern of exploration. The subject might be asked to explore the scene for some clues or to just examine it freely. Yarbus [97] illustrates seven records of eye movements during the exploration of one picture. In this section, first, the types of eye movement are explained; then, the equipment used in cognitive science studies is described and discussed.

2.7.1 Types of Eye Movement

When closely examined, the patterns of eye movement show that there are several types of eye movements. The structure of the human eye is such that only one small portion of the eye has densely packed receptive cells. That area is called the fovea, and it covers about 2 degrees of viewing angle. The area next to the fovea is called the parafovea and covers about 5 degrees, and everything else falls into the extrafovea, which covers about 60 degrees and is perceived as blurry by a human. The area that we can perceive in the fovea is equivalent to a word on a page at normal reading distance. Saccades are the principal method of moving the fovea to focus on different portions of a visual scene.
There can be about 250 saccades per minute. When a visual stimulus is presented, it takes about 100-300 msec to initiate a saccade, and 30-120 msec to complete it (depending on the angle traversed, which ranges from a minimum of 1 to a maximum of 40 degrees, most typically 15-20 degrees). A saccade can be voluntary, but once it starts, it cannot be suppressed, nor can its path be changed. During the movement, vision is suppressed. Some studies show that the suppression is not complete. Once the fovea focuses on the object, the examination lasts about 200-600 msec, and this period is called fixation.

Pursuit motion is a smoother, slower movement of the fovea. It follows a moving object, so that it remains foveated. This kind of movement is not voluntary, but can be induced by introducing a moving object into the visual field.

Nystagmus occurs as a response to moving one's head. The fovea is slowly moved so that the pursued object's image is followed. If the followed object disappears from the field of view and appears on the other side, the fovea rapidly moves in the opposite direction. In this way we can follow repetitive patterns.

[Image is presented in color.] Figure 2.4: Example of eye movement during face learning and recognition. (Courtesy of John M. Henderson, Michigan State University Eyetracking Laboratory, http://eyelab.msu.edu/)

2.7.2 State-of-the-art Eye Movement Tracking Technology

Figure 2.4 illustrates the exploration of a sample face image. The lines indicate where the subject was looking, and we can see clearly that most of the attention was devoted to the eyes, nose and mouth regions, while the background remained unexplored. This kind of image is typical for visual perception and eye movement studies [38, 37].

How is this data obtained? The state-of-the-art equipment used is called the "Purkinje Image Eyetracker" [22, 23], and is illustrated in Figure 1.1. The subject's head is placed within a frame, the forehead rests on the forehead rest, and the bite-bar is in the subject's mouth. All this ensures that the head stays perfectly still during an experiment. Before a session begins, a calibration procedure is done: the subject looks at several points on the display, so that the vectors defined by the Purkinje images (see Section 2.4.2) can be properly calibrated. It is common that the system becomes uncalibrated during the session, so that the calibration procedure must be repeated. Numerous studies have been done using this procedure and subjects can get used to it. However, the equipment is highly intrusive and the user has no freedom of movement as he or she would have in a natural setting. A system that would provide similar eye-tracking data, but in a more natural setting, would be of great benefit. The accuracy of the above eye-tracking equipment is high, about 0.75 degrees of arc, and the sampling rate is about 1000 Hz. The challenge for a computer vision based system would be to achieve similar accuracy and performance rates.
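To put these angular accuracies into screen terms, a small worked conversion helps. The Python sketch below assumes a viewing distance of roughly 50 cm (the sitting distance used later in this work) and a display roughly perpendicular to the line of sight; the 0.75 and 1.5 degree figures are the commercial-tracker and ANN-based accuracies cited earlier.

    import math

    def on_screen_error_mm(angular_error_deg, viewing_distance_mm):
        # Small-angle conversion of an angular tracking error into a distance
        # in the display plane at the given viewing distance.
        return math.tan(math.radians(angular_error_deg)) * viewing_distance_mm

    print(round(on_screen_error_mm(0.75, 500), 1))  # commercial trackers: about 6.5 mm
    print(round(on_screen_error_mm(1.5, 500), 1))   # ANN-based trackers: about 13.1 mm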
Chapter 3

Face and Facial Feature Detection

This Chapter describes a method for face localization in a color image and a method for the location of facial features. For the face localization, a skin color model based on the normalized values of the red and green components is used. A classification using the hue, saturation and intensity color space is also presented, and compared with the RGB one. Some experimental results are presented, on both light-colored and dark-colored subjects, and possible improvements are discussed. A novel method for locating facial features like the eyes, eyebrows and nose based on the knowledge of face geometry is presented. The parameters for the evaluation of eyes-nose matches are introduced. The method is evaluated on a number of test images of various subjects, both light and dark, and results are presented. The method works under the assumption that there is one face in the image and that it does not overlap with other skin-color-like objects. If the person wears glasses or has a beard or mustache, the method sometimes does not work well.

3.1 Face Detection

This Section describes the development of the skin color model and the detection of a face using a connected components algorithm. Problems with the skin color model applied to dark faces are addressed.

3.1.1 Skin Color Model in RGB Color Space and Face Localization

The location of the face in an image is based on the skin color model. Human skin color can be represented as a cluster in either the RGB (Red, Green, Blue) or HSI (Hue, Saturation, Intensity) space. In this work, the RGB space is used, and a discussion on the use of the HSI space is presented in Section 3.1.2. Figure 3.1 shows skin color clusters obtained from the images of 4 subjects of different skin color (Caucasian, Asian, and Indian). The clusters are based on the 2D plot of the normalized red and green component values:

    R_norm = R / (R + G + B),    G_norm = G / (R + G + B).

Figure 3.2 depicts the classification results for the sample images in the middle column. Yellow color represents the primary skin color, and blue and pink colors represent the shadow areas. The detection of the face region boundaries in an input image (Figure 3.2 (a)) can be broken into three steps. The first step labels all the pixels in the input image as Face, Shadow_1, Shadow_2 or Other (Figure 3.2 (b)). The second step applies a connected components algorithm to find the components of the first three labeled classes.

[Image is presented in color.] Figure 3.1: Skin color clusters: the horizontal axis is R_norm and the vertical axis is G_norm. Out of six clusters, three represent skin color. The cluster labeled Face is the primary skin color, and the clusters labeled Shadow_1 and Shadow_2 are areas of shaved beard and shadows around the eyes.
A row-by-row algorithm is used for computational efficiency. Objects that are too small or too big are discarded, thus reducing the number of objects from 500-700 to about 100, the majority of which are in the shadow classes. The third step finds the biggest object of the class Face and merges it with objects that border it. This step obtains a compact face region even in the case when the subject's face is in shadow or when the subject has a beard. In some cases, the subject's forehead, cheeks and beard will be separate components; thus, merging of these components is necessary in order to obtain the correct face region. The resulting bounding box is shrunk around the first-moment lines, using heuristics learned from processing many examples, to obtain the final face region bounding box (Figure 3.2 (c)). The outer box is the bounding box for the original face object, and the inner box is the target area determined from the first-moment lines.

(a) Original Image (b) Skin Color Classification (c) Face Region Boundaries
[Image is presented in color.] (Pictures are part of the author's private database.)
Figure 3.2: Isolated face regions on sample images

This three-step approach has been tested on many images of many persons, including two open-house sessions when dozens of unfamiliar people tested the program. For a very small number of subjects, the skin color model did not work well. The skin model worked well even for subjects with very dark skin, provided that the ambient lighting was strong.

3.1.2 Skin Color Model in HSI Color Space

The skin color model can also be developed using the Hue, Saturation and Intensity (HSI) color space. Translation from the RGB to the HSI space is done using the following equations:

    I = (R + G + B) / 3
    S = 1 - 3 min(R, G, B) / (R + G + B)
    H = arccos( (2R - G - B) / ( 2 sqrt((R - G)^2 + (R - B)(G - B)) ) )

Figure 3.3 shows the plot of hue vs. saturation. Three skin color clusters are identified, similar to the clusters used in the RGB space: the primary face color (labeled as Face) and the areas of beard and shadows around the eyes (labeled as Shadow-1 and Shadow-2). Figure 3.4 compares the classification results for the sample images using the RGB and HSI models. Yellow color represents the primary skin color, and blue and pink colors represent the shadow areas. As can be seen, both models classify skin color with similar success. The RGB model classifies darker skin color a bit better than the HSI model. Another advantage of the RGB model is that the values are available directly from the camera interface, while for the HSI model the values need to be calculated for each pixel, and the computation could be time consuming.

[Image is presented in color.] Figure 3.3: Skin color clusters in HSI color space: the horizontal axis is hue and the vertical axis is saturation. Skin color is represented with three clusters. The cluster labeled Face is the primary skin color, and the clusters labeled Shadow-1 and Shadow-2 are the areas of shaved beard and shadows around the eyes.

3.1.3 Problems with Skin Color Model

The processing using the skin color model gives erroneous results in some cases. The most common errors occur when the face object is close to some object of similar color, especially the subject's clothes.
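For reference, the two pixel-level color computations used by the models in Sections 3.1.1 and 3.1.2 can be written as short routines. The following Python sketch is illustrative only: the skin-cluster boundaries of Figures 3.1 and 3.3 are not listed in the text, so no membership test is shown, and the hue mirroring for B > G is the usual HSI convention rather than something stated above.

    import math

    def normalized_rg(r, g, b):
        # Normalized red/green chromaticity used by the RGB skin color model.
        total = r + g + b
        if total == 0:
            return 0.0, 0.0
        return r / total, g / total

    def rgb_to_hsi(r, g, b):
        # RGB-to-HSI conversion corresponding to the equations in Section 3.1.2.
        total = r + g + b
        i = total / 3.0
        s = 1.0 - 3.0 * min(r, g, b) / total if total > 0 else 0.0
        denom = 2.0 * math.sqrt((r - g) ** 2 + (r - b) * (g - b))
        if denom == 0:
            h = 0.0  # gray pixel: hue is undefined, return 0 by convention
        else:
            ratio = max(-1.0, min(1.0, (2 * r - g - b) / denom))  # guard rounding
            h = math.acos(ratio)
            if b > g:                # usual HSI convention, not given in the text
                h = 2 * math.pi - h
        return h, s, i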
In such cases, another object can be recognized as the face, or it can merge with the face into one component. Figure 3.5 illustrates this problem. If the other object and the face are recognized as one, the algorithm still has some chance of locating the eyes and nose. However, if the unwanted object is recognized as the face object, the real face is lost. Some control of the environment greatly reduces these problems and maintains real-time speed.

(a) Original Image (b) RGB Classification (c) HSI Classification
[Image is presented in color.] (Pictures are part of the author's private database.)
Figure 3.4: Comparison of RGB and HSI models for skin color classification

(a) Original Image (b) Skin Color Classification (c) Face Region Boundaries
[Image is presented in color.] (Pictures are part of the author's private database.)
Figure 3.5: Erroneous face regions due to similar background and face colors

3.1.4 Problems with Dark Skin Color

The skin color model described above works well for light-colored faces. In the case of dark skin color, if the room is well lit, the model will be able to segment the skin. However, if the lighting is not strong enough, or if the subject's skin is very dark, the model will not cover that skin type. As an illustration, Figure 3.6 shows images taken under the same settings as the other test images, where the skin color classification fails in two cases (first two rows) and succeeds in one case (third row). As can be seen, the first two subjects' skin color was classified as mainly shadow area, and thus the connected components algorithm could not find a big enough component. Big connected components of the shadow class are discarded by the program as possible background. In the third, successful case, the combination of the primary skin color and one shadow class enabled successful face localization.

(a) Original Image (b) Skin Color Classification (c) Face Region Boundaries
[Image is presented in color.] (Pictures are part of the author's private database.)
Figure 3.6: Localization of dark skin color subjects: two failed and one successful classification of the face color

3.2 Improvement of Dark Skin Color Detection

The main problem in detecting dark skin color is that the skin color is not well clustered in the R_norm x G_norm space. The skin points are mainly spread across the area of the cluster labeled "Other_2" in Figure 3.1. However, histograms of the red, green and blue intensities have the same shape for both dark and light skin tones. Figure 3.7 illustrates the histograms for five windows. As can be seen, the peaks of the histograms for the light skin tone are around 80, 70, and 60 (or above) for red, green and blue, respectively. If the dark skin histogram is shifted so that its peak lies at the same value as the light skin one, we can transform the image in such a way that the skin color then fits into the proposed skin color model.
The transformation function used is the following:

    I_altered(i, j, c) = a1_c * I_orig(i, j, c),         if I_orig(i, j, c) < hist_peak_c
    I_altered(i, j, c) = a2_c * I_orig(i, j, c) + b2_c,  elsewhere

where I(i, j, c) is the image intensity at row i, column j, for color c in {R, G, B}, hist_peak_c is determined from the corresponding color histogram, and

    a1_c = const_c / hist_peak_c
    a2_c = (255 - const_c) / (255 - hist_peak_c)
    b2_c = 255 (const_c - hist_peak_c) / (255 - hist_peak_c)

with const_R = 80, const_G = 70, and const_B = 60. The darkness of each window can be checked automatically by comparing the hist_peak_c value with boundary values for R, G, and B, which were empirically found to be (1/3)const_R, (1/3)const_G, and const_B, respectively. The image is then considered to contain a dark skin area if at least the three leftmost or the three rightmost windows are dark. If the above formulae are applied to the first two images from Figure 3.6 (a), we get images in which the skin is easily classified. The resulting altered input images and the skin color classification results are shown in Figure 3.8.

(a) Window labeling; (c) Red, green and blue intensity histograms: the black line is the dark-light skin boundary, and the dark red, green and blue bars are the peak values.
[Image is presented in color.] (Pictures are part of the author's private database.)
Figure 3.7: Red, green and blue intensity histograms for dark and light skin tones

Skin color classification of altered images.
[Image is presented in color.] (Pictures are part of the author's private database.)
Figure 3.8: Color-altered dark images, and resulting skin-color classification

3.3 Eyes, Eyebrows and Nose Location

Once the face target area is found, the search for the eyes, eyebrows and nose is limited to its bounding box. A greyscale image is used for this, specifically, the smoothed red component of the input color image. Finding the eyes, eyebrows and nose is based on the knowledge of the face geometry. In this Section, the algorithm and heuristics used are described in detail.

3.3.1 Detecting Eyes

The eye pupils are the darkest objects on a typical human face, regardless of the eye color. If the face and eyes are brighter, the pupils will be brighter, but still the darkest overall. The red component of a color image absorbs the most light, thus the eye pupils will be darkest there, compared to their values in the green or blue component. Figure 3.9, first column, illustrates the difference between the red, green and blue components of a color image. Since our skin is of reddish color, skin is rather bright in the red image compared to the green one.

Finding the eye blobs can be reduced to finding the two darkest objects in the face image. Thresholding at an appropriate value would produce two eye blobs. The problem is that the threshold is not the same for all eyes. Thus, gradual thresholding is applied until two dark blobs are found. Figure 3.9 shows results of thresholding the red, green and blue components at gray levels 1, 5, 10, 15, ..., 35, 40.
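A minimal sketch of this gradual-thresholding search is given below (Python, with NumPy/SciPy used for the connected components step). The size and border rules are crude placeholders for the blob-filtering criteria described in the next paragraphs, not the exact limits used by the program; in continuous processing, the threshold found for the previous frame can be passed back in as the starting value, as noted later in this section.

    import numpy as np
    from scipy import ndimage

    def find_eye_blob_candidates(red_channel, start=0, step=5, max_thresh=40):
        # Raise the threshold on the (smoothed) red component until at least two
        # dark blobs that could be eye pupils survive the filtering rules.
        h, w = red_channel.shape
        for thresh in range(start, max_thresh + 1, step):
            labels, n = ndimage.label(red_channel <= thresh)
            candidates = []
            for i in range(1, n + 1):
                ys, xs = np.nonzero(labels == i)
                if ys.size < 4 or ys.size > 0.01 * h * w:        # too small / too big
                    continue
                if ys.min() == 0 or xs.min() == 0 or ys.max() == h - 1 or xs.max() == w - 1:
                    continue                                     # touches the target-area border
                candidates.append((float(xs.mean()), float(ys.mean())))
            if len(candidates) >= 2:
                return thresh, candidates      # handed on to the eyes-nose matcher
        return None, []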
As can be seen in Figure 3.9, in the red component image the pupils are easily identified, while in the green component they become visible only at a higher threshold, and in the blue component they are not visible below the threshold level of 40. Additionally, since human eyes are often of bluish or greenish color, these components would have high intensity in the pupil area, so the pupils would not be the darkest area in a face image.

For each thresholded image, a connected components algorithm is run to find dark blobs. Blobs that are at the edges of the target area, too small, too big, or sparsely filled are rejected. Figure 3.10 depicts the results of gradual thresholding for a sample image. The threshold is increased from 0 until a successful detection of the eyes and nose is found. For faces with dark eyes, smaller thresholds result. For faces with bright eye pupils or for very bright images, higher thresholds are needed to get the eye blobs to fall below the threshold.

(Pictures are part of the author's private database.)
Figure 3.9: Red, green and blue components of two images: gradual thresholding in search for eye blobs (grayscale and thresholded images at gray levels 1, 5, 10, ..., 40)

(Pictures are part of the author's private database.)
Figure 3.10: Gradual thresholding in search for eye pupils (low threshold vs. high threshold)

When only one image is processed and there is no prior knowledge of the suitable threshold, it is necessary to try thresholds starting from the lowest value up to the value that yields a satisfactory match. However, in the case of continuous processing, prior knowledge of the threshold from the previous frame can be used. Thus, the algorithm can adapt itself to the current subject's face and to the current lighting conditions.

3.3.2 Matching Eyes with Eyebrows and Nose

As can be seen in Figure 3.11, at the best threshold we typically do not get only two dark blobs, but often more than two. For each blob that is a candidate for an eye, it is first checked whether an eyebrow is present above the eye. The existence of the eyebrow is verified by looking for an edge in the average-threshold image that is 10-30 pixels away from the eye. The blobs that do not have the edge representing an eyebrow are discarded and not considered candidates for the eyes. All the remaining blobs are assumed to be candidates for the eyes, and are matched to find the nose. The image that is used to find the nose is the red component, intensity-thresholded at the average intensity.

(Pictures are part of the author's private database.)
Figure 3.11: Matching eye blobs with nose: (a) original image, (b) best thresholding, (c) nose line, (d) feature points

When finding the nose, it is assumed that most of the lighting on the face is from above. It is also assumed that the nose line is perpendicular to the line between the eye pupils, and that it intersects the line between the eye pupils roughly at the midpoint of that line.
The precise location of the beginning of the nose line is the mid-point of the widest above-threshold region on the eye line. The estimated nose width is equal to the width of that above-threshold region. The beginning point of the nose line is on the eye pupils line, and the nose line ends at the tip of the nose. The nose line is then pursued downwards, using the estimated width, until a significant below-threshold area is found. That point is considered the tip of the nose. The shaded areas in Figure 3.11(c) represent the traced nose line.

Each eyes-nose match is evaluated based on several heuristics. The match with the highest score is selected as the eyes-nose match. The scoring rules are based on comparison with the current face region dimensions. Thus, the user's face does not have to be at a fixed distance from the camera, since the parameters that are used for match scoring automatically get re-adjusted when the user's face moves. The only constants used are the approximate relationships for the eyes and nose distances, and they were determined empirically. Each match is evaluated using the following rules:

1. If the Euclidean distance between the eyes is smaller than 25% of the target area width, or greater than 70% of the target area width, the match is rejected.

2. If the Euclidean distance between the eyes differs by more than 9% from the expected eye distance, the match is rejected.

3. Add 100 * (1 - |(eye1_Y - eye2_Y) / (eye1_X - eye2_X)|) to punish a large eye-line slope. It is assumed that the subject will not turn the head by more than 45 degrees; if the slope is more than 45 degrees, the match is automatically rejected.

4. If one of the eyebrows has not been found, but only estimated, the whole match value is multiplied by 0.9. Add the percentage of fullness of the forehead line with respect to the skin-clustered points; the forehead line is the line between the two eyebrows. Add 100 * (1 - |eyebrow1_dist - eyebrow2_dist| / max(eyebrow1_dist, eyebrow2_dist)) to reward symmetric eyebrows, where eyebrow_dist is defined as the shortest distance between an eyebrow and the eye line, calculated as the number of points from the eyebrow to the eye line along the eye-line normal.

5. If calculated_nose_width < min_nose_width, subtract 10 to punish the absence of an above-threshold area on the eye line, where min_nose_width = 0.1 * eye_distance.

6. If calculated_nose_width > max_nose_width, add 30 * (3 - calculated_nose_width / perfect_nose_width) to punish a wide above-threshold area on the eye line, where perfect_nose_width = 0.3 * eye_distance and max_nose_width = 0.5 * eye_distance.

7. Add 100 * (1 - num_black_points / total_points_on_nose_line) to reward above-threshold areas on the nose line. Add one half of the average gray value on the nose line, to reward bright areas and punish dark areas.

Problems with the above algorithm occurred when the lighting conditions were such that one side of the face was in shadow and the other side was directly illuminated. In such cases, both eyes could not be found using the same threshold. One solution to this problem would be for the subject not to work under such extreme conditions. It is likely that any subject who is working in front of a workstation would not like to be directly illuminated, since a lot of reflection is present and the screen cannot be seen clearly. Another solution to this problem is to remember which blobs are growing in size as the threshold increases, and to use their locations at a higher threshold. In that way, we can compensate for the disproportional lighting on the two sides of the face.
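A simplified sketch of the match scoring defined by the rules listed above is shown below (Python). Only the rules that can be written down directly from the list are included; rule 7 and the eyebrow-presence and forehead-fullness terms are omitted because they require image access, and the helper quantities (expected eye distance, estimated nose width, eyebrow-to-eyeline distances) are assumed to be computed elsewhere.

    import math

    def score_eyes_nose_match(eye1, eye2, nose_width, expected_eye_dist, target_width,
                              eyebrow1_dist=None, eyebrow2_dist=None):
        (x1, y1), (x2, y2) = eye1, eye2
        eye_dist = math.hypot(x2 - x1, y2 - y1)

        # Rule 1: eye distance must be 25%-70% of the target (face) area width.
        if not 0.25 * target_width <= eye_dist <= 0.70 * target_width:
            return None
        # Rule 2: reject if it differs from the expected eye distance by more than 9%.
        if abs(eye_dist - expected_eye_dist) > 0.09 * expected_eye_dist:
            return None
        # Rule 3: penalize a steep eye line; reject above 45 degrees (|slope| > 1).
        slope = abs((y2 - y1) / (x2 - x1)) if x2 != x1 else float("inf")
        if slope > 1.0:
            return None

        score = 100.0 * (1.0 - slope)
        # Rule 4 (symmetry term only): reward symmetric eyebrow-to-eyeline distances.
        if eyebrow1_dist and eyebrow2_dist:
            score += 100.0 * (1.0 - abs(eyebrow1_dist - eyebrow2_dist)
                              / max(eyebrow1_dist, eyebrow2_dist))
        # Rules 5-6: compare the estimated nose width against bounds tied to eye distance.
        min_w, perfect_w, max_w = 0.1 * eye_dist, 0.3 * eye_dist, 0.5 * eye_dist
        if nose_width < min_w:
            score -= 10.0
        elif nose_width > max_w:
            score += 30.0 * (3.0 - nose_width / perfect_w)
        return score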
Figure 3.12 shows examples of the feature detection for various pan, tilt and roll angle rotations, some of which are extreme. The program imposes a limitation of up to 45 degrees of roll angle. The eye features are lost if the head is panned completely to the left or right. As the roll and pan angles increase, so does the error in detecting the tip of the nose.

3.4 Accuracy of Feature Detection

The accuracy of the feature point detection was tested on four subjects (two with light skin color and two with dark skin color). A continuous stream of images was recorded while the subject was moving, e.g., rotating the head or moving toward or away from the camera. Two sets of images were recorded: one with a normal background (e.g., white walls, workstations in the background, etc.), and another with a plain green background. Figure 3.13 illustrates the input images used.

(Pictures are part of the author's private database.)
Figure 3.12: Results of eyes-nose detection for various roll, pan and tilt angles. The white rectangle shows the face boundaries, and the white dots show the locations of the eyes, eyebrows and tip of the nose.

[Image is presented in color.] (Pictures are part of the author's private database.)
Figure 3.13: Sample images used for the accuracy of feature detection calculation

                    Light subjects           Dark subjects
                    Norm. bg   Plain bg      Norm. bg   Plain bg
    Matches (%)       93.00      99.00         58.33      80.00
    Outliers (%)       5.00       0.00          5.00       1.67
    Not found (%)      2.00       1.00         36.67      18.33
    (a) Percentage of matches found, matches not found, and outliers.

                    Light subjects           Dark subjects
                    Norm. bg   Plain bg      Norm. bg   Plain bg
    Left Eye           1.04       1.09          1.59       1.59
    Right Eye          1.52       1.52          2.78       1.58
    Nose               4.05       3.77          3.53       2.63
    (b) Average Euclidean distance for matches, in pixels.

Table 3.1: Accuracy of the feature location. Percentage of found matches and average Euclidean distance of program-found features from hand-labeled locations. Comparative results for a normal environment vs. a controlled (plain green) background are given.

Then, a random sample of (50 or 30) video frames was selected (from 300 or 150 frames). The eyes and nose locations were hand-labeled, and the Euclidean distance between the labeled features and the program-detected features was calculated. Points that had a Euclidean distance of more than 10 pixels were considered outliers. Table 3.1 shows the average Euclidean distance over the set of images when the correct match was found. Also shown are the percentage of correct matches, the percentage of outliers, and the percentage of times when the match was not found.

For the normal background, matches were found for 58-93% of the frames; for the plain green background the success was 80-99%. The percentage of outliers for the normal background was 5%, while the percentage of outliers for the plain background was 0-1.67%. For the dark-skin subjects, the percentage of missed matches was 36% with the normal background, and 20% with the plain green background. Clearly, the plain green background enhanced the performance. The error, measured as the average Euclidean distance between the hand-labeled and the program-generated feature points, was 1-4 pixels. The error for the eyes was very small (1-2 pixels off), while the error for the tip of the nose was larger (2-4 pixels). This discrepancy is due to the lighting conditions and our assumption that the line between the eyes and the line down along the nose make a 90 degree angle. When the face is rotated significantly, the angle is smaller than 90 degrees. However, the error made is still small, and the amount of computation is reduced since only one angle is considered.
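The per-frame bookkeeping behind these percentages is simple; a sketch (assuming parallel lists of program-detected and hand-labeled (x, y) points, with None marking frames where a feature was not found) is:

    import math

    def summarize_accuracy(detected, labeled, outlier_px=10.0):
        # Match / outlier / miss percentages and mean pixel error, as used for Table 3.1.
        dists, outliers, misses = [], 0, 0
        for det, ref in zip(detected, labeled):
            if det is None:
                misses += 1
                continue
            d = math.hypot(det[0] - ref[0], det[1] - ref[1])
            if d > outlier_px:
                outliers += 1
            else:
                dists.append(d)
        n = len(labeled)
        return {"match_pct": 100.0 * len(dists) / n,
                "outlier_pct": 100.0 * outliers / n,
                "not_found_pct": 100.0 * misses / n,
                "mean_error_px": sum(dists) / len(dists) if dists else float("nan")}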
The eye blobs of the dark-skin subjects were found at very low thresholds (0 or 1 in most cases), and the threshold increment had to be modified to 1 instead of the 5 that was used for light-skin subjects. This problem can be overcome with increased ambient lighting, or with more gradual increments in the thresholding based on the knowledge that a dark-skin subject is using the system.

3.5 Summary

In this Chapter, a method for face localization in a color image was presented. The method is based on a skin color model that covers a wide range of faces, from light to dark ones. Models using both the RGB and HSI color spaces were presented and compared. Some problems with the skin color model due to backgrounds similar to the face color were presented, as well as problems in the detection of dark skin. A method of color alteration for improving dark skin detection was presented. A novel method for locating facial features based on the knowledge of the face geometry was presented. The method finds the eyes, eyebrows and nose of a subject. The evaluation of the accuracy of the feature detection showed 1-2 pixel accuracy for detecting the eyes and 2-4 pixel accuracy for detecting the tip of the nose.

Chapter 4

Tracking the Features

In Chapter 3, the algorithm for face location and facial feature location was described. The algorithm is designed to work on individual images. In this Chapter, it is described how the algorithm from Chapter 3 has been integrated into the processing of video stream data. The input to the base program is a live video stream from the camera attached at the top of the workstation monitor, and the output is the list of eyes-eyebrows-nose coordinates. The frames are processed in sequence. To take advantage of movement history, a Kalman filter is used to estimate the motion vectors of the tracked feature points and the future positions of the tracked feature points [43]. The state diagram of the system is presented, which defines how the individual steps of the algorithm are combined to achieve real-time processing rates. Time-space diagrams representing the feature coordinates in time are analyzed. Finally, numerical data are presented that show that the system can run at real-time rates with good accuracy.

4.1 Tracking System State Diagram

The system state diagram is depicted in Figure 4.1 (a). The tracking starts in state NO_FD (no face detected). Once the face object has been located (state FD_FIRST) in the input video stream, feature points are tracked using the approach described in Chapter 3 (state FD_TRACK). First, an attempt is made to find the tracked features in the location from the previous frame. If that does not succeed, a new face position is determined. Thus, time is saved by not unnecessarily running the face detection algorithm, which is the most time-consuming segment. Since the subject's motion is typically smooth, once the face region has been locked, the tracked features can be located in the same face bounding box for a number of video frames. If the tracked features cannot be found, a prediction based on the Kalman filter is used to fill in the gap (state FD_PREDICT). The prediction is done for a limited number of frames, namely at most three.
If the tracked features are found while in the prediction state, a switch is made to the recovering state FD_RECOVER, in which only measurements are taken from the environment and no prediction is made. The system stays in the recovering state as long as it was in the prediction state. After the recovery period, if the tracked features have been found, a switch is made to the tracking state. If the tracked features have not been found while in the prediction or recovering state, the face finding algorithm is started again to find a new face location (state FD_NEWPOS), since the assumption is that the state of the environment has changed too much during the prediction period. If the face has not been found, a switch is made to the initial state, whereas if the face has been found, a switch is made to the intermediate continuation state FD_CONT. In this state, an attempt is made to find the tracked features in the new location. If the features have been found, a switch is made to the recover state. If the features have not been found, a switch is made to the no-match state FD_NOMATCH. In this state, the face object is located in the image; however, no face features are found.

Figure 4.1: Tracking System State Diagram. (a) Original State Diagram; (b) Improved State Diagram.

If the face starts to move, the diagram in Figure 4.1 (a) will not respond promptly to the movement, and the tracking parameters will not be adjusted well. Thus, an improved scheme, depicted in Figure 4.1 (b), is introduced. The only difference is that in the state FD_PREDICT, an attempt is made to find the face first, and then to find the features in the new locations. In this way, if the face moves a bit, new face boundaries will be found that fit the data better, and the tracking parameters will be adjusted faster.

The motion of six variables is estimated: the X and Y coordinates of both eyes and the nose. In the Kalman filter equations, using previous and current values of the variables, we manipulate 2 x 2 matrices, and matrix operations such as inversion are not time consuming. The time update ("predict") equations for the projected state of a tracked point, x_{k+1}^-, and the error covariance matrix, P_{k+1}^-, are:

    x_{k+1}^- = A_k x_k,    with A_k = [ 2  -1 ]
                                       [ 1   0 ]

    P_{k+1}^- = A_k P_k A_k^T + Q_k

where the state vector x_k = [x_k, x_{k-1}]^T stacks the estimate at time k and the estimate at time k-1. In our case, x_k is the X or Y coordinate of the tracked point.

The measurement update equations for the Kalman gain K_k, the update of the estimate with the measurement z_k, and the update of the error covariance matrix are [43]:

    K_k = P_k^- (P_k^- + R_k)^{-1}
    x_k = x_k^- + K_k (z_k - H_k x_k^-)
    P_k = (I - K_k) P_k^-

where we set H_k = I, and empirically determine the measurement error covariance matrix R_k and the process noise Q_k.

The predicted coordinates are used in two ways: (i) to verify the position of newly found features, and (ii) if the tracked features are lost, to predict their location and thus smooth out the tracking and avoid losing the tracked features. In the former case, each match is checked against the predicted location. The simple Euclidean distance is calculated for each of the coordinates, and if the overall distance is too large, that match is discarded. It is assumed that such a match is rather far away from the real match, e.g., the eyes and ears might have been matched. In the latter case, the predicted coordinates are used as the output of the program, instead of the program-calculated ones, as described earlier.
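A minimal NumPy sketch of this per-coordinate filter, following the equations above, is given below. The Q and R values are placeholders (the text only says they were determined empirically), and six such filters are run, one for each of the X and Y coordinates of the two eyes and the nose.

    import numpy as np

    class CoordinateKalman:
        # State is [x_k, x_{k-1}], A = [[2, -1], [1, 0]], H = I, as in the equations above.
        A = np.array([[2.0, -1.0], [1.0, 0.0]])

        def __init__(self, x0, q=1.0, r=4.0):
            self.x = np.array([x0, x0], dtype=float)
            self.P = np.eye(2)
            self.Q = q * np.eye(2)   # process noise (placeholder value)
            self.R = r * np.eye(2)   # measurement noise (placeholder value)

        def predict(self):
            # Time update: project the state and error covariance one frame ahead.
            self.x = self.A @ self.x
            self.P = self.A @ self.P @ self.A.T + self.Q
            return self.x[0]

        def update(self, z_curr, z_prev):
            # Measurement update with z = [current, previous] measured coordinate (H = I).
            z = np.array([z_curr, z_prev], dtype=float)
            K = self.P @ np.linalg.inv(self.P + self.R)
            self.x = self.x + K @ (z - self.x)
            self.P = (np.eye(2) - K) @ self.P
            return self.x[0]

The value returned by predict() is what a new match is compared against, and it is also what is reported when no match is found, as described above.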
4.2 Using Motion Detection to Improve Tracking Results

In some cases, when the user's clothes or background are of a color similar to the user's skin color (e.g., reddish, pinkish, orange colors), the segmentation of the face vs. the background is not very good. To overcome that problem, we use motion detection to facilitate face-background segmentation. The area where there is some motion is likely to contain the user's face only, and the areas where there is no motion are assumed to be background. We need to focus the search for the face only on the area where there is motion; the rest of the viewing field is ignored. Thus, if the background includes colors that are similar to the skin color, they will not affect the search for the face.

The motion detection algorithm used is rather simple: for each frame at time t, the difference between the frames t and t - 1 is calculated. All pixels are flagged as changed if the pixel value difference is larger than 10, or as same otherwise. Every 30 frames, a check is made to determine which pixels have changed in the previous interval, and the motion area is calculated by finding connected components out of the moving pixels. Components that are out of range of the previously detected motion object are discarded, to reduce the noise. At times of significant motion, the motion area is updated at the global level, and when there is just slight or no motion, the previously found global-level motion area is used.

Once we have determined the area that is moving, we restrict the search for the face to that area. In this way, we can speed up the face detection algorithm and compensate for the time spent calculating the motion. Real-time processing rates are still achieved. By detecting the motion area, we can also easily check whether a dark face is present, using the algorithm described in Section 3.2. We need only check the windows that are centered around the motion area center. If we did not have the information about the possible face area, we would need to check all possible sub-windows for dark areas, or to restrict the face location to some specific image area (e.g., the center of the image). The latter approach would significantly constrain the users.
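A sketch of the frame-differencing step described above is given below (Python with NumPy/SciPy). The rejection of components that lie far from the previously detected motion object, and the decision between "significant" and "slight" motion, are left out; the face search is then restricted to the returned bounding box.

    import numpy as np
    from scipy import ndimage

    def motion_bounding_box(frames_gray, change_thresh=10):
        # Accumulate changed pixels over an interval (the program uses 30 frames)
        # and return the bounding box of the largest connected moving region.
        changed = np.zeros(frames_gray[0].shape, dtype=bool)
        for prev, curr in zip(frames_gray[:-1], frames_gray[1:]):
            diff = np.abs(curr.astype(np.int16) - prev.astype(np.int16))
            changed |= diff > change_thresh
        labels, n = ndimage.label(changed)
        if n == 0:
            return None
        sizes = ndimage.sum(changed, labels, index=range(1, n + 1))
        biggest = int(np.argmax(sizes)) + 1
        ys, xs = np.nonzero(labels == biggest)
        return ys.min(), xs.min(), ys.max(), xs.max()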
The above approach was tested on a 359-frame movie of a subject with dark skin. Table 4.1 compares the accuracy of the detection when motion detection is and is not used, and when the Kalman filter is and is not used.

    Euclidean distance        No motion detection    Motion & dark face
    from hand-labeled          no KF     use KF       no KF     use KF
    Eye 1                       2.84      3.31         1.75      2.29
    Eye 2                       8.77      8.83         4.46      4.25
    Nose                        5.11      5.47         8.17      8.88
    # Matches                    102       117          130       116
    # Outliers                     6        27           11        50
    # Misses                     227       191          194       160
    # No Matches                  25        25           25        25
    # Detected No Matches         24        25           25        19

Table 4.1: Accuracy of the motion detection and dark face detection given as the Euclidean distance between feature points and hand-labeled locations. Euclidean distance is given in pixel units, and the number of matches is in frame units.

The Euclidean distance between program-detected and hand-labeled feature points in each image frame is calculated in pixel units. The table also includes the number of video frames processed: the number of correct eyes-nose matches, number of outliers, number of frames where no eyes-nose match was found, number of no-matches in the video stream, and number of detected no-matches in the video stream. As can be seen in the table, the number of outliers increased, while the number of missed matches decreased, when the motion and dark-face detection were turned on. Also, the accuracy of finding both eyes increased. However, the nose location detection error increased. This is due to the altering of the input image, and in that process some information about the precise nose location was lost. The use of the Kalman filter resulted in fewer no-matches and more outlier matches, in both the case of motion detection and no motion detection. The increased number of outlier matches is due to the fact that the use of the Kalman filter prediction might reject a correct match if it is locked onto an incorrect match. The most important issue is that the face area was detected more accurately in the case when motion detection and image alteration were used, which leaves room for improvement of the feature finding algorithm.

4.3 Results Using Kalman Filter

Figure 4.2 shows results of the feature tracking on a sample movie file: the X and Y coordinates of the eye and nose points are plotted over time. We compare the results of the tracking without smoothing and with smoothing and prediction based on the Kalman filter. As can be seen, the number of peaks in the case when the smoothing is applied is significantly reduced compared to the case when the smoothing is not used.

The accuracy of the feature point detection is measured as the Euclidean distance in pixels between program-detected and hand-labeled coordinates. In our experiments, the camera had square pixels, so the X and Y scales are the same. Table 4.2 (a) shows the accuracy measured on three movies (two light-skin and one dark-skin subjects, with a total length of 835 frames). The program detection was typically 1-3 pixels away from the real locations. The location of the eyes was 1-3 pixels away, while the location of the nose was 2-3 pixels away from the hand-labeled location.

X and Y coordinates in time for sample movie file mv1, without smoothing (top) and with smoothing (bottom).
Figure 4.2: Comparison of the X and Y coordinate tracking without (top) and with (bottom) the use of Kalman filter for smoothing and prediction.
These results were obtained after the outlier matches were removed from the statistics. Table 4.2 (b) shows for each subject the number of correct eyes-nose matches, the number of outliers, and the number of frames where no eyes-nose match was found. The errors are similar in both cases. However, the numbers of outlier and missed matches are lower if smoothing is applied. This ensures smooth changes of the feature coordinates in time, which is important for gaze direction and menu selection, as will be discussed in Section 5.2 and Chapter 7.

The execution time of the program is such that real-time tracking rates are achieved. An image size of 320 x 240 allows the user to move freely in front of the camera and to sit far from the camera (and display). In the current set-up, if the subject sits about 50 cm away from the camera/display, the subject's face is about 64 x 64 pixels, which allows reasonable matching results. For the above image size, we can achieve a frame rate of 10-30 Hz on an SGI Indy 2 workstation.

              No smoothing              Apply smoothing
          Eye 1   Eye 2   Nose       Eye 1   Eye 2   Nose
    S1     1.44    1.20   3.14        1.60    1.38   3.54
    S2     1.30    3.34   3.21        1.18    3.11   3.22
    S3     1.21    1.23   2.33        1.29    1.34   2.19
    (a) Euclidean distance in pixels between hand-labeled eye and nose locations and computed locations.

              No smoothing                        Apply smoothing
          Matches  Outliers  Not Found       Matches  Outliers  Not Found
    S1      240       3          1             242       0          2
    S2      332       0          7             334       2          3
    S3      252       0          0             251       0          1
    (b) Number of video frames processed: number of correct eyes-nose matches, number of outliers, and number of frames where no eyes-nose match was found.

Table 4.2: Accuracy of the tracked feature location

Table 4.3 shows the execution times for six sample runs of two subjects. Three different motion patterns were tested: no significant motion, smooth motion, and sudden movement (the subject moving extremely fast). All measurements were done for 1000 frames of video on an SGI Indy 2 workstation with an input image size of 320 x 240 pixels. The first row shows the total number of matches, and the second row shows the number of frames spent in the tracking state. The execution time of the face detection algorithm (row FD_NEWPOS) is 80-90 msec, while the time needed to locate the tracked features in the face (row FD_TRACK or FD_CONT) is 10-30 msec. The overhead induced by grabbing and displaying the image is not significant, and the overall frame rate that can be achieved for smooth motion is 20-30 Hz. In the case of sudden movements, the subject's face position changes in every frame, and the face detection algorithm dominates the execution times, so that the frame rate achieved is low. If we used smaller images, the frame rate would be higher; however, that would significantly constrain the subject's movements, if we are to track the eyes and nose accurately.

                      No Motion      Smooth Motion     Sudden Moves
                      R1     R2       R3     R4         R5     R6
    Match             999   1000      998    999        958    799
    FD_TRACK          992    998      941    924        623    431

    Average execution times in milliseconds
    FD_TRACK           14     12       29     21         54     96
    FD_NEWPOS          94    125       89     90         90     86
    FD_CONT            24     16       37     15         60     53
    Analysis           15     12       37     29        111    175
    Grab               12     15        5      8         11     10
    Display             6      5        7      5          6      5
    Total              34     33       49     42        129    191

    Frame rates in Hz
    Analysis           69     81       27     35          9      6
    Overall            30     30       20     24          8      5

Table 4.3: Execution times for six runs (two subjects). The program was run for 1000 frames of video, on an SGI Indy 2 workstation with an input image size of 320 x 240.
Similar execution times were achieved on a 200 MHz Pentium PC with a Matrox video board, running Windows NT and using the Vision SDK as the interface to the camera. In this case, the major bottleneck was the Vision SDK interface, while the processing itself took the same time as on the SGI workstation. Thus, the maximum frame rate achievable was 10 Hz.

The tracking program was tested on numerous occasions, including several open-house sessions. Light-, brown- and dark-skin subjects were all tracked with good results. In the case of dark-skin subjects, on some occasions extra lighting was used to increase the ambient lighting of the room. When measured formally using recorded movies, the subject's face and face features were successfully tracked in the video stream for 99% of the frames. We did, however, have some difficulty with the system performance in another room at another location during a demo session.

4.3.1 How far from the camera can the user be?

In the majority of the experiments, the users were sitting at about a 2 ft distance from the camera, which was located on top of the workstation monitor. In the early stages of the development, we used a low-quality generic SGI eye-camera for our experiments. In the later stages of the development, we used a high-quality pan-tilt-zoom camera. The tracking program worked well using both cameras.

The users were not required to be at a fixed sitting distance from the camera during experiments. They could move closer or further, and the facial features were tracked without problems. Our facial feature finding heuristics do not require any user-specific information. As long as the facial features are spaced well enough for the described geometry to work, the features can be located. In an experiment using a pan-tilt-zoom camera, we measured that the smallest face area in which facial features could be determined was 50 x 70 pixels (using a 320 x 240 pixel image). In such a case, the distance between the eyes translated into 20 pixels, and the eyes-nose distance was 10 pixels. Similar results were achieved using a generic SGI eye-camera. If the zoom feature of the camera is used, the maximum face distance from the camera depends on the zoom factor. In our experiment, at the maximum zoom factor, the user was 7 yards (6.5 meters) away from the camera and the facial features were tracked by our algorithm. All the above experiments were conducted in a controlled environment: a room with fluorescent ceiling lighting and no windows. No additional light source was used.

4.4 Analysis of Movement Data

What can the time-space statistics tell us about the subject's movement? Figure 4.3 shows the X and Y coordinates in time for two sample movie files. Figure 4.3(a) shows a sequence in which the subject was first still, then moved the head up and down three times, then moved the head left-right six times. The last graph shows the X and Y image coordinates, and we can clearly see the motion indicated by the plots of the coordinates in time. In Figure 4.3(b), we see a sequence in which the subject was moving the head left and right, and down, which indicates spiral moves. The X vs. Y plot shows the spiral moves for each tracked feature point. The plots in Figure 4.3 resemble the plots from Yarbus [97], and we can analyze them in a similar way. Additionally, the data could be used to monitor the user's movements, and certain movements could trigger certain actions, as will be discussed in Chapter 7.
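As a purely hypothetical illustration of how such time-space data could drive simple head gestures (the actual selection mechanisms are the subject of Chapter 7), the following sketch counts direction reversals in a single coordinate series; repeated reversals of sufficient amplitude in the X series correspond to the left-right head shakes visible in Figure 4.3(a).

    def count_reversals(coords, min_amplitude=5.0):
        # Count direction reversals in a 1-D coordinate series (e.g., an eye X
        # coordinate over time); min_amplitude filters out pixel-level jitter.
        reversals, direction, last_extreme = 0, 0, coords[0]
        for value in coords[1:]:
            delta = value - last_extreme
            if direction >= 0 and delta <= -min_amplitude:
                reversals += (direction == 1)
                direction, last_extreme = -1, value
            elif direction <= 0 and delta >= min_amplitude:
                reversals += (direction == -1)
                direction, last_extreme = 1, value
            elif (direction > 0 and value > last_extreme) or (direction < 0 and value < last_extreme):
                last_extreme = value
        return reversals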
(a) Movie file mv3; (b) Movie file mv6.
Figure 4.3: Sample time-space statistics for two movie files: X and Y coordinates of the feature points in time, and X vs. Y image coordinates

4.5 Summary

This Chapter presented an integration of the feature finding algorithm into a real-time tracking application. The system state diagram was defined, and the use of Kalman filtering was described. The Kalman filter estimate is used in conjunction with the feature finding algorithm to eliminate erroneous matches. In addition, the filter estimate is used as the output of the system when the features could not be found by the algorithm, which resulted in smoother tracking. The use of a motion detection algorithm was presented. Motion detection was used to overcome problems with the face-background segmentation and to facilitate dark face location. Accuracy results and execution time data were presented. Finally, plots of the tracked feature coordinates in time were presented and analyzed, and an analogy to the eye-movement pattern analysis in Cognitive Science was made.

Chapter 5

Gaze Direction Detection

This Chapter presents a method for detecting the user's gaze direction based on the coordinates of the eyes and nose in the image. A neural network used to map the user's facial feature locations to screen coordinates is described, as well as the method used for its training. The mapping of the motion of features in the image to the motion of the screen cursor is described. The joint use of the above two mappings is discussed. Methods to smooth out the gaze direction are presented. Finally, results of the gaze tracking are presented in a form suitable for the analysis of eye-movement patterns, e.g., in visual cognition studies, in attention measurement, or in cursor control by gaze.

5.1 Introduction and Terminology

Once the coordinates of three points on the face are known, we can determine the subject's gaze direction. Several methods for gaze direction determination are discussed in Section 2.4; they range from exact mathematical solutions to an approximation using Artificial Neural Networks. In this work, the main goal is to avoid the calibration of the system, and to avoid explicit knowledge of the user's dimensions or camera characteristics. This enables high transparency and portability of the system.

Currently, the program tracks only the user's head and the eyes and nose position within the face, unlike systems that track the user's eye movements. The head pose is determined by the position of the eye and nose points in 2D, and the user must turn the whole head for the gaze point to change in our system.
In eye-only-tracking systems, the user's head must be stationary and only the eyes move. Thus, in our work we are not able to track saccadic eye movements. The experimental setting is such that we do not have enough information about the eye area to determine the saccadic movements: they would translate into just a 1-2 pixel change of the eye pupil position in the image plane. Figure 4.3(a) illustrates this point: in the first 100 frames of the movie (about 6 seconds) the user's head was still. Due to the nature of human eye movements, the user's eyes must have moved in that time interval. However, the resulting changes in the eye coordinates are only one or two pixels. In the following 200 frames (about 12 seconds), the subject moved the eyes. These moves resulted in about a ±5 pixel change in the eye pupil coordinates. To determine saccadic eye movement we would need a high resolution camera that zooms in on one eye, and the lighting conditions would have to be carefully controlled. Alternatively, an infra-red light source could be used to track the precise eye movements, as discussed in Section 2.4. In both cases, the user's head must be stationary (or move just slightly), since such tracking systems are sensitive to head motion.

In the remainder of this document, we will use the term "gaze direction" in the following sense:

Gaze direction: The point in the display plane determined through a transformation of the 2D image plane coordinates of the eyes and tip of the nose.

The transformation we use does not necessarily result in the point that coincides with the point that the user foveates. However, in the case when the user is in front of the display and explores it, the gaze point determined by our algorithm is a good approximation of the point which the user foveates.

5.2 Gaze Direction Determination Using ANN

One intuitive solution to the gaze determination problem is to determine the left-right motion by measuring the distance between the eyes and nose in the horizontal direction. This is easy to achieve and provides accurate results. However, applying similar logic to up-down motion does not produce good results, since the eyes-nose relations are similar for a face looking up and for a face looking down. Another option is to fit the features' image position data for various gaze directions to a curve or to classify them in some way. However, using pure image coordinates would require a vast amount of training data covering all gaze directions from all image positions.

Figure 5.1: Relations between the eyes and nose used as the input to the neural network to determine the gaze direction. Dotted lines show the relations we use as the ANN input; dashed lines show the relations that had high weight in the resulting ANN.

Finally, after an analysis of the eyes and nose position within the face frame, it was concluded that some relations matter more than others, and that the gaze direction could be determined from the relative positions of the eyes and nose within a face. The solution chosen for the function approximation was an Artificial Neural Network. The inputs to the network are the eyes and nose coordinates and their relations, and the output is the normalized screen coordinates of where the subject is looking. Figure 5.1 depicts the eyes-nose relations used as the input to the network. All inputs to the ANN are normalized to values from 0.0 to 1.0.
The relations used (depicted in Figure 5.1) are:

- X and Y coordinates of the eyes and nose (total of 6 inputs),
- distance between the two eyes, and between each eye and the nose, in the X and Y coordinates (total of 6 inputs),
- distance of the eyes and nose to the X and Y face boundaries (total of 12 inputs).

Figure 5.2: Sample input data for the ANN trained to determine gaze direction. Dashed lines show the target grid points, rectangles represent face boundaries, and triangles represent the corresponding eyes and nose locations.

The ANN has a total of 24 inputs, one hidden layer, and 2 outputs (normalized X and Y screen coordinates). Different numbers of hidden units were tested, and the best results were achieved with 6 hidden units. Training samples were obtained by having users look at buttons in a 4 x 4 grid on the workstation monitor while their pictures were captured. For each captured image, the coordinates of the face boundaries and the found features were recorded along with the screen coordinates of where the subject was looking. Figure 5.2 shows the sample input triangles and face boundaries for the 4 x 4 grid. The neural network was then trained using the QuickProp program [27].

The results using only the neural network-based mapping were not satisfactory. The main reason was that no movement history was used, and thus, for small changes in the input, the mapped screen coordinate changes could be quite significant. Figure 5.3(a) shows the screen coordinates in time for a sample movie file. The subject was instructed to look from the upper left to the lower right corner in spiral moves. For each frame, the ANN was run to output the screen coordinates. As can be seen, the movements of the imaginary cursor are not smooth at all. The reason for this problem is that the ANN's output can change significantly for small changes in the input data.
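For concreteness, the following is a minimal sketch of how the 24 normalized inputs listed above could be assembled and passed to the trained 24-6-2 network. The struct and function names are hypothetical, and the network itself, which in the actual system was trained off-line with the QuickProp program, appears only as a stub.

#include <array>
#include <cmath>

struct Point { double x, y; };              // image coordinates, already normalized to [0, 1]
struct FaceFrame {
    Point leftEye, rightEye, nose;          // tracked feature points
    double left, right, top, bottom;        // face bounding box in the same normalized units
};

// 6 coordinates + 6 pairwise X/Y separations + 12 distances to the face boundary = 24 inputs.
std::array<double, 24> annInputs(const FaceFrame& f) {
    std::array<double, 24> in{};
    int k = 0;
    const Point pts[3] = { f.leftEye, f.rightEye, f.nose };
    for (const Point& p : pts) { in[k++] = p.x; in[k++] = p.y; }           // 6 coordinates
    const Point pairs[3][2] = { { f.leftEye, f.rightEye },
                                { f.leftEye, f.nose },
                                { f.rightEye, f.nose } };
    for (const auto& pr : pairs) {                                         // 6 separations
        in[k++] = std::fabs(pr[0].x - pr[1].x);
        in[k++] = std::fabs(pr[0].y - pr[1].y);
    }
    for (const Point& p : pts) {                                           // 12 boundary distances
        in[k++] = p.x - f.left;  in[k++] = f.right  - p.x;
        in[k++] = p.y - f.top;   in[k++] = f.bottom - p.y;
    }
    return in;
}

Point annGazeEstimate(const std::array<double, 24>& in) {
    // Stub: the real system applies the 24-6-2 weights learned with QuickProp here
    // and returns normalized screen coordinates in [0, 1].
    (void)in;
    return { 0.5, 0.5 };
}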
5.3 Gaze Direction Determination Using Movement Vector Scaling

To improve the performance of the gaze tracking, a combined approach that takes advantage of smooth subject motion is used. After an initial estimate of the screen coordinates using the neural network, we scale the subject's movements in the image into display cursor movements. We can achieve smooth movements of the cursor as long as the subject's head is moving smoothly. The logic is similar to that used for an ordinary mouse, where a display/control gain factor is used. In the long run, this approach enables us to adapt to individual users' needs, where each user can decide how much he or she wants to move the head in order to move the cursor. Also, an adaptive gain factor could be used, so that for faster motion a greater gain value is applied and for slower motion a smaller gain value is applied. This could enable fast coarse positioning and slower, smaller motion for finer positioning. These issues will be further discussed in Chapter 6.

Figure 5.3: Comparison of gaze tracking results using (a) the neural network based mapping only and (b) the scaling of movements in the picture to display movements.

In the experiments described below, the display/control gain factor was determined from the screen resolution and the head movement range in the X and Y coordinates. For a screen resolution of 946 x 910 pixels and a maximum expected head movement of 50 pixels in X and 30 pixels in Y, the gain factor was 18.88 in X and 30.33 in Y. The gain factor was constant during the experiment.

To calculate the gaze point on the screen, the following rules are used:

- if the current frame is the first frame with the face detected, the ANN is used to estimate the gaze point;
- if the gaze was estimated in the previous frame, the new gaze point is calculated as the previous gaze point plus the velocity vector calculated in the Kalman filter equations;
- if the gaze point coordinates are out of range (e.g., when the display/control gain is not well adjusted to the subject's movements), the out-of-range coordinate is set to the maximum or minimum coordinate;
- if the tracked features are temporarily lost, the gaze point is kept at its previous location.

The above rules ensure that the gaze point always stays within the display range. A problem might arise if the gain factor is not appropriate; that problem could be solved by individual calibration or by on-line adjustment.

Results of the gaze tracking for a 4 x 4 grid are depicted in Figure 5.3. The subject was instructed to look in spiral moves from the upper left corner to the lower left corner and to traverse all grid points. As can be seen in Figure 4.3(b), the X and Y coordinates of the tracked points are smooth in time. The graphs in Figure 5.3(a,b) show the X and Y display coordinates normalized to values between 0.0 and 1.0. The raw coordinates obtained by the mapping were normalized to the grid coordinates; only the grid coordinates were shown to the subject. Using the neural network-based mapping gives very unstable screen coordinates, as shown in Figure 5.3(a). On the other hand, if the movement scaling is used, smooth movements of the screen coordinates are achieved, as shown in Figure 5.3(b).

5.4 Smoothing of Gaze Point

The approach described in the previous Section works well when the tracking is highly accurate and the user moves smoothly. However, in many practical cases these conditions are not met. As a result, the detected gaze point is not very stable and looks more like the output in Figure 5.3(a). To overcome this problem, we use two levels of smoothing of the gaze point produced by movement scaling. First, we estimate the gaze point using the Kalman filter; the input data to the filter are the gaze coordinates produced by the movement scaling. Then, a weighted averaging of the estimated gaze point is done, using 90% of the previous value and 10% of the new value. In this way, we ensure that all changes in the gaze point are smooth.

Comparisons of the three different ways to determine the gaze point are shown in Figure 5.4. Comparing panels (a) and (b) of Figure 5.4 shows the movement scaling versus the Kalman filter estimate; clearly, the filtered output is much smoother. Comparing panels (b) and (c) shows the Kalman filter estimate versus the weighted averaging. In this case the differences are more subtle, and they are visible only when the user is still and staring at one point; in that case, small vibrations of the head are compensated by the weighted averaging.
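A minimal sketch of this per-frame update and two-level smoothing is given below; it is not the system's actual implementation. A fixed-gain constant-velocity (alpha-beta) filter stands in for the Kalman filter, and its gains are assumptions; only the clamping to the display range and the 90%/10% weighted averaging follow the rules and values quoted above.

#include <algorithm>

struct Gaze { double x = 0.5, y = 0.5; };

class GazeSmoother {
public:
    // gainX/gainY: display/control gain (image pixels -> screen pixels),
    // e.g. 18.88 and 30.33 for the 946 x 910 pixel display used in the experiments.
    GazeSmoother(double gainX, double gainY) : gx_(gainX), gy_(gainY) {}

    // On the first frame with the face detected, seed all stages with the ANN estimate.
    void reset(const Gaze& annEstimate) { raw_ = est_ = out_ = annEstimate; vx_ = vy_ = 0.0; }

    // featDx/featDy: frame-to-frame displacement of the tracked features (image pixels).
    // featuresFound: false when tracking is temporarily lost; the gaze point is then held.
    Gaze update(double featDx, double featDy, bool featuresFound) {
        if (featuresFound) {
            raw_.x = clamp01(raw_.x + gx_ * featDx / screenW_);
            raw_.y = clamp01(raw_.y + gy_ * featDy / screenH_);
        }
        // Level 1: constant-velocity filtering of the scaled gaze point.
        double px = est_.x + vx_, py = est_.y + vy_;      // predict
        double rx = raw_.x - px, ry = raw_.y - py;        // residual
        est_.x = px + alpha_ * rx;  vx_ += beta_ * rx;    // correct
        est_.y = py + alpha_ * ry;  vy_ += beta_ * ry;
        // Level 2: weighted averaging, 90% previous value and 10% new value.
        out_.x = 0.9 * out_.x + 0.1 * clamp01(est_.x);
        out_.y = 0.9 * out_.y + 0.1 * clamp01(est_.y);
        return out_;
    }

private:
    static double clamp01(double v) { return std::min(1.0, std::max(0.0, v)); }
    double gx_, gy_;
    double screenW_ = 946.0, screenH_ = 910.0;  // display resolution in the experiments
    double alpha_ = 0.5, beta_ = 0.1;           // assumed filter gains
    double vx_ = 0.0, vy_ = 0.0;
    Gaze raw_, est_, out_;
};

With this scaling, the maximum expected 50-pixel horizontal head movement and the gain of 18.88 map onto essentially the full 946-pixel display width.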
The results and applications of this method will be discussed in more detail in Chapter 7.

Figure 5.4: Comparison of the gaze point determined by (a) movement scaling, (b) Kalman filter estimation, and (c) weighted averaging.

5.5 Results

The algorithms described in the previous Sections can be applied in many ways. In this Section, it is described how the user's gaze can be monitored and used for applications such as visual perception studies, and how the user can control the cursor by moving his/her head.

5.5.1 Fixations Determination

As discussed in Section 1.2.2, determining the precise gaze point is of great interest in visual perception studies. However, the measurement equipment is intrusive and users do not observe the displays in a natural setting. In our work, we applied our gaze direction measurement method to measuring fixation points while users observed an image.

Figure 5.5: Fixations measurement experiment setting and viewing angles.

The setting of the experiment consisted of a workstation monitor (19" diagonal); users sat comfortably in a chair at about an arm's length from the monitor (about 2 ft away) (see Figure 5.5). The image was displayed at 1200 x 800 pixel resolution and covered the whole screen. The span in the horizontal direction was about 40 degrees of viewing angle, and in the vertical direction it was about 26 degrees. These angles are much more than 15 degrees, which is the limit that can be covered by eye movement alone, so the users had to move their head a bit in order to view the whole displayed image. Thus, by measuring the gaze as proposed above, we could reconstruct the user's viewing patterns.

The output of the gaze direction detection algorithm is a sequence of screen coordinates with timestamps. The measurements are not output at constant intervals; the time interval depends on the image analysis time in each frame and ranges from 20 msec to 800 msec (when a log file is written). In addition, since the user is not able to keep the head perfectly still, the gaze point is not perfectly still when the user's head appears to be still, but might oscillate slightly. To be able to calculate the fixation points, we apply the algorithm in Figure 5.6, which has another level of smoothing in it. Simply, the results for each frame are accumulated for at least 100 msec. The averaged gaze location is computed, and the Euclidean distance is calculated from the gaze location of the previous time interval. The computed distance determines whether the subject's head is moving or still. A fixation is defined as the period of time during which the subject does not move significantly.
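The following is one plausible reading of the accumulate-and-compare scheme of Figure 5.6, using the constants listed there (MIN_FIX_TIME = 100 msec, FIX_THRESH = 0.01); the exact bookkeeping of the flowchart is simplified, and the re-scaling step described next is omitted.

#include <cmath>
#include <cstdio>

struct FixationDetector {
    static constexpr double MIN_FIX_TIME = 0.100;  // seconds (100 msec)
    static constexpr double FIX_THRESH   = 0.01;   // normalized screen units

    // Call once per processed frame with the normalized gaze point Gin(x, y)
    // and that frame's processing time Tin; prints a fixation when it ends.
    void addSample(double gx, double gy, double tin) {
        sumX += gx; sumY += gy; sumT += tin; ++n;
        if (sumT < MIN_FIX_TIME) return;               // keep accumulating
        double avgX = sumX / n, avgY = sumY / n;
        double dist = std::hypot(avgX - prevX, avgY - prevY);
        if (dist <= FIX_THRESH) {
            fixDur += sumT;                            // still: the fixation grows
        } else {
            if (fixDur > 0.0)                          // movement started: fixation ended
                std::printf("fixation at (%.2f, %.2f), %.0f msec\n",
                            prevX, prevY, fixDur * 1000.0);
            fixDur = 0.0;
        }
        prevX = avgX; prevY = avgY;                    // current interval becomes "previous"
        sumX = sumY = sumT = 0.0; n = 0;
    }

    double sumX = 0, sumY = 0, sumT = 0; int n = 0;
    double prevX = 0.5, prevY = 0.5, fixDur = 0;
};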
In addition, the minimum and maximum values of the screen coordinates are found and all the values are scaled to the range 0.0 to 1.0. This is needed because no calibration is performed and the user might move the head over just a fraction of the full range, so it is important to re-scale the gaze point.

The results of the gaze path and fixations for a sample image are shown in Figure 5.7. The subject was shown the image for 30 seconds and was told to examine the image and remember as many details as possible. A total of 23 fixations were recorded, with durations ranging from 123 to 1690 milliseconds. These results are not equivalent to the results of the Purkinje eye-tracker, where on the same image the number of saccades was about 40 and subjects viewed the image for 15 seconds. In our setting, we are not recording the actual saccades, but rather the head movements during the exploration of an image. Within one fixation, there is probably a number of saccadic eye movements that might be recorded with a camera that zooms in on the subject's eye.

Figure 5.6: Fixation determination algorithm (flowchart). Input: the per-frame gaze coordinate Gin(x, y), normalized to the 0-1 range, and the processing time Tin; constants: MIN_FIX_TIME = 100 msec, FIX_THRESH = 0.01.

Figure 5.7: Gaze path and fixations for a sample image (photo from "50 Favorite Rooms by Frank Lloyd Wright" by Diane Madden). The white line indicates the gaze path. Gray dots are fixation locations; the dot size indicates fixation duration. The black line connects consecutive fixation locations.

The complete results of the fixation determination experiment are presented in Appendix A. The experiment was conducted in conjunction with the evaluation of the head-eye input device described in Chapter 7. A total of 18 subjects viewed three different images for 30 seconds each, and the gaze path and fixation locations were determined automatically using the algorithm in Figure 5.6.

5.5.2 Attention Measurement

Another application of automatic gaze determination is measuring the user's attention (for example, to a television or a computer). Currently used methods rely on the manual coding of video tapes that record the subject (e.g., while watching TV). These methods are tedious for the coders, and even with a highly automated system, the human coder gets tired quickly and is prone to erroneous coding. The gaze determination system we developed could be used in such an application and would be completely automatic. The only human intervention is to set up the recording at the beginning of the experiment, and the measurement can then be done for an indefinite time.

Figure 5.8: Attention measurement experiment setting and viewing angles.

We conducted an experiment to verify the usability of our system for such automatic measurement. This was a pilot project intended to give us insight into how well the automatic measurement would work. The experiment was conducted with Jay Newell, a PhD student in the Telecommunication Department at Michigan State University [56].
The manual coding was done in the MIND lab at the Telecommunication Department at MSU by Jerry Roll, an intern in the lab, and Jay Newell.

The subjects were asked to watch a video tape, about 13 minutes long, of a newscast and advertisements. Throughout the video a WWW address would be shown, either in the newscast part or as part of an advertisement. The subjects were instructed to enter the WWW address into the Netscape program whenever they noticed one on the TV. The Netscape program was already running on the computer. The first four minutes of the video tape contained mostly the newscast, and the nine remaining minutes contained both advertisements and the newscast. The purpose of the task was to have the users look back and forth between the computer and the TV monitor. A total of 10 subjects participated in the experiment, two of them being the experimenters themselves. A black background was used to avoid any erroneous detection of faces in the pink carpeting that was in the room. Two subjects, who had glasses, wore three blue squares (above the eyebrows and on the tip of the nose), and the blue squares were tracked. That was necessary due to the current problems with tracking when a person wears glasses. We used the experiment set-up depicted in Figure 5.8. Subjects were seated about 24" away from a computer monitor and about 36" away from a TV monitor. The viewing angle between the TV and the computer was 60°.

The algorithm used to analyze the data is similar to the one used for fixation determination (Figure 5.6) and is given in Figure 5.9.

Figure 5.9: Attention measurement calculation algorithm (flowchart). Input: the per-frame gaze coordinate Gin(x, y), normalized to the 0-1 range, and the processing time Tin; constants: MIN_FIX_TIME, FIX_THRESH, CHANGE_THR.

What is basically done is to check whether the subject moved left or right with respect to the last extremum point. The extremum point is calculated dynamically, in the horizontal direction only, and depends on the
current viewing location: for TV attention, it is the minimal point in that viewing interval, and for computer attention, it is the maximal point in that viewing interval.

Figure 5.10: Gaze coordinate (or left-eye X image coordinate) and viewing positions (TV vs. computer) over time for three subjects (1000, 1001, and 1004). In each viewing-status plot, the automatically determined coding is shown above the manual coding (low = TV, high = computer).

Figure 5.10 depicts the X gaze coordinate on the screen (or the left-eye X image coordinate), and the viewing position determined automatically (upper line) and through manual coding (lower line), for the duration of the experiment. We can see that there is some agreement between the two codings and that on some occasions the two codings do not agree. The first two plots are for a subject who had three blue cosmetic dots that were tracked; that subject wears glasses, and since the facial features could not be tracked very accurately with glasses on, the blue dots were tracked instead. The later four plots are for subjects whose facial features were tracked.

The constant value in the algorithm is the threshold used to determine whether the subject moved enough to signal that the viewing location has changed. The threshold was determined empirically; for some subjects it worked well, while for others it did not. In addition, we observed that for a number of subjects with almost no agreement in the viewing location, the input signal was the source of the problem. We therefore tested two input signals: the gaze point on the screen, and the raw left-eye image coordinate. There can be a difference between the two signals, since the eye coordinate is used to determine the gaze coordinate, and in the transformation process some information might be lost.
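A minimal sketch of the viewing-change test at the core of Figure 5.9 is given below; it takes the horizontal component of whichever input signal is chosen (screen gaze point or raw left-eye coordinate). The decision step is a simplified reading of the flowchart, and the assumption that "larger X means the computer" depends on the camera geometry of the set-up.

#include <algorithm>

enum class View { TV, Computer };

struct AttentionCoder {
    explicit AttentionCoder(double changeThreshold) : thr(changeThreshold) {}

    // x: horizontal coordinate of the current fixation, normalized to 0..1.
    View update(double x) {
        // The extremum is kept in the horizontal direction only and depends on
        // the current viewing location: the leftmost point while watching TV,
        // the rightmost point while watching the computer.
        if (view == View::TV) extremX = std::min(extremX, x);
        else                  extremX = std::max(extremX, x);

        if (x - extremX >= thr)       view = View::Computer;  // moved right by the threshold
        else if (x - extremX <= -thr) view = View::TV;        // moved left by the threshold
        // otherwise the previous viewing location is kept
        return view;
    }

    double thr;              // CHANGE_THR, chosen as described in the text
    double extremX = 0.5;
    View view = View::TV;
};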
To determine the magnitude of the changes of the signal (the X coordinate over time only), histograms of the absolute values of the change magnitudes were plotted for both input signals. All the data points were scaled to the 0.0-1.0 interval, and the bin size was 0.02, with a total of 50 bins. The histograms should have a Gaussian shape with a peak at zero; if the change magnitudes vary, the Gaussian curve will be wider. Figure 5.11 depicts two sets of histograms for the gaze and eye coordinates. The common patterns in the histograms were that both histograms were narrow (Figure 5.11(a)), one histogram was narrow and the other was wide, or both were wide (Figure 5.11(b)).

Figure 5.11: Histograms of the gaze and eye coordinate change magnitudes for two subjects: (a) narrow histograms, (b) wide histograms.

A narrow histogram generally gives a better estimate of the viewing change threshold. Thus, we select the input signal type based on the histograms: the one that has 70% or more of the data points in the first five bins is selected over the one that has less than 70% in the first five bins. If both histograms are narrow, the one that reaches 95% of the data in a smaller-numbered bin is selected. If both histograms are wide, the narrower one is selected. Once the input type is selected, the viewing change threshold is selected based on the percentage of data points in the first five bins (N5): the bin up to which N5 + (100 - N5)/2 percent of the data lies is selected as the threshold.

Table 5.1: Percentage of agreement in viewing positions and Kappa statistics for all subjects.

                 ------ Automatic selection ------        ------ Manual best-threshold sel. ------
Subj.   Input  thr.  % agree  Kappa(0s)  Kappa(+/-1s)     Input  thr.  % agree  Kappa(0s)  Kappa(+/-1s)
1000    EYE    0.14   57.58     0.18        0.31          EYE    0.08   68.26     0.37        0.58
1001    SCR    0.28   57.80     0.15        0.26          SCR    0.30   60.28     0.20        0.30
1002    EYE    0.14   59.18     0.21        0.51          EYE    0.14   59.18     0.21        0.51
1003    EYE    0.14   62.08     0.26        0.44          EYE    0.32   76.96     0.52        0.55
1004    EYE    0.14   66.18     0.30        0.58          EYE    0.28   95.57     0.91        0.94
1005    SCR    0.24   66.36     0.11        0.14          SCR    0.28   69.67     0.12        0.16
1006    SCR    0.20   63.19     0.26        0.31          SCR    0.34   63.32     0.27        0.30
1007    EYE    0.14   64.80     0.30        0.37          EYE    0.22   74.56     0.49        0.52
1008    SCR    0.32   68.06     0.28        0.40          SCR    0.12   70.57     0.32        0.55
1009    EYE    0.14   47.31    -0.08        0.15          EYE    0.08   53.55     0.03        0.33
Avg.                  61.25     0.20        0.35                        69.19     0.34        0.47

We assumed that the manual coding is correct. We compared the percentage of agreement of the two codings, at every millisecond, and the corresponding Kappa statistic [76, 17], both with no tolerance and with a ±1 second tolerance. The Kappa statistic is commonly used to compare different coding schemes. Table 5.1 (left five columns) shows the input type and the threshold selected for each subject, the percentage of agreement, and both Kappa statistics. For most subjects, the automatic coding was correct in at least 60% of the measurement interval. Kappa statistics with no tolerance were on average 0.2, and with the one-second tolerance they were on average 0.35. For one subject, 1009, the automatic coding did not produce a satisfactory result. The right columns in Table 5.1 give the best manually selected results for each subject.
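A sketch of the histogram-based choice of the viewing-change threshold described above is shown below, using the 50 bins of width 0.02 and the N5 rule; the helper names are hypothetical.

#include <algorithm>
#include <array>
#include <cmath>
#include <vector>

constexpr int    kBins    = 50;
constexpr double kBinSize = 0.02;

// Histogram (in percent) of the frame-to-frame change magnitudes of one input signal.
std::array<double, kBins> changeHistogram(const std::vector<double>& x) {
    std::array<double, kBins> h{};
    if (x.size() < 2) return h;
    for (std::size_t i = 1; i < x.size(); ++i) {
        int b = std::min(kBins - 1,
                         static_cast<int>(std::fabs(x[i] - x[i - 1]) / kBinSize));
        h[b] += 1.0;
    }
    for (double& v : h) v *= 100.0 / static_cast<double>(x.size() - 1);
    return h;
}

// N5: percentage of the change magnitudes falling in the first five bins.
double firstFiveBins(const std::array<double, kBins>& h) {
    return h[0] + h[1] + h[2] + h[3] + h[4];
}

// Threshold: upper edge of the bin up to which N5 + (100 - N5)/2 percent of the data lies.
double viewingChangeThreshold(const std::array<double, kBins>& h) {
    double n5 = firstFiveBins(h);
    double target = n5 + (100.0 - n5) / 2.0;
    double cum = 0.0;
    for (int b = 0; b < kBins; ++b) {
        cum += h[b];
        if (cum >= target) return (b + 1) * kBinSize;
    }
    return kBins * kBinSize;
}

A histogram with N5 of 70% or more would be treated as "narrow" when choosing between the two candidate input signals.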
For these manually selected results, the thresholds were chosen for each subject independently by the author. For all but one subject, both the percentages of agreement and the Kappa scores are higher than in the case of the automatic selection. In the case of subject 1004, the agreement with the human coder was almost perfect (the last two plots in Figure 5.10).

Table 5.2: Number of observations and their mean and standard deviation (in seconds) for manual coding, fully automatic coding, and automatic coding with manually selected thresholds.

              -- Human coder --        -- Fully automatic --     -- Manual threshold sel. --
              TV          Comp.         TV          Comp.          TV          Comp.
Subj.   #  mean std   #  mean std   #  mean std   #  mean std   #  mean std   #  mean std
1000   42    8   2   43   10   2   22   14   5   22    9   2   40    9   2   40    7   1
1001   20   22   2   21   17   4   25   17   5   25   14   4   25   17   5   25   14   4
1002   42   10   1   42    8   2   76    4   1   76    6   1   76    4   1   76    6   1
1003   45    7   3   46   10   2   45    7   2   45    4   1    6   39  25    6   20  14
1004   15   22   3   16   26   5   62    4   1   63    7   2    6   52  40    7   16  10
1005    6  117   -    7    6   2    9   53  30    9   26  13    8   58  33    9   27  13
1006   43    9   1   44    8   2   11   23   9   11   15   7    8   31  23    8   12   5
1007   41    9   2   42    8   2   16   18   7   17   12   5    5   50  39    5    5   2
1008   28   17   2   29    9   3   22   21   8   22    8   5   34   15   5   35    5   1
1009   45   10   2   46    8   2   44    9   2   44    4   1   62    7   1   62    3   1

Table 5.2 shows the number of observations of changes in the viewing locations and their mean durations for the human coder, the fully automatic selection, and the automatic selection with manual threshold setting. Both the number of observations and the mean duration of the attention intervals in the case of automatic coding differ significantly from the manual coding results in many cases. This means that the automatic coding either generates too many or too few viewing location changes and that it misses some quick and short glances to either the TV or the computer monitor.

The above results show that the method could be adapted and improved. Firstly, the tracking of the facial features must be very accurate. In our case, for some subjects the tracking failed to detect the correct features and detected erroneous features instead. As a consequence, the automatic detection would detect more changes in the viewing location. In the case of subjects 1000 and 1005, the tracking worked reasonably well, since blue dots were tracked instead of the facial features. For subject 1000, the achieved percentage of agreement and Kappa scores were high, and the number of observations and their mean durations were comparable. Subject 1005 was mainly looking at the TV monitor and made just a few glances at the computer; thus, in his case, it was not easy to determine the appropriate thresholds through the histograms. The only error of the automatic coding was that it detected that the subject looked at the computer toward the end of the recording and never switched back toward the TV, which resulted in an error.

Secondly, the position of the camera should be carefully selected. One solution would be to position the camera so that the subjects have to turn their head completely to view the TV, in which case the face would not be visible at all. An easy attention determination would then be to simply check whether the face is visible or not. The problem with that set-up is that, in some cases, people make just a glance at the TV and the face is still visible. In addition, such a situation does not resemble the real-world set-up, where people often have the computer and TV arranged as in our experiment, and quick glances small in magnitude are frequent.

5.5.3 Cursor Movement

In the previous Section, a passive observation system was described.
The user has no feedback about where the system thinks he or she is looking. If the user is given that feedback and sees the actual gaze point on the screen, he or she can respond and interact with the system. The user can willingly move the head to produce cursor motion. As the system gives feedback on the cursor position, the user can learn to move in the right way so that the cursor can be moved along the desired path.

No calibration is needed at the beginning of use; rather, the system calibrates itself while it is being used. The Kalman filter estimation picks up the motion trend as the user moves. If the tracking is temporarily lost, or if the user tilts too much, the re-calibration procedure is rather simple: the user should just look in the left-right, top-down (or vice versa) directions, and the filter parameters will get re-adjusted and the user will be able to control the cursor again. In practice, novice users have some difficulty using the system in the first few minutes, until they figure out what needs to be done. Once they familiarize themselves with the system, they are able to move the cursor rather well. They are also able to learn when to perform the re-calibration procedure and resume cursor control. That means that calibration is not the key issue in the system. The key issue is the concept of the system: the user learns how to move the head in order to control the cursor.

Figure 5.12 displays sample results of the cursor control. Figure 5.12(a) shows a spiral path along an 8 x 8 grid: the subject was instructed to look from the upper left to the upper right corner in a spiral path and to attempt to traverse all the grid points. The solid line shows the cursor path and the dashed line shows the selected buttons (equivalent to the solid line normalized to the grid points). The path was rather straight, and there were a few out-of-path moves that were corrected. Figure 5.12(b) shows a cursor path during a net-surfing session. The snapshot of the screen shows the Netscape window, and the black line shows the cursor path. The subject was selecting the "Back", "Forward", and "Stop" buttons and the search engines just right of the "Snapshot" icon. Results of an evaluation of the above method will be presented in Chapter 7.

Figure 5.12: Sample results of cursor control using gaze tracking: (a) interactive data on an 8 x 8 grid, with the subject looking in spiral moves (the solid line shows the cursor path, the dashed line the selected grid buttons); (b) net surfing using the head-eye control (the black line shows the cursor path). [Image is presented in color.]

5.6 Summary

In this Chapter, a method for gaze direction detection was presented. The method is based on the use of an Artificial Neural Network to determine the initial gaze direction and on a display vs. image coordinates gain factor. In this way, movements of the tracked feature coordinates are scaled into display cursor movements. A method for smoothing the gaze path based on Kalman filter estimation and weighted averaging was presented. The described algorithms were applied to fixation measurement, and results were presented.
Results were presented of a pilot study that compared the automatic attention measurement system based on gaze tracking with manual coding. It was described how the gaze direction can be used to control the cursor motion, and preliminary results for an 8 x 8 grid and a free-form button arrangement were presented. It was also shown that the cursor can be moved along a predefined path using the head-eye input interface, e.g., in spiral moves along all grid points, or to a specific point on the screen.

Chapter 6

HCI Based on the Head-Eye Input

When we interact with a computer, we need to watch the display. In most state-of-the-art computers, pointing on the screen with a mouse is a must for most tasks. While we are moving the mouse with our hand, we need to observe the results on the screen with our eyes. We have to learn how to move the mouse, and then we need to learn to coordinate the mouse movements with our eye movements. This hand-eye coordination, however, is a barrier for some people, and some people just cannot learn how to move the mouse well. If we were able to capture the gaze and enable the user to effortlessly point with the head and eyes, we would be able to overcome the hand-eye coordination problem and provide a more natural way to control a computer.

Another important issue is the level of intrusiveness of the gaze tracking equipment. Many systems are available that are very intrusive and do not allow natural motion, as discussed in Chapter 2. Other systems are not intrusive but require specialized and/or expensive hardware, or require some make-up that will be tracked. The system described in Chapter 5 offers a relatively inexpensive and easy-to-use alternative to other eye-tracking systems.

This Chapter describes how a human-computer interface for the head-eye input has been developed. Issues such as selection mechanisms, appropriate button sizes, and the level of feedback are discussed, and suggestions are made on what is applicable. The discussion is based on observations made while various users used the proposed head-eye input interface for typical computer interaction tasks.

6.1 Selection Mechanism

With hardware pointing devices, selection is typically made by depressing some button or key that is attached to the device. In a handless interface, such a selection mechanism would not be appropriate, since it would involve the use of hands. Many eye-tracking systems opted to use a "dwell button" selection mechanism: a button gets selected if the user fixates it for some predefined time interval. If the duration of the interval is small, problems like the "Midas Touch" arise, as discussed in Section 2.6.7. If the duration of the interval is long, the user easily gets tired while using the system.

One natural solution is to use the face as a selection mechanism: either a facial expression or a certain head movement could be used as a signal to select a button. Using predefined head movements is similar to the "on-screen selection button" concept discussed in [86] and Section 2.6.5. In [86], Ware and Mikaelian showed that an on-screen selection button had the worst performance. To make a selection, the user would need to fixate the desired target button for a predefined period and then fixate the selection button for another period of time, and only then would the desired target button be selected. In the eye-tracking system they tested, the head was still during the experiment.
6.1.1 Selection by Head Motion

If we were to apply such a logic to our proposed head-eye input, the user would need to double the amount of head motion for each selection, which might be uncomfortable. Some other head motion (e.g., moving the head up-down for yes, or left-right for no) can be an alternative selection mechanism. This kind of selection would be rather long and very confusing, and could lead to the Midas Touch problem. What if the user wants to just look somewhere up or down, and not make a selection? How would the system distinguish that from a real selection? One option is to accept a large number of false alarms, which would lead to the user's dissatisfaction with the system. Another option is to have a custom-designed application that would incorporate such moves in the user interface.

We designed an experiment in which the users would fixate one screen button, then get some feedback from the system on what the system thinks they fixated. They would then reply with a yes or no signal, which was nodding or shaking of the head. The user had no feedback on the program-calculated cursor position. Figure 6.1 illustrates the state diagram for the yes/no signals: NO, SO, WE, EA stand for moving North, South, West, and East, respectively. The motion is checked at every processed frame.

Figure 6.1: Selection by head motion: state diagram for "yes" and "no" motion detection (NO - moving North, SO - moving South, WE - moving West, EA - moving East).

As an underlying GUI for the experiment, we used a card game: the user would fixate one of 15 cards (arranged in 3 rows and 5 columns), and when the program detects that the user fixated a card, the program changes the card that it thinks the user fixated. If the change was correct, the user would reply with a "yes" motion, and if the change was incorrect, the user would reply with a "no" motion. If the user does not notice the change in a card, the program starts flickering the changed card after a timeout period. The order in which the user viewed the cards is depicted in Figure 6.2. The paths to the selected locations (dashed lines) and the yes or no response paths (solid lines) are depicted in Figure 6.3. In some cases the user had to move a lot until the system recognized the motion, and in some cases the response was not detected at all. Out of 15 selections, the user's response was correctly recognized 10 times, the "no" response was not recognized in 3 cases, and in 2 cases the user did not notice any change and did not attempt any response.

Figure 6.2: Predefined order of viewing the cards.

Figure 6.3: Paths to selected locations and YES/NO response paths (sample cases: YES motion detected, NO motion not detected, no motion detected, and no card change detected).
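The sketch below illustrates a nod/shake detector in the spirit of the state diagram of Figure 6.1: per-frame head motion is quantized into North/South/West/East moves, and a "yes" (nod) or "no" (shake) is reported after enough vertical or horizontal direction reversals. The exact states and thresholds of the original diagram are not recoverable here, so the reversal counts and the minimum-move threshold are assumptions.

#include <cmath>

enum class Move { None, North, South, West, East };
enum class Answer { None, Yes, No };

struct NodShakeDetector {
    // dx, dy: frame-to-frame displacement of a tracked face point, in image pixels
    // (y grows downward, so a negative dy is a move toward the top of the image).
    Answer update(double dx, double dy) {
        Move m = Move::None;
        if (std::fabs(dy) > std::fabs(dx) && std::fabs(dy) > minMove)
            m = (dy < 0) ? Move::North : Move::South;
        else if (std::fabs(dx) > minMove)
            m = (dx < 0) ? Move::West : Move::East;

        if (m == Move::None) return Answer::None;
        if ((m == Move::North && last == Move::South) ||
            (m == Move::South && last == Move::North)) ++vertReversals;
        if ((m == Move::West && last == Move::East) ||
            (m == Move::East && last == Move::West)) ++horizReversals;
        last = m;

        if (vertReversals >= 2) { reset(); return Answer::Yes; }   // nod
        if (horizReversals >= 2) { reset(); return Answer::No; }   // shake
        return Answer::None;
    }

    void reset() { vertReversals = horizReversals = 0; last = Move::None; }

    double minMove = 3.0;   // pixels; assumed
    int vertReversals = 0, horizReversals = 0;
    Move last = Move::None;
};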
However, this type of selection would not be suitable in a real-world application due to long selection times. 6.1.2 Selection by Facial Expression As discussed in the Section 6.1.1, head motion would not be suitable for selection. The remaining option is making a facial expression as a selection mechanism. This should be easy to do for the user; after fixating a desired button, the user would make the required facial expression and thus make a selection. The expression to be used must be easy to do and should not involve unnatural movements. In this work, an “open mouth” expression is used as the selection signal. The reason we use a mouth expression instead of, say, eye blinking, is that mouth move- ments are done voluntarily. Eye blinking is not always done consciously: humans blink frequently to moisten their eyes. Thus, that is not the best way to control the computer. To determine whether the mouth is opened or not, we look for a big, dark ellip- soidal blob just below the nose. The best thresholded image from Figure 3.11 is used to find the mouth. Figure 6.4 shows examples of blobs for several face expressions: neutral, smiling (with mouth closed), open smiling (with mouth opened), and open mouth. For these four basic expressions, it can be clearly seen that the open mouth €Xpression is distinct from the other three because of the dark blob. 121 Neutral Smiling Open SmileOpen Mouth Original images 4 \“l .J ‘9' I; I" . - " i i‘ ' i "‘ ' Thresholded images (Pictures are part of the author’s private database.) Figure 6.4: Sample mouth expression Neutral Smiling Open SmileOpen Mouth t Original images Thresholded images (Pictures are part of the author’s private database.) Figure 6.5: Mouth expressions for a dark-skin person 122 still. mouth opened still. dwell > limit. mouth opened . , moving. mouth opened ‘ ACI'ION_PRESS mouth closed mouth closed Figure 6.6: Mouth states transition diagram The selection is done only when the subject is still for a predefined time. “Being still” is defined by the difference between the previous and current position of the gaze point being not more than 1% of the total screen width and height. Figure 6.6 depicts the transition diagram for mouth states. When a still state is discovered, it is checked for the state of the mouth. If the mouth is open, the transition to MOUTHJCTIONIRESS is made. If the subject starts moving with the open mouth, we define that state as MOUTHJCTIONDRAG. Finally, when the mouth is closed, we return to the initial state MOUTH.ACTION-NONE. An alternative way to check for the open mouth state would be to learn the intensities of the mouth blob for each user. For example, in the case of dark-skin persons, the mouth blob was lighter than the skin area when they open the mouth in our experiment. Thus, in that particular case, we need to look for an above-threshold area. Figure 6.5 illustrates four facial expressions in the case of a dark-skin person. 6.1.3 Other Ways to Make a Selection Sections 6.1.1 and 6.1.2 discussed two ways of making a selection that would involve the interaction using the head only. However, for some users that might not be 123 possible or practical. Alternative selection mechanisms are hardware button, voice commands, or tongue-keyboard [5]. The users would then point the cursor with their head, then depress some hardware button (e.g., a key on the keyboard), or say a 3 command (e.g., “yes’, “no”, “go”,...). 6.2 GUI Design Issues Figure 6.7 shows the GUI of the button selection program. 
In the upper left corner is the subject's image as recorded by the camera. The tracked feature points are highlighted with red crosses. Right below it we show a thresholded image of the mouth region that is used to recognize the mouth state. On the right is a grid where the program indicates the gaze direction and the selected button: a dark point indicates the screen coordinates where the head-eye input interface is pointing, and when a selection is made, its color is changed to blue.

Figure 6.7: Snapshot of the face tracking demo GUI. The red dot shows where on the screen the user is looking. [Image is presented in color.]

If users are to use this kind of interface, what is the right way to display the information, and what level of feedback should be given to the user? When someone is controlling an application (e.g., Netscape), they do not need to know whether the program finds their face or not. However, in the learning phase or during troubleshooting, they need to have the feedback.

6.2.1 Level of Feedback

In the setting of Figure 6.7, the user has full feedback on whether the tracked features are found and what the mouth state is, as well as where the gaze point is. If there is some error in tracking, e.g., the user tilts left or right and the tracked features are not found any more, the user can notice that on the display and initiate the re-calibration procedure (Section 5.5.3) or move into a more appropriate position. Also, the user can learn how to open the mouth so that the open mouth state is detected.

In our experiments we also designed a GUI that had no feedback, except that the selected grid point was highlighted. The GUI is similar to the one displayed in Figure 6.7, except that the camera images and other items on the left are not shown and the grid points are expanded over the whole screen. The users were able to use both GUIs without any notable difference. While the task (e.g., selecting buttons as guided by the program) was performed, the users did not pay attention to the feedback window at all. The only times they needed the feedback window was when the tracked features were lost (e.g., the user either tilted too much or moved in the chair so that the face was not fully visible). Then the feedback was essential for fast recovery. The presence of full feedback is also useful for novice users: while they are not used to the interface, having the full feedback helps them learn how to perform best (e.g., how to open the mouth, and how much to move).

6.2.2 Button Sizes

What is the appropriate button size that can be selected with the head-eye input interface? In terms of the updating accuracy of the cursor coordinates, we can update down to one pixel; the users just need to learn to move very slowly to achieve that kind of update. We experimented with several different button arrangements: an 8 x 8 grid, two 100 x 100 pixel icons, and Netscape browser and mail windows. In all experiments, the setting was the same as shown in Figure 5.5. The display diameter was 19", or 15.2" x 11.4", with a resolution of 1266 x 1010 pixels. In the case of the 8 x 8 grid, the grid button size was 2.075" x 1.425" (horizontal vs. vertical), or 5° x 3.25° of viewing angle. The size of a 100 x 100 pixel icon was 1.2" x 1.16", or 2.75° x 2.75° of viewing angle. In the latter case, we could arrange 12 x 10 = 120 icons on the display, which gives a lot of room for various application needs.
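The viewing angles quoted above follow from the physical button size and the viewing distance. The sketch below shows the conversion, assuming the roughly 24" (2 ft) viewing distance used in the experiments; with that assumption the computed angles come out close to, but not exactly, the quoted values.

#include <cmath>
#include <cstdio>

const double kPi = 3.14159265358979;

// Visual angle (degrees) subtended by an object of the given size at the given distance.
double inchesToDegrees(double sizeInches, double viewingDistInches) {
    return 2.0 * std::atan(sizeInches / (2.0 * viewingDistInches)) * 180.0 / kPi;
}

int main() {
    const double dist = 24.0;  // assumed viewing distance in inches
    std::printf("8 x 8 grid button (2.075 x 1.425 in): %.1f x %.1f deg\n",
                inchesToDegrees(2.075, dist), inchesToDegrees(1.425, dist));
    std::printf("100 x 100 pixel icon (1.2 x 1.16 in): %.1f x %.1f deg\n",
                inchesToDegrees(1.2, dist), inchesToDegrees(1.16, dist));
    return 0;
}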
These large-sized selection areas would be suitable for custom-designed applications, e.g., a text editor or text display where only a few buttons are needed and the remaining area just displays the text. However, most application programs have a large number of variable-sized buttons. Netscape is a typical example of such an application. Figure 6.8 shows a typical Netscape browser and mail window; the buttons and text entry fields that were used in the experiments are highlighted. A typical news WWW page is also included, and the sizes of its menu items and story titles are highlighted. Figure 6.9 gives the Netscape button sizes relative to the display size. Table E.1 in Appendix E gives the sizes of the buttons in pixels, inches, and degrees of viewing angle.

Figure 6.8: Netscape browser and mail windows with the selected buttons highlighted. [Image is presented in color.]

Figure 6.9: Netscape button sizes relative to the display size (display 35 x 27 degrees; browser window 27.5 x 24.3 degrees; back/send buttons 1.4 x 1.3 degrees; close-buttons panel 0.3 x 1.3 degrees; URL entry 16.3 x 0.7 degrees; URL arrow 0.4 x 0.7 degrees; scroll bar 0.4 x 2.7 degrees; GO button 0.8 x 0.8 degrees; ZIP code entry 2.2 x 0.8 degrees; story title 2.4 x 0.2 degrees; menu buttons 2.5 x 0.5 degrees).

As can be seen, the button sizes vary greatly. The navigation buttons in the Netscape window itself could be configured to be larger if needed. However, the buttons on a specific WWW site cannot be re-configured. Most state-of-the-art WWW sites have a menu of options, and each menu button is rather small. For a sample WWW site we used, the buttons are only 2.5° x 0.5° of viewing angle, arranged in an array with no spacing in between. Thus, if users are to successfully use the head-eye input interface, the accuracy of the system must be within a few degrees of viewing angle.

If we are to use this interface on a different display size, the pixel and inch values would differ based on the display size. In the case of a smaller display (e.g., a monitor of 14" diameter or less), the user would generally sit closer to the display. Thus, for small values of the viewing angle, the pixel count would decrease significantly compared to a medium size display (e.g., 19"). As a result, the selection areas would need to be larger in order to achieve reasonable accuracy. On the other hand, in the case of a large display (e.g., an Immersadesk of 4 x 5 ft, or 77" diameter), small viewing angles would correspond to a larger pixel count than on a medium size display. This would make it possible to achieve a higher selectable resolution on a large display.

6.3 Advantages and Disadvantages of the Head-Eye Input Interface

Each new input interface must have some advantages over the existing ones to be widely accepted and used. In the case of the mouse, it offered an easy pointing mechanism and was widely accepted.
Devices like the trackball are widely used in notebook-style computers because of their size. Touch-screens are a widely used and convenient device for many public information systems. On the other hand, eye-based input has been used mainly by handicapped users, often as their only way to control the computer. Due to constraints on head movement and no easy way to make selections, this interface has not been widely accepted. Head-based input has been used in some military applications and by handicapped users; usually, a device is mounted on the head and tracked. Similar to the eye-based input, this interface is not widely used. The head- and eye-based input interfaces, however, offer a more natural interaction with the computer.

Table 6.1 gives some features and requirements for the head-eye input interface, eye-tracking based input, the mouse, the trackball, and the touch-screen.

Table 6.1: Features and requirements for the head-eye input, eye-tracking based input, mouse, trackball, and touch-screen.

                                      Head-eye  Eye-tracking  Mouse  Trackball  Touch-screen
Needs hand-eye coordination              No          No        Yes      Yes          No
Needs and occupies hands                 No          No        Yes      Yes          Yes
Tolerates head movements                 Yes         No        Yes      Yes          Yes
Display resolution                     Medium      Medium     Fine     Fine        Coarse
For small displays (ATM machines)        Yes         Yes       Yes      Yes          Yes
For medium displays (workstations)       Yes         Yes       Yes      Yes          Yes
For large displays (Immersadesk)         Yes         No        No       No           No
Selection times                        3 sec       1 sec     1 sec    1 sec        1 sec

In all hand-based interfaces, the users must check with their eyes the results of the input device movements; thus, hand-eye coordination must be learned. For some users (e.g., elderly people and children) this is not easy to achieve. All the hand-based interfaces naturally occupy the user's hands. In tasks such as blind typing, if the user wishes to select another window, he/she must first move one hand off the keyboard, then position it on the mouse, trackball, or screen, make the selection, and finally put it back on the keyboard and resume typing. Even for experienced users, this task can cause a number of errors and slow-downs; errors such as missing the mouse position or wrongly re-positioning the hand on the keyboard are common.

In terms of head motion, no hand-based device constrains it. The head-eye input interface does not require the head to be still; on the contrary, the user must move the head in order to control the cursor. In the case of eye-based input, in most systems the user's head must be stationary, or only small head motion is allowed.

In terms of display resolution and size, head-eye and eye-tracking input can be used for fine resolution selections, but they require some skill from the user and are prone to errors. This is due to the nature of our head and eye movements: it is very hard for the user to keep the head perfectly still, and it is impossible to totally control eye movements. A touch-screen cannot be used for fine resolution selections. When used on small displays (e.g., ATM machines), a head-eye input interface would be suitable, since the number of options on such screens is not high and the buttons are typically large. On medium size displays (e.g., workstation monitors), all interfaces are suitable. However, in the case of a large display such as the Immersadesk, "conventional" input devices like the mouse, trackball, or touch-screen are not suitable.
Due to the screen size, the user would need to move the head significantly in order to locate the cursor, or would need to move the whole body and hands to reach the buttons on the display. Eye-based input, similarly, would not be suitable in such an environment: the user would need to stand far from the display in order to have the whole display within the 15° viewing range. In such an application, head-eye based input is a natural solution: the users move their head naturally to view various points on the display.

Finally, in terms of selection times, when measured on a conventional workstation monitor, the selection times for eye input, mouse, trackball, and touch-screen are about 1 second per selection. In the case of the head-eye input interface, the selection time is about 3 seconds per selection. This is significantly higher than for the other input devices; even the expert user (e.g., the author of the interface) was not able to make selections faster. We must note, however, that the 3 seconds include the time needed for the user to move to the target location, the dwell time before the mouth state is checked, and the time needed for the user to open the mouth. If the users were to make selections by, for example, depressing a hardware button, the selection time would decrease; in some preliminary experiments, the selection times were about 2 seconds.

6.4 Summary

In this Chapter, it was described how the gaze tracking system was integrated into a human-computer interface. Several selection mechanisms were discussed: response by nodding and shaking the head, and selection based on opening the mouth. GUI design issues such as the level of feedback and button sizes were discussed, and preliminary results were presented.

Chapter 7

Evaluation of the Head-Eye Input Interface

Eye-tracking systems have been used in many ways. Mainly, their use has been limited to handicapped users and to serving as a research tool in visual perception studies. Recently, the development of handless interfaces has been gaining momentum, and new head and eye tracking systems are being developed for a wide range of users.

In Spring 1998 the author conducted market research for an independent study course. The research was done with Kang Hun Lee, an MBA student. In the course of the study, two focus group interviews were conducted, and the participants suggested the following applications: "mouse/trackball/touchscreen replacement, aid/train/monitor handicapped persons, control head moves on an avatar in VR applications, as an advertisement aid, for choosing TV channels, as a teleconferencing aid, in focus of attention measurement, in a virtual museum, for monitoring high-risk working environments, as an identification system aid, for opening the door when one's hands are full, for typing using gaze direction, for screen savers, to count the number of people in a crowd, in hospitals to aid/monitor patients, for an automatic door, for security in banks to aid clerks during a robbery, as an aid for children to use computers, in car theft prevention, in ophthalmology to fit glasses, to zoom the screen, and for selecting from a menu in a fast-food restaurant". As our participants suggested, the applications are numerous, ranging from support systems for the disabled, through a replacement for a pointing device, to an aid in cognition and market research.
In order to assess the usability, performance and possibilities of the head-eye input interface, we conducted a comprehensive evaluation of the interface. In this final chapter the results are presented. In the evaluation we attempted to incorporate most of the tasks needed for typical applications. With these results we hope to shed more light on the requirements of a head-eye input interface, as well as on the pace of learning to use a novel interface.

7.1 The Goals of the Study

In order to verify the assumption that subjects improve performance with the head-eye input interface and to assess the range of tasks that can be performed, we designed an experiment with the following objectives:

- Introduce the novice user to the interface and let him/her learn the basics of its usage.
- Perform a set of common computer tasks and assess the performance levels. The tasks should be arranged in increasing level of complexity. The tasks that are performed first should serve more as learning tasks. The final task should incorporate the material learned in the previous tasks.
- Repeat the sessions after some time interval and compare the performance. The number of repeats is arbitrary and depends on the circumstances of the experiments. The more the users repeat, the more skilled they should be. In the repeated session, the subjects would perform exactly the same tasks under the same conditions. That way, the only variable from session to session is the subjects, and we can assess a subject's learning pace.

In our case we had to constrain the session time to one hour and we were able to have only one repeated session. These constraints were due to subjects' availability. Thus, the range of tasks had to be narrowed to fit everything in the prescribed time interval.

7.2 Subject Pool

Eighteen subjects gave informed consent for participating and were paid $10 for each session. Subjects of various complexions participated and all of them repeated the experiment. Table 7.1 details the variations in complexions of the faces. As can be seen, many variations in the human population were represented in the subject pool.

Race: Caucasian - 9, Asian - 6, Indian/African - 3
Hair color/style: black - 10, brown - 4, blonde - 2, bald - 2
Gender: male - 14, female - 4
Glasses: no - 12, yes - 6

Table 7.1: Subject pool: details of the variations in facial complexions

Most of the subjects were students or professors at Michigan State University, and two of the subjects were 10-year-old children. A number of subjects normally wore eyeglasses and they had to take them off during the first session. During the second session, 4 out of the 6 subjects with glasses and the 2 children had blue rectangular spots that were tracked instead of the facial features. The set-up of the experiment was the same for all the subjects: we used a black background to avoid any erroneous detection of faces in the pink carpeting we had in the room. For one dark-skinned subject we used an additional light source that enabled successful tracking.

Along with the results of our subjects, the results for the author herself are presented. The author did not participate in the study since she can be considered an experienced and expert user. The author's results (as subject 20) are presented just for reference and are not included in any of the calculated statistics.

7.3 Evaluation Procedure

In each session, subjects performed five tasks. In the later text, we will refer to them as tasks T1, T2, T3, T4 and T5.
The tasks were the following:

T1: Subjects had to move the cursor along a curve for 60 seconds. The video image with highlighted tracked feature point locations was displayed in the upper left corner of the display. The curve spanned the whole display area. A total of three curves were used:

- curve 0: x = cos(t)·cos(t), y = 0.5 + 0.2·sin(4t), t ∈ [−π ... π] (shape of the figure ∞)
- curve 1: x = 0.5 + 0.2·sin(4t), y = cos(t)·cos(t), t ∈ [−π ... π] (shape of the figure 8)
- curve 2: x = 0.5 + 0.5·t, y = 0.5 + 0.5·cos(4·t·π), t ∈ [0 ... 1] (wave form spanning from the left to the right edge of the display)

The curves were presented to the subjects in the following order: curve 0, curve 1, curve 2. The measure of the performance was the squared error of each cursor location with respect to the closest curve point, averaged over the number of cursor update points.

T2: Guided button selection on an 8 x 8 grid. In each trial, the target positions were randomly selected and there were 10 targets. The random number generator had a unique seed value for each trial; thus, all the subjects repeated the same sequence of target buttons. A target button would be highlighted in blue by the program and the subject would have to align the red dot that represented the selected grid button with the blue target grid button. When the points were aligned, the subject would open the mouth to make the final selection. If a button was not selected after three attempts, the next target button would be introduced. The GUI consisted of the grid buttons arranged to cover the whole display area, and the only feedback on the tracking success was the selected grid point. Three trials were done. After the head-eye input control and selection, the subjects performed the same tasks using the conventional mouse so that the results obtained with the two input interfaces could be compared.

The measure of the performance was the number of buttons correctly selected, and in which attempt, as well as the time needed for each selection. The time was measured with respect to the distance between buttons, by Fitts' law (see Section 2.6.4). The conventional mouse data was plotted as one linear regression line, the data for the first trial as a second line, and the data for the second and third trials as a third line. The reason for this latter breakup is to compare the performance in the initial (training) trial and in the two later (testing) trials. Since we assume that all the subjects are experienced mouse users, we analyze all three mouse trials together. In addition, the cursor path for each button selection was plotted.

T3: Guided dragging of 100 x 100 pixel icons. The initial and target positions were randomly selected, and there were 10 draggings in one trial. The random number generator had a unique seed value for each trial; thus, all the subjects repeated the same sequence of target location pairs. The initial position was labeled with a basketball image, and the target position was labeled with a basket image. The subject had to correctly place each basketball icon, and only then would the new icon position be displayed. The selection was done by opening the mouth, and the mouth was kept open while the icon was dragged to the destination. When the mouth was closed, the icon was released. There were three trials, as in task T2. Upon completing the task with head-eye input, the subject would repeat it using the conventional mouse.
The measure of the performance was the same as for task T2, with the addition of measuring the number of steps needed to move the icon from the initial to the target position, where a step is one select-move-release action the subject performed.

T4: Automatic determination of the gaze path and fixation points. The users examined three pictures for 30 seconds each. The results of this task were presented in Section 5.5.1 and are given in full in Appendix A.

T5: Surfing the internet and writing an email message using head-eye input for moving the cursor and selection, and the keyboard for typing. All the subjects performed the same task, which included typical browsing and writing tasks. The Netscape program was used. The layout of the windows with selected buttons is depicted in Figure 6.8. Initially, the Netscape window was centered on the display and the mail program was minimized, with its icon located just left of the upper left corner of the Netscape window. The subjects did the following tasks:

1. Click on the URL entry field, delete the current contents using the backspace/delete keys, type in "www.msnbc.com", and hit the return key. The news page would load.
2. Click on the headlines menu button (the last button among the menu options). The intermediate page with an advertisement would load and the subject would need to click on the sentence "Click here to go to headlines", or simply wait for a few seconds, and the headlines page would load. If the subject clicked on the advertisement image, s/he would need to click on the "Back" button to return to the headlines page.
3. Scroll to the very bottom of the page by clicking on the area just above the scroll arrow. The number of clicks was about 5, depending on the amount of information on the headlines page.
4. Click on the first weather story (located at the bottom of the headlines page). The page with the menu options on the left and the weather story on the right would load.
5. Click on the weather menu button. There was an advertisement page here as well, as for the headlines button.
6. Click on the ZIP code entry field, enter the local ZIP code using the keyboard, and click on the "Go" button located just right of the ZIP code entry area. The weather report would load.
7. Click on the mail icon. The mail program would open up in the upper left corner of the display.
8. Click on the "New msg" button.
9. Type in an email message to the author about the weather report. The typing tasks included clicking on the "To:" field to enter the author's email address, clicking on the "Subject:" field to enter the subject of the message, and clicking on the text entry area and typing the body of the message.
10. Click on the "Send" button.

The experiment procedure was the following:

- The experiments were conducted in the PRIP lab in the Computer Science and Engineering Department at MSU. The experiment set-up consisted of an SGI Octane workstation with a camera mounted on top of the monitor. The subject would sit in a chair without any constraints, as if s/he were working at the computer in the usual circumstances. The procedure was completely non-invasive. The experimenter was the author of the program.
- During the experiments the video data was not recorded, but processed by the program and discarded. The only data saved were the coordinates of the tracked points on the face, the screen pointer coordinates, and statistics relevant to the experiment (e.g., time to select a button, whether the correct button was selected, etc.).
The session of task T5 (Net-surfing) was taped (the display and the subject's side profile) so that the task could later be evaluated.
- The training session consisted of a general demo of the system by the experimenter; then the subject tried the system (s/he would see the picture from the video camera, with the eyes, eyebrows and nose marked by the program, and s/he would be able to move the cursor on a grid and practice the selection mechanism). The training session lasted 5-10 minutes. No data was collected during the training session.
- The testing sessions would test tasks T1, T2, T3, T4, and T5, in that order. Before the data collection began, the experimenter would briefly describe the steps in the current task. Then, the task would be performed and the data would be collected. The projected times needed for each task were: T1: 5 minutes, T2: 5-10 minutes, T3: 5 minutes, T4: 2 minutes, T5: 15 minutes. Upon conducting all the tasks, it turned out that T2 was completed in about 5 minutes, and T3 in about 10 minutes. All other projected times were correct in practice.
- Finally, the subject would fill out a short follow-up questionnaire.

The second session was repeated about one week after the first session and all the tasks were the same. The only differences were that T4 was not conducted for the second time and there was no questionnaire.

7.4 Results

In this section, the results of the study are presented. Each task is presented in a separate section for clarity.

7.4.1 Task T1: Moving the Cursor Along a Path on the Display

In Task T1, the subjects had to move the cursor along a curve for 60 seconds. The video image with highlighted tracked feature point locations was displayed in the upper left corner of the display. The curve spanned the whole display area. A total of three curves were used: c0 - ∞, c1 - 8, and c2 - wave form. The performance for each subject was measured as the average squared error with respect to the target curve. All the coordinates were normalized to the 0...1 range. The data for the first and second session for each subject were compared.

Squared error for the individual curves ranged from 0.03 to 0.15 in the first session and the sum of squared errors for all three curves ranged from 0.14 to 0.31. In the second session, the error for the individual curves ranged from 0.02 to 0.13, and the sum of squared errors for all three curves ranged from 0.11 to 0.28. Figure 7.1 gives the average squared error in the two sessions as well as the errors achieved by the author. The overall performance increased: both the minimal and maximal values of the errors decreased in the second session. In terms of percentage of improvement, the average error for each curve improved, ranging from 8% to 15%, and for the overall error the improvement was 12%. As for the improvement for the individual subjects, for curve 0, 61% improved their performance, for curve 1, 55% improved, and for curve 2, 72% improved. In terms of the overall improvement (sum of the squared errors for all three curves), 72% improved the performance.
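The average squared error used here can be computed directly from the logged cursor positions. The following sketch (ours, not the original analysis code; the function names and the densely sampled curve approximation are assumptions) measures a recorded cursor trajectory against a parametric target curve with coordinates normalized to the 0...1 range:

    import numpy as np

    def avg_squared_error(cursor_xy, curve_fn, t_range, n_samples=2000):
        """Average squared distance from each cursor sample to the closest
        point on a parametric target curve (coordinates normalized to 0..1).
        cursor_xy : (N, 2) array of recorded cursor positions
        curve_fn  : t -> (x, y) arrays, the target curve
        t_range   : (t_min, t_max) parameter interval
        """
        t = np.linspace(t_range[0], t_range[1], n_samples)
        curve = np.stack(curve_fn(t), axis=1)                   # (n_samples, 2)
        d2 = ((cursor_xy[:, None, :] - curve[None, :, :]) ** 2).sum(axis=2)
        return d2.min(axis=1).mean()                            # mean over cursor updates

    # Example: curve 0 (the "infinity" shape), as reconstructed from the text.
    curve0 = lambda t: (np.cos(t) * np.cos(t), 0.5 + 0.2 * np.sin(4 * t))

    cursor = np.random.rand(600, 2)       # placeholder for ~60 s of cursor updates
    print(avg_squared_error(cursor, curve0, (-np.pi, np.pi)))

The dense sampling of the curve stands in for an exact closest-point computation; with 2000 samples the approximation error is negligible at the resolution of these measurements.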
Figure 7.1: Task T1: average squared error (minimum, average, maximum, and the author's error for curves 1-3, sessions 1 and 2)

To illustrate the shape of the cursor paths, we plotted the target curve and the cursor path that the user produced. Figure 7.2 gives the cursor paths for the best and worst overall subject performances. All three curves for the same subject are given. The subject who achieved the best performance was able to move the cursor along the target path rather well; the errors were minor and the general shape of the movement matched the shape of the target curve. In the case of the worst performing subject, it is very hard to determine the movement pattern, and clearly, the subject struggled to move the cursor even close to the target curve. In the case of the second curve, the cursor was completely off the curve most of the time. Complete comparative results for each subject are presented in Appendix B, and the improvements shown in the squared error comparisons are clearly visible to the naked eye.

Figure 7.2: Task T1: cursor paths for the best and worst squared error results in the first session (left: subject 12, session 2, best overall performance; right: subject 05, session 1, worst overall performance)

Figure 7.3: Task T2: button selection accuracy and selection times ((a) selection accuracy statistics, (b) selection times in seconds)

7.4.2 Task T2: Guided Button Selection

In task T2, the subjects had to select buttons on an 8 x 8 grid, as guided by the program. A random target button would be highlighted and the subject would have to select it. After three unsuccessful attempts a new target would be introduced. There were three trials, with ten randomly selected targets. The performance for each subject was measured based on the number of correctly selected buttons and the total time spent for button selection.
Also, the performance using the head-eye input was compared with the mouse-based selection. The results were analyzed for all three trials of the head-eye or mouse input, or for the first head-eye trial (practice trial), the second and third head-eye trials (test trials), and all three mouse trials.

Figure 7.3(a) compares the button selection accuracy in the first and second session. Subjects were able to make more accurate selections in the first attempt in the second session: on average 63% of selections in the second vs. 41% in the first session. In addition, the number of buttons selected in the second and third attempt decreased from 22% and 14% in the first session to 14% and 7% in the second session. The number of incorrectly selected buttons decreased from 23% in the first session to 16% in the second session. This means that the users were generally more accurate in the second session. The overall number of correctly selected buttons increased from 77% in the first session to 84% in the second session. These results are comparable with those of the Ware and Mikaelian [86] eye-mouse study. For every correct selection in the second and third attempt, the subject made one and two erroneous selections, respectively. In the first session, for each correct selection subjects made 1.5 erroneous selections. In the second session, for each correct selection subjects made 0.7 erroneous selections.

Figure 7.4: Task T2: selection times vs. button distance by Fitts' law (selection time in milliseconds vs. log2(D/S + 1/2), where D is the distance between selection points and S is the button size; rows: head-eye input trial 0, head-eye input trials 1 and 2, mouse trials 0-2; columns: session 1, session 2)

Figure 7.3(b) compares the selection times in the first and second session. The selection time increases with the number of attempts, which is expected, since the user needs to make more than one selection. The selection time in the case of an incorrect selection after three attempts is close to the time for a correct selection in the first attempt. This means that the users were making erroneous selections unwillingly, e.g., either they were reacting slowly to the stimulus of the new target or the open-mouth detection algorithm produced false alarms.
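The regression lines in Figure 7.4 are ordinary least-squares fits of selection time against the index of difficulty shown on the plot axes, log2(D/S + 1/2). A minimal sketch of that fit (ours, not the dissertation's plotting code; the data values below are hypothetical):

    import numpy as np

    def fitts_fit(distances, button_size, times_ms):
        """Fit selection time (ms) against the index of difficulty used in the plots,
        ID = log2(D / S + 1/2), with ordinary least squares.
        Returns (a, b) for the regression line  time = a + b * ID.
        """
        ids = np.log2(np.asarray(distances, dtype=float) / button_size + 0.5)
        b, a = np.polyfit(ids, np.asarray(times_ms, dtype=float), deg=1)  # slope, intercept
        return a, b

    # Hypothetical data: three selections with their target distances (same units
    # as the button size) and measured selection times in milliseconds.
    a, b = fitts_fit(distances=[120, 480, 900], button_size=96,
                     times_ms=[2100, 3600, 5200])
    print(f"time = {a:.0f} + {b:.0f} * ID  (ms)")

A poor fit of this line, as reported for the pooled head-eye data, shows up as a large residual spread rather than as a biased slope.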
The time needed to make the first correct selection decreased in the second session, from 7 to 5 seconds. In addition, the decreased selection times resulted in an overall decrease of 31% in the time needed to complete the experiment: from 238 seconds in the first session to 164 seconds in the second session.

As for selection with the mouse, out of 1050 selections, only 9 (0.8%) were wrong, and all were correctly selected in the second attempt. The selection times using the mouse were around 1 second for all subjects.

Figure 7.4 plots selection times vs. button distance by Fitts' law. The left column shows the data for the first session and the right column shows the data for the second session. The first row gives the data for the first head-eye input trial, the second row for the second and third head-eye input trials, and the third row gives the data for the mouse trials. The linear regression line is also plotted for each data set. As can be seen, the data for the head-eye input do not fit well into the Fitts' law framework when analyzed over all the subjects. The data for the individual subjects who performed the task well fit into the Fitts' law framework better.

Figure 7.5 depicts the selection time vs. the distance, by Fitts' law, and the cursor paths from button to button for a subject whose performance on the task increased significantly in the second session. As can be seen, in the first session the subject's data is very hard to fit in the Fitts' law framework and the time needed to select each button was very high. The cursor paths from button to button also show that trend: there is a lot of motion just to move the cursor to the desired location and a lot of motion around the target location. In the second session, however, we see that the selection times for each button were below 5 seconds and in some cases close to the mouse selection times. The cursor paths were much more direct and the subject clearly didn't make a lot of erroneous motion. Complete results of Task T2 are presented in Appendix C, where the data for each subject are compared across the two sessions.

Figure 7.5: Task T2: selection times vs. button distance by Fitts' law, and cursor paths for trial 2, for a subject whose performance increased in the second session
When we add the result that the number of errors decreased in the second session, we can define the overall subject performance in a session using the following formula:

T2_session_performance = (1 + number_of_misses) × total_elapsed_time

The lower the score, the better the performance, since we want to minimize both the number of misses and the elapsed time. When compared using the above formula, we get more balanced results and can compare the scores of different subjects. In terms of the improvement for the individual subjects in the second session, 83% of the subjects decreased the elapsed time for the task and 67% of the subjects decreased the number of misses. Overall task performance improvement was observed in 83% of subjects.

Sources of Erroneous Selections

Why did the subjects make erroneous selections? Did they improve in time? It is important to discuss these issues so that we can identify possible problems and correct them. Figure 7.6 shows which buttons the subjects selected for each target button in the first and second session, respectively. For ease of reading, only the non-zero entries are shown. The number in each entry represents the number of times that button was selected while selecting the target button displayed in bold. We can observe three things:

1. We can see that subjects often selected buttons adjacent to the target button. That is understandable; the reason for this error is that they either didn't position the cursor properly or moved slightly while opening the mouth.

Figure 7.6: Task T2: Distribution of the erroneous selections for each target button

2. The number of erroneous selections at the coordinates of the previous target button is high in most cases. The reason for this error is that subjects kept their mouth open (or kept opening and closing it) when the new target was initiated, and they simply didn't move to the new target and stayed still, fixating the previous target. In some cases the subjects would move very slowly from one target to another while keeping their mouth open most of the time (in spite of the experimenter's suggestion to close the mouth and open it only when they want to make a selection). In other cases the algorithm for detecting the open-mouth state was simply finding that the mouth was open when it was closed, which would cause an erroneous selection. To overcome this problem, we could repeat the same experiment using a different selection mechanism, e.g., a hardware button, that would be less prone to errors.

3. In the second session, the number of erroneous selections at the coordinates of the previous target button decreased. It slightly increased in the last trial of the second session, towards the end of the experiment, which can be connected with the fatigue of the subjects or their decreased attention.
The general trend in the second session was that the erroneous selections were not widespread across the grid, which means that the subjects got more skilled at using the interface.

Were There Some "Hard Positions"?

We noticed that the number of errors decreased in the second session. We also have to ask whether the errors were uniformly distributed across the display positions.

              trial 0                         trial 1                         trial 2
Button    2 3 4 5 6 7 8 9 10 | Σ         2 3 4 5 6 7 8 9 10 | Σ         2 3 4 5 6 7 8 9 10 | Σ   || Σ
Sess. 1   y y y n y n y n n  | 5         y y y y y y n n n  | 6         n y y n y y n y y  | 6   || 17
Sess. 2   n n n n n n y n n  | 1         y n n n y n n n n  | 2         n n n y n y y y n  | 4   || 7

Table 7.2: Number of erroneous selections at the previous target location: "y" entries mean that there were more than three selections at the previous target location, "n" entries mean that there were three or fewer selections at the previous target.

Table 7.3: Percentage of correct selections and the number of erroneous selections per each correct target selection: data from the 8 x 8 grid is collapsed into a 4 x 4 grid. (Shown for Session 1 and Session 2. Percentage of correct selections: bright shades represent higher selection accuracy and darker shades represent lower selection accuracy. Number of erroneous selections per each correct selection: dark shades represent more erroneous selections per one correct selection for a target button, and bright shades represent fewer. Entries with a dash were not the target.)

Table 7.3 shows the percentage of subjects who selected each target button correctly and the number of erroneous selections made per one target button selection. The data from the 8 x 8 grid is collapsed into a 4 x 4 grid for clarity. In the first session, the percentage of correct selections for each target button ranged from 56% to 89% with a 73% average, while in the second session it ranged from 67% to 93% with an 82% average. We also calculated the number of erroneous selections subjects had to make for each correct target selection. For different display locations, in the first session the number ranged from 0.6 to 1.4 with a 1.1 average, and in the second session it ranged from 0.5 to 1.3 with a 0.8 average. The errors were distributed rather uniformly across all the grid locations. There seems to be a slight increase in the number of erroneous selections in the corner points in the second session. However, not all corner points had this high error rate, and this does not hold for the first session.

7.4.3 Task T3: Guided Dragging of Icons

In task T3, the subjects had to drag 100 x 100 pixel icons, as guided by the program. The initial and target positions were randomly selected and there were 10 draggings in one trial. The initial position was labeled with a basketball image and the target position was labeled with a basket image. The subject had to correctly place each basketball icon on the basket icon, and only then would the new icon positions be displayed. The selection was done by opening the mouth, and the mouth was kept open while the basketball icon was dragged to the basket destination. When the mouth was closed, the icon was released. There were three trials, as in task T2.
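The open-mouth dragging described above is essentially a small state machine driven by the detected mouth state. A minimal sketch of that select-drag-release logic (ours, not the dissertation's implementation; the class name, the per-frame update interface, and the grab-radius test are assumptions):

    class MouthDragController:
        """Sketch of the T3 drag interaction: open mouth = grab/hold, closed = release."""

        def __init__(self, icon_pos, grab_radius=0.05):
            self.icon_pos = icon_pos      # normalized (x, y) of the basketball icon
            self.grab_radius = grab_radius
            self.dragging = False

        def update(self, cursor, mouth_open):
            """Call once per processed video frame with the current cursor position."""
            if mouth_open and not self.dragging:
                dx = cursor[0] - self.icon_pos[0]
                dy = cursor[1] - self.icon_pos[1]
                if dx * dx + dy * dy > self.grab_radius ** 2:
                    return self.icon_pos          # mouth opened away from the icon: ignore
                self.dragging = True              # mouth opened over the icon: grab it
            elif mouth_open and self.dragging:
                self.icon_pos = cursor            # mouth held open: the icon follows the cursor
            elif not mouth_open and self.dragging:
                self.dragging = False             # mouth closed: release the icon here
            return self.icon_pos

Each release that leaves the icon off the basket forces another grab, which is exactly what the step counts in the next section measure.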
Figure 7.7: Task T3: number of correct selections by attempt and their timing, and number of dragging steps ((a) number of correct selections by attempt, (b) number of dragging steps, (c) selection times in seconds by attempt; sessions 1 and 2 and the author's data)

The performance for each subject was measured based on the number of dragging steps needed to drag an icon to the target location and the total time spent for dragging. As in Task T2, for reference, the performance using the head-eye input was compared with mouse-based dragging. The results are analyzed for all three trials of the head-eye input or mouse, or for the first head-eye trial, the second and third head-eye trials, and all three mouse trials.

In this case, the subjects had to complete each dragging task; the new target would not be initiated unless the previous target was placed correctly. Thus, we measured the number of steps needed to perform a dragging, where a step is one select-move-release action the subject performed for each target. Ideally, this should be one; that is, the subject should select an icon, move it to the target, and release it. However, in practice, the subjects would place the icon incorrectly, and then they would have to select it again and move it to the target. This last step was repeated many times for some subjects. In our reporting of results, we also give the number of erroneous clicks needed to make a correct selection. In some cases, subjects needed more than 10 steps and/or made erroneous selections more than 10 times. Thus, we analyzed the data when the number of steps/errors was more than 10 as if it were 10 items.

Figure 7.7(a) gives the statistics of the number of correct selections in the i-th attempt (i = 1...10). In both the first and second session most of the correct selections were made in the first or second attempt. In the second session the number of times the correct selection was made at or after the 10th attempt decreased 58% (from 24 to 10). The author made mostly correct selections in the first attempt. Figure 7.7(b) gives the selection times for each correct selection in the i-th attempt. The selection times for both the first and second correct selections were about 4 seconds. In the first session, the time needed at higher attempt numbers generally increased, while in the second session the time needed at higher attempt numbers generally decreased. In both sessions, the selection time was around 12 seconds at or after the 10th attempt. The author achieved times similar to the other subjects in the first two attempts. However, for higher attempts, the author needed more than 3 attempts only twice (3% of selections). Figure 7.7(c) gives the average number of times the dragging had to be performed in i steps (i = 1...10). In the second session, most subjects were able to complete most draggings in one or two steps, unlike the first session, where the number of steps varied greatly. The author, for example, never needed more than four steps.

As for the dragging with the mouse, out of 2100 selections, only 16 (0.8%) were wrong, but all of those were correctly selected in the second attempt. The selection times using the mouse were around 1 second for all subjects.

Figure 7.8 plots selection times vs. distance by Fitts' law.
The left column shows the data for the first session, and the right column shows the data for the second session. The first row gives the data for the first head-eye input trial, the second row for the second and third head-eye input trials, and the third row gives the data for the mouse trials. The linear regression line is also plotted for each data set. As can be seen, the data for the head-eye input do not fit well into the Fitts' law framework when analyzed over all the subjects. The data for the individual subjects who performed the task well fit into the Fitts' law framework better.

Figure 7.8: Task T3: selection times vs. distance by Fitts' law (selection time in milliseconds vs. log2(D/S + 1/2); rows: head-eye input trial 0, head-eye input trials 1 and 2, mouse trials 0-2; columns: session 1, session 2)

Figure 7.9 depicts the selection time vs. the distance, by Fitts' law, and the cursor paths from button to button for a subject whose performance on the task increased significantly in the second session. As can be seen, in the first session the subject's data is very hard to fit in the Fitts' law framework and the time needed to select each button was very high. Such a huge number of selections is the result of frequent losses of the mouth tracking (or the user not keeping the mouth open consistently), which resulted in many short paths of the target icon. In the second session, the number of such short paths decreased significantly. The cursor paths from target to target also show that trend: there is a lot of motion just to move the cursor to the desired location and a lot of motion around the target location. In the second session, however, we see that the selection times for each target were much more consistent with the Fitts' law curve, ranging from 5-8 seconds. The cursor paths are much cleaner and the subject clearly didn't make a lot of erroneous motion. Complete results of Task T3 are presented in Appendix D, where the data for each subject are compared across the two sessions.
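Since a step is one select-move-release action, the per-dragging step counts can be tallied directly from the interaction log, including the cap at 10 used in the analysis. A small sketch (ours, with a hypothetical event format):

    def dragging_steps(events, cap=10):
        """Count select-move-release steps for one icon dragging.
        events: list of (event_name, x, y) tuples; every "release" closes one step.
        """
        steps = sum(1 for name, _, _ in events if name == "release")
        return min(steps, cap)

    # Hypothetical log: the icon was dropped short of the basket once,
    # then picked up again and placed correctly -> 2 steps.
    log = [("select", 0.20, 0.30), ("release", 0.52, 0.48),
           ("select", 0.52, 0.48), ("release", 0.81, 0.74)]
    print(dragging_steps(log))   # prints 2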
In order to calculate the overall performance, we have to incorporate the number of steps needed for each dragging and the total time. The total number of steps can be calculated using the formula:

total_steps = Σ_{i=1...10} (i · number_of_steps_i)

The last row of Table D.1 shows the total number of steps averaged over subjects. The minimal value was lower in the second session (31 vs. 35), and the average value was also smaller in the second session (67 vs. 84). The value of 67 steps is the number of selections of the target icon, and since there was a total of 30 icon positions, each dragging was performed in 2.2 steps.

Figure 7.9: Task T3: selection times vs. distance by Fitts' law and cursor paths for trial 2, for a subject whose performance increased in the second session

In the first session, the total of 84 steps corresponded to 2.8 steps per dragging. The overall performance can then be calculated using the formula:

T3_session_performance = total_steps × total_elapsed_time

The lower the score, the better the performance, since we want to minimize both the number of steps and the elapsed time. When using the above formula, we get more balanced results and can compare the scores of different subjects. In terms of the improvement for the individual subjects in the second session, 55% of the subjects decreased the elapsed time for the task and 78% of the subjects decreased the number of steps needed for one dragging. The overall task performance improvement was noted in 67% of subjects.

Spatial Distribution of Errors

Figure 7.10 gives the spatial distribution of the erroneous selections for each target locations pair for all subjects in the first and second session. The filled gray rectangle is the dragging destination, and an empty rectangle is the dragging source position. Two adjacent rows plot the data for the same trial in the two sessions. The general trend is that the number of errors decreases as the session progresses, and that the number of errors in the second session is smaller than in the first session. In addition, in the second session, the errors are generally more clustered around the line that connects the source and destination locations.

Figure 7.11 gives the spatial distribution of the correct selections for each target location pair for all subjects in the first and second session. The plotting format is the same as in Figure 7.10.
Ideally, the points should be clustered around the starting and ending dragging positions. However, since the dragging was often done in more than one step, the selected points were spread across the display. The selections are distributed on or close to the line connecting the two locations, which means that the subjects were attempting to drag the icons along the shortest path to the destination.

Figure 7.12 shows the spatial distribution of the erroneous selections for each target locations pair. The darker the line that connects the target locations, the more errors were made in that dragging. Clearly, in the second session, the number of errors decreased. In the first session, there was a lot of variation in the number of errors for different target pairs. However, in the second session, the amount of variation decreased. This leads us to conclude that no particular location on the display caused problems for the users. We were especially interested in the corner points, since the experimenter's and subjects' impression during the experiment was that many subjects had a hard time selecting corner positions.

7.4.4 Task T5: Application Control—Net-surfing and E-mail Writing

In Task T5, the subjects had to integrate all the skills learned in tasks T1-T3 to perform a real-world task: surfing the internet and writing an e-mail. The details of the task were described in Section 7.3. The performance of each subject was evaluated by watching the video tapes that recorded the session, and the experimenter recorded the number of erroneous clicks, the number of times the experimenter had to turn off the camera (in order to assume control of the pointer) and intervene with the mouse, the number of times the experimenter had to type in text for the user, the total time the subject spent in completing the task, and which parts of the task were skipped.

Figure 7.10: Task T3: Distribution of the erroneous selections for each target locations pair
Figure 7.11: Task T3: Distribution of the correct selections for each target locations pair
Figure 7.12: Spatial distribution of the erroneous selections for each target locations pair: darker lines represent more errors for that target locations pair (session 1, session 2)

The nature of the task was such that not all the erroneous clicks resulted in some unwanted action. The subject might click on the Netscape window, but that area would not be associated with a button, and there would be no change in the displayed WWW page.
Thus, as erroneous clicks we counted only those clicks that resulted in some unwanted action, such as loading a wrong WWW page or opening an unwanted window (e.g., security information or similar). In such cases, the subject would need to recover from the error by clicking on Netscape's "Back" button or by clicking on the "Cancel" button of the popped-up window. All those actions would add an extra burden to the task and would require the subject to spend more time in the completion of the task.

sub-task      weight     sub-task     weight
1:url          0.20      6:zip         0.14
2:headlines    0.04      7:mail        0.02
3:scroll       0.10      8:newmsg      0.02
4:title        0.02      9:type        0.40
5:weather      0.04      10:send       0.02

Table 7.4: Task T5: Assigned weights for each sub-task

                                   Average over all subjects        Author
                                   Session 1     Session 2     Head-eyes   Mouse
Number of erroneous selections        8.6           5.2            0         0
Number of camera turn-offs            0.4           0.2            0         0
Experimenter types                    0.3           0.2            0         0
Elapsed time (mm:ss)                  9:12          8:35          2:40      1:15
Percent of the task completed         85%           96%           100%      100%

Table 7.5: Task T5: average number of errors, elapsed time and percentage of the task completed

For each subject the percentage of completed tasks was calculated based on the assigned weight of each sub-task. The sub-task weights are shown in Table 7.4, and were assigned based on the number of selections and text entries involved in each sub-task. Some subjects' initial task did not include all the sub-tasks due to time constraints or the subject's age. For one child we changed the tasks a bit and adapted them to his age. The change preserved most of the sub-task actions but used different WWW pages.

Table 7.5 shows the average error and timing statistics over all subjects in the two sessions. In the first session the number of errors ranged from none to 19, and several subjects were not able to complete the first session. The experimenter had to turn off the camera a total of 8 times and to enter text for the subjects 5 times. From the first to the second session the number of erroneous clicks decreased 40% (from a total of 154 to 93), as did the number of times the experimenter had to intervene (4 interventions with the mouse and 3 with typing). Also, the total time needed to complete the task decreased 7% (from 552 to 515 seconds). Only 5% of subjects (one out of 18) were not able to complete the task, compared to 44% (8 subjects) in the first session.

For comparison, the author completed the task in 160 seconds using the head-eye input interface, and 75 seconds using the conventional mouse. Based on the results of the previous tasks using the mouse, we can assume that the author's performance with the mouse is comparable with the other subjects' performance. Then we can conclude that the average performance of our subjects in the second session was 6.8 times slower than had the task been performed with the conventional mouse.

In order to calculate the overall performance of a subject, we calculated the total error using the following formula:

total_errors = #_erroneous_clicks + 2 · (#_turnoffs + #_experimenter_typing)

The elapsed time is also adjusted for the percentage of the task completed, so the overall time was calculated as:

overall_time = elapsed_time / percentage_completed

Finally, the overall performance was calculated as:

T5_session_performance = total_errors · overall_time

The lower the score, the better; if the score in the second session was lower than in the first session, we assumed that the subject's performance improved.
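The per-task session scores defined in this chapter (for T2, T3, and T5) can be restated together. The sketch below (ours, not the original analysis code; the argument names and example values are hypothetical) follows the three formulas as given, with lower scores meaning better performance:

    # Per-session performance scores as defined in the text; lower is better.

    def t2_score(num_misses, elapsed_s):
        return (1 + num_misses) * elapsed_s

    def t3_score(steps_histogram, elapsed_s):
        # steps_histogram[i] = number of draggings completed in i steps (i = 1..10)
        total_steps = sum(i * n for i, n in steps_histogram.items())
        return total_steps * elapsed_s

    def t5_score(err_clicks, turnoffs, experimenter_typing, elapsed_s, fraction_completed):
        total_errors = err_clicks + 2 * (turnoffs + experimenter_typing)
        return total_errors * (elapsed_s / fraction_completed)

    # Hypothetical example values:
    print(t2_score(num_misses=3, elapsed_s=200.0))
    print(t3_score({1: 20, 2: 6, 4: 4}, elapsed_s=480.0))
    print(t5_score(err_clicks=5, turnoffs=0, experimenter_typing=0,
                   elapsed_s=515.0, fraction_completed=0.96))

Because every score multiplies an error count by an elapsed time, a subject must improve on at least one of the two factors, without worsening the other, for the session score to drop.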
In terms of the improvement for the individual subjects in the second session, the elapsed time for the task decreased for 61% of the subjects, while the number of erroneous selections decreased for 67% of the subjects. Overall task performance improvement was noted in 61% of subjects.

Which Sub-Tasks Were Hard?

Were there any differences among the sub-tasks in hardness? Based on the skipped-task statistics, the ZIP code entry (sub-task 6) was skipped the most times (10), then clicking on the weather menu (3), and all the other tasks were skipped twice or once. It is not fair to say that sub-task 6 was the hardest, since the experimenter often decided to skip it and go to the mail writing part due to the time constraints, or when the session progressed with a lot of errors.

Table 7.6 shows the distribution of errors across the sub-tasks for both sessions. It is clear that the task of clicking on a menu button (e.g., headlines or weather) was hard in both sessions. The first click on a menu button in the task (headlines) had more errors than the second click in the sequence (weather). The erroneous clicks were always on the adjacent menu buttons. Another problematic sub-task was entering the URL. The errors in this case were mainly made in selecting buttons that were located just above the URL entry field. Entering the ZIP code was also hard, and the errors were mainly in selecting buttons adjacent to the ZIP entry field. Finally, two other sub-tasks that had high error rates were clicking on the "New Msg" and "Send" buttons. These buttons are of the same size as the "Back" button, but were located in the upper left corner of the display. The errors were also clicks on the adjacent buttons, some as small as 0.27° and 0.53° of viewing angle. These errors resulted in the closure of the panel with the main buttons or the closure of the "To:" field in the mail window.

              Number of errors                       Number of errors
sub-task      Session 1   Session 2    sub-task      Session 1   Session 2
1:url             24          12       6:zip             23          12
2:headlines       42          22       7:mail             0           3
3:scroll          12           4       8:newmsg          12          12
4:title            8           1       9:type             2           1
5:weather         26          13       10:send            5          11

Table 7.6: Distribution of errors across the sub-tasks

The scrolling task did not result in a high number of errors, but was hard for a number of subjects. That task took a lot of time and it was problematic due to the small target area size of only 0.36° of viewing angle in the horizontal direction. In scrolling, there were a lot of erroneous clicks just around the scroll bar; however, they did not result in an action that required the subject to recover from the error.

The typing task also caused some problems for the subjects who were not typing blindly. They had to look at the keyboard when typing, and the pointer would move out of the mail window; then the keyboard input would not be accepted by the mail window. This problem could be solved by changing the user interface to recognize the beginning of typing and to freeze the pointer position. In this experiment, however, that was not done. An alternative solution to this problem would be to introduce a "Stop-the-head-eye-input" command that would be issued through some facial expression (e.g., smiling) or by depressing a special key.

7.4.5 Summary of Questionnaires

After the first session all the subjects filled out a questionnaire from which we wanted to capture their impressions of the interface. The following questions were asked:

1. Do you find the idea of a head-eye input device interesting?
YES/NO, explain: All subjects answered YES. The predominant explanation was that it would be "useful for handicapped users" (55% of subjects), and 39% of subjects said that it would be "nice to integrate the interface as the general input", either replacing or working with the conventional mouse.

2. Did you like the head-eye interface?

YES/NO, explain: In this case, 83% of subjects said that they liked the interface, and 17% said that they did not like it. The subjects' explanation was that they didn't like it "at its present stage", since they had difficulties in the interaction.

3. Did you have any problems using the interface (e.g., not able to control the pointer or click, neck pain or discomfort, or other)?

YES/NO, explain: Not all subjects specified problems: 89% said that they had problems, and 11% said that they had no problem. Out of the ones that specified them, 44% had problems with cursor control, 37% had problems with clicking, and 25% experienced some neck pain or discomfort. Some subjects reported more than one problem. One subject commented that "it felt a bit strange to open the mouth at the beginning", but in time the subject got used to that fact and learned how to open the mouth for the system to recognize it. One subject commented that he had problems with the cursor control since he "was not used to moving his head while working with the computer and would rather move his eyes only".

4. Do you have any other comments or suggestions for the head-eye interface?

Subjects suggested alternative ways of selection, e.g., through blinking, or integrating with voice recognition. Two subjects said that it would be nice to have alternative selection mechanisms, since the current one made them "keep their mouth closed" and they wanted to be able to talk while using the computer. Most subjects suggested that the general control of the cursor should be improved, as well as the interface itself, e.g., to recognize the user's intention to type and not to move the cursor while the user is typing.

The results of the above exit survey showed that the general impression was positive. Problems such as adjusting to the interface were expected by the experimenter, and some of them were not present in the second session. For example, after the second session, a number of subjects asked the experimenter "Did something change in the meantime?", since they found the interface easier the second time. Since nothing had been changed in the interface, we can conclude that the subjects' perception of the interface changed: they got used to it and didn't feel that strange while performing the tasks.

The above suggestions indicate that the subjects experienced the "Midas Touch" problem: their own body was the interface and in some cases they were simply not able to adjust their behavior and movements to incorporate the interface needs. Over the two sessions, most were able to adjust; however, some could not.

7.5 Discussion of T1, T2, T3, T5 Results

The overall performance in the individual tasks showed a trend of subject improvement in time.
In task 1, 72% improved; in task 2, 83% improved; in task 3, 67% improved; and in task 5, 61% improved. Overall improvement was measured in terms of the number of tasks in which the subject improved: when a subject improved in 3 or 4 tasks, by definition the overall performance improved. A total of 61% of subjects improved (28% in all the tasks, and 33% in 3 out of 4 tasks). The remaining 33% of the subjects improved in two tasks only, and 6% (1 subject) improved in only 1 task.

The total time the subjects spent in performing the tasks with the head-eye input interface was on average 27.8 minutes in the first session and 24.6 minutes in the second session. The time needed to make the first correct selection was 4.6 and 4.8 seconds for tasks T2 and T3, respectively. The selection times were similar in Task T5. Subjects needed about 5, 10, and 6 times more time to perform tasks T2, T3, and T5, respectively, using the head-eye input interface compared to the mouse. It should be noted that a few subjects were able to perform Task T5 only about 2-3 times slower than with the conventional mouse.

In the second session, for subjects 01, 02, 07, 10, 12, and 13, three blue dots were tracked instead of the facial features. The tracking in the first session was not successful (for the children), or the subjects could not see well without glasses. In all six cases, the performance improved in the second session.

It should be noted that our child subjects had some difficulty in using the head-eye input interface. The problems were due to their impatient and rapid head movements, and they often moved a lot in the chair during the session. Thus, they would often move out of the focus of the camera and would have to re-position. The net-surfing task was boring for them, and instead of asking them to perform the regular tasks the experimenter instructed them to visit WWW sites appropriate for their age and to focus more on the writing of e-mail messages to their parents.

The most important conclusion is that it was possible to control a real-world application without any changes or adjustments to the interface. The Netscape window we used, and the news WWW pages, had some rather small selection areas. If we are to make a custom-built application suited to the interface, we can adjust it to the special needs of the head-eye input interface. We believe that the same conclusions could be made for an eye-based interface, since our head and eyes are not always perfectly still, and due to our natural movements the interface is prone to glitches.

To the future designers of applications based on the head-eye input interface, we would suggest the following:

- It is possible to select from 10 x 12 icons on a 19" display. The selection time for each icon is about 3-5 seconds, or less for an experienced user. This timing is higher than the 1 second per selection reported in many eye-mouse or head-mouse based studies, but we believe that the selection time could be reduced as the users practice more.

- The reasonable button size is about 1.4° of viewing angle (e.g., the Netscape "Back" and similar buttons). Our subjects had very little problem selecting these buttons and they had to use them often.
• It is possible to select from 10 x 12 icons on a 19" display. The selection time for each icon is about 3-5 seconds or less for an experienced user. This timing is higher than the 1 second per selection reported in many eye-mouse or head-mouse based studies, but we believe that the selection time could be reduced as the users practice more.

• A reasonable button size is about 1.4° of viewing angle (e.g., the Netscape "Back" and similar buttons); a worked angle-to-size example is given after this list. Our subjects had very little problem selecting these buttons, and they had to use them often. For example, after many erroneous selections an unwanted WWW page would load and the subjects had to click on the "Back" button, or a pop-up window with an error message would open and the subjects had to click on an "OK" or similar button. In both cases, the buttons were of similar size. The number of erroneous selections while selecting the "Back" button was very low compared to the selections of other buttons.

• It is possible to select smaller buttons (0.5°); however, the error rates increase as the size decreases (e.g., adjacent menu buttons on a WWW page). When small targets are isolated, it is possible to select them with ease and a small number of errors (e.g., the title of a story on a WWW page).

• There should be blank space between adjacent target buttons that are crucial for the control of the application. That would minimize erroneous selections.

• The users must have a way to signal the interface to temporarily stop cursor control, and an easy way to re-start it. This option would be used when the user wishes to type some text and the cursor should stay on the text entry area regardless of where the user is actually looking.

• Selection with a face expression is possible, but it can cause problems if the face expression determination algorithm is not working perfectly. When the subject's condition allows, an alternative selection mechanism, such as a keyboard or other hardware button, or a voice signal, should be used.

• If the interface is to be used by children, it should accommodate their movement pattern while working with the computer: children may not sit straight in the chair, and might often lean towards the display or to the sides of the chair. Thus, the camera should not be focused on their face only, but on a wide area around them, and the interface should not be sensitive to their sudden moves.
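To relate the viewing-angle figures quoted above (0.5° and 1.4°) to physical sizes on the screen, one can use the relation that a target of size s viewed from distance d subtends an angle of 2 arctan(s / 2d). The numbers below are only an illustration: the 50 cm viewing distance is an assumption, since the exact display geometry of the experiments is not restated here.

    import math

    def visual_angle_deg(size_cm, distance_cm):
        """Angle (in degrees) subtended by a target of physical size size_cm
        viewed from distance_cm."""
        return math.degrees(2.0 * math.atan(size_cm / (2.0 * distance_cm)))

    def size_for_angle_cm(angle_deg, distance_cm):
        """Physical size that subtends angle_deg at distance_cm."""
        return 2.0 * distance_cm * math.tan(math.radians(angle_deg) / 2.0)

    if __name__ == "__main__":
        d = 50.0                                   # assumed viewing distance in cm
        for angle in (0.5, 1.4):
            print("%.1f deg at %.0f cm -> %.2f cm on screen"
                  % (angle, d, size_for_angle_cm(angle, d)))
        # At 50 cm, 1.4 deg corresponds to roughly 1.2 cm and 0.5 deg to about 0.4 cm.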
7.5.1 Learning curve

As with all new interfaces, the use of the head-eye input interface must be learned. The idea of using one's head as an interface is not very familiar to the average user, and the user must constantly think about where and how to move the head. This is similar to hand-eye coordination in the case of the conventional mouse; we can call it "head-mind coordination". If the tracking and the update of the screen coordinates work well, users should be able to learn this coordination quickly. Moreover, they should be able to adjust to the interface so that they no longer think about the coordination.

In our interaction with novice users we often compared the new interface with the mouse. This metaphor helped the users become familiar with the general idea of the interface. A positive aspect of the comparison was that we did not have to explain extensively what the potential use of the interface is and how to use it, since most people are nowadays familiar with the mouse. Also, we could easily compare the performance of the new interface with that of the mouse on some common tasks. However, the first drawback of the comparison was that the users implicitly expected the same level of performance with the new interface that they were achieving with the mouse. The important fact most users were forgetting was that they had had to learn how to use the mouse, and that there was an adjustment period before they were skilled with it. Thus, some users had a number of complaints about the interface after the first usage, and most complaints addressed the issue that they did not see the need for the new interface when they could do much better with the mouse. After the second usage of the new interface, when the users got more skilled, there were not as many complaints and comparisons with the mouse.

The second drawback of the mouse metaphor was that the users would not think about other potential applications of the interface, which extend well beyond the mouse capabilities. In practice, however, we did not notice that the mouse metaphor blocked other ideas, and many novice users and lab visitors would quickly point out other potential applications.

How well do users learn the interface in practice? We observed a number of users while they used the interface, and there is a common pattern:

1. Novice level: In the first few minutes of use all users perform rather poorly. They make rapid head movements and observe how the cursor moves. As time passes, they get more skilled and learn how much they need to move to produce the desired cursor movement.

2. After the initial "warm-up" period, the users either get used to the interface and their performance improves, or they still struggle with the interface and have a hard time performing the tasks:

(a) Expert level: If the users switch to this level, they can use the interface without any problems, they perform rather well, and we can compare their performance with the use of the conventional mouse.

(b) Intermediate level: The users who stay at this level do not get used to the interface, for reasons that range from problems with the underlying computer vision algorithms to problems related to the adjustment to the interface:

• For some, the computer-vision-based tracking did not work well, and the users were not able to get a good response from the interface. For example, users with glasses could not wear them at present, since the tracking would fail. Thus, when they needed to read detailed information on the display they would lean closer or put on the glasses, which resulted in erroneous updates of the cursor position. They would then need to re-position themselves in the chair and to re-calibrate, which resulted in several seconds (or tens of seconds) of time lag, leading to user dissatisfaction. For such users, if the tracking problem is solved, they can quickly adjust to the interface and switch to the "expert level". For example, to overcome the glasses problem, we tracked three blue squares glued just above the eyebrows and on the tip of the nose. The movements of the blue spots were equivalent to the eyes-nose movements, and the users had feedback as if the tracking worked well.

• Skilled level: The tracking works, but these users are not able to learn to move slowly and to re-calibrate when needed. They would have good cursor control at one point, but would often tilt too much, so that an eye was occluded or the mouth was not visible and the interface would not respond.
They would not realize what was happening, and when instructed to move slowly and re-calibrate, they would start to rapidly move their head in all directions and would think that they had re-calibrated, while in reality they just caused more confusion to the interface. Either they did not understand the instructions to move slowly or they did not want to follow them. In both cases, the result was that they could not control the cursor movement at all. In time, they would gradually start moving more slowly and become able to control the cursor. However, they never got even close to the performance of the expert users.

The observed learning pattern is similar to the learning pattern for any new computer input device. Some users will simply never adjust to the interface and some will pick it up right away. In our experiments, we attempted to test the learning curve for the head-eye input interface. Our subjects attended an experiment that consisted of two one-hour sessions, approximately one week apart, in which they performed various tasks using the head-eye input device. Figure 7.13 shows the breakup of the subjects into skill levels. A total of 18 subjects participated in the experiment; 13 reached the expert level and 5 reached the skilled level. The breakup into skill levels was done by the author based on observation of the experiments and the performance results.

Figure 7.13: Learning levels for 18 subjects

7.6 Summary

In this Chapter we presented the results of a study with 18 subjects who performed a wide range of tasks using our head-eye input interface. The tasks involved moving the cursor along a path, selecting buttons, dragging icons, surfing the internet, and writing an e-mail message. All subjects participated in two sessions approximately one week apart. The performance on each task was measured, and based on that we concluded that 11 out of 18 subjects (61%) improved their performance with repeated usage of the interface. The most important fact was that most subjects were able to control a real-world application such as Netscape without any change to the application, and that in the second session only one subject was not able to complete the task. We believe that if our subjects persisted in using the interface, they would become skilled users and their performance with the interface could be comparable with the performance achieved with other input devices (e.g., the mouse).

Chapter 8

Conclusion and Future Work

In this dissertation, an interface for handless human-computer interaction was presented. The interface is based on the use of computer vision algorithms to track the user's head and facial features such as the eyes, nose and eyebrows. The interface is completely non-intrusive, and requires only a camera attached to the computer. Results of feature tracking were presented and their accuracy was evaluated; it was shown that the features could be tracked with an average accuracy of 2 pixels in a 320 x 240 pixel image. A framework for gaze direction detection, based on an Artificial Neural Network and on scaling of feature image coordinate movements to display movements, was presented. The framework can be used in a number of applications ranging from measuring the user's focus of attention to controlling the computer using head movements. Preliminary results of attention measurement and cursor control experiments were presented.
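As a concrete, deliberately simplified picture of how such a framework can be organized, the sketch below first maps the tracked feature coordinates to an initial display position with a small feed-forward network, and afterwards moves the cursor by scaling the frame-to-frame feature displacement with an adjustable gain, in the spirit of the display/control gain used for mouse-type pointing devices. The network size, the gain value, the display resolution and all function names are illustrative assumptions and are not the dissertation's implementation.

    import numpy as np

    GAIN = 6.0   # assumed display/control gain: display pixels per pixel of feature motion

    def ann_initial_gaze(features, W1, b1, W2, b2):
        """Map a feature vector (e.g. eye and nose image coordinates) to an
        initial display position (x, y) with a one-hidden-layer network.
        The weights W1, b1, W2, b2 are assumed to have been trained beforehand."""
        hidden = np.tanh(W1 @ np.asarray(features, dtype=float) + b1)
        return W2 @ hidden + b2               # (x, y) in display coordinates

    def update_cursor(cursor, prev_features, features, gain=GAIN,
                      width=1280, height=1024):
        """Move the cursor by the scaled displacement of a tracked feature point
        (relative, mouse-like mode)."""
        f = np.asarray(features, dtype=float)
        p = np.asarray(prev_features, dtype=float)
        dx, dy = gain * (f[:2] - p[:2])
        x = float(np.clip(cursor[0] + dx, 0, width - 1))
        y = float(np.clip(cursor[1] + dy, 0, height - 1))
        return (x, y)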
The gaze detection algorithm was used, along with face expression detection (open vs. closed mouth), to develop a handless human-computer interaction interface.

The interface was evaluated by 18 novice users in two sessions approximately one week apart. In terms of the usability of the interface, the appropriate number, size and configuration of selectable buttons were investigated. Our experiments showed that users can select buttons of 0.5° only with a lot of errors, while a button size of 1.3° was selectable with reasonable error rates. The selection time was 3-5 seconds. A number of applications of the interface were evaluated: monitoring the user's gaze and movement pattern, moving the cursor along a predetermined path on the display, selecting buttons, dragging icons, and controlling a real-world application (Netscape). Our subjects learned how to use the interface successfully after moderate training. They spent in total about 1 hour in active use of the interface, and most of the subjects showed improvement in performance.

8.1 Research Contributions

• A facial feature location method was presented, based on knowledge of the geometry of the face. The method searches for the eyes, eyebrows and nose using geometrical constraints and knowledge of the face colors. The red component of the color image is used to search for the eyes. The method was successfully tested on a number of faces in both still images and video sequences. If the person wears glasses or has a beard or moustache, the method sometimes does not work well.

• An algorithm for detecting dark skin, and a method for adapting the image of a dark-skinned person so that the skin color detection algorithm can find the dark skin, were presented. The algorithm works automatically, in conjunction with the motion detection, and automatically finds the adjustment coefficients; a schematic sketch of this kind of adjustment is given after this list.

• Gaze direction detection that requires no calibration was presented. The user's initial gaze is detected using an Artificial Neural Network, and then the display/control gain factor logic used for pointing devices like the mouse is applied for the gaze calculation. This method offers an adjustable gain factor and flexibility for individual users; a user could set up the interface so that the head movements are rather small.

• Monitoring the user's gaze in time enables applications such as visual perception studies, attention measurement, or marketing research. This interface could offer non-intrusive monitoring in a more natural setting than what is done at present.

• An evaluation of the interface in terms of usability, achievable resolution, and user learning pace and performance was presented. The results of a study with 18 subjects, who performed a wide range of tasks, show how well novice subjects can get used to the interface, and which tasks can be performed with ease and which are hard. We showed that a target as small as 0.36° of viewing angle can be selected with the interface, and that a target size of 1.4° of viewing angle was rather comfortable for the users to select. The time needed to make a selection was 3-5 seconds.

• Successful control of a real-world application using the head-eye input interface, without any changes to the application itself, was achieved. The Netscape navigator and mail programs were used as the test programs, and our subjects were able to learn how to use the system and to surf the network using their head as a pointer. They were able to browse several WWW sites successfully and to write an e-mail message, after moderate training with the interface of 28 minutes on average in the first session and 16 minutes in the second session. The results showed that they would be able to become very skilled with the interface. The author of the interface, who can be considered an expert user, for example, completed the Netscape surfing task in 2:40 minutes using the head-eye input, and in 1:15 minutes using the conventional mouse. Five other subjects (28%) achieved performance approaching the author's.
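The adjustment mentioned in the second contribution can be pictured as a per-channel scaling of the image so that the (moving) face region falls into the range the skin-color model expects. The sketch below estimates scaling coefficients from the mean color of the motion mask; the reference values and all names are assumptions made purely for illustration and are not the coefficients or procedure used in this work.

    import numpy as np

    # Assumed reference mean (R, G, B) of the skin-color model's training data.
    REFERENCE_SKIN_MEAN = np.array([180.0, 120.0, 100.0])

    def adjustment_coefficients(image, motion_mask):
        """Estimate per-channel gains from the pixels flagged as moving (assumed
        to be mostly the face), so that their mean color matches the reference.
        image: H x W x 3 uint8 array; motion_mask: H x W boolean array."""
        moving = image[motion_mask].reshape(-1, 3).astype(np.float64)
        mean_rgb = moving.mean(axis=0)
        return REFERENCE_SKIN_MEAN / np.maximum(mean_rgb, 1.0)

    def adapt_image(image, coeffs):
        """Scale the color channels; the skin-color detector is then run unchanged."""
        return np.clip(image.astype(np.float32) * coeffs, 0, 255).astype(np.uint8)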
8.2 Future Work

The interface presented in this dissertation can be used in various applications; in this work, we investigated some basic tasks. In this section we list possible future directions for research on the head-eye input interface. We hope that the interface will one day become widely used and will improve human interaction with computers.

• Improvement of the feature detection algorithm in terms of robustness to the use of glasses and to the presence of a beard or moustache. Such an improvement would allow a wide range of users to use the interface without any constraints.

• Use of more than one video camera, all of which would attempt to find the tracked features; the best results would be used as the output of the system. That way the user would be able to move freely in the workspace, and the computer monitor would not need to be the only device being controlled. Alternatively, an omnidirectional camera could locate the faces and track them, and another pan-tilt-zoom camera could zoom in to get a better face picture. Applications such as teleconferencing would benefit greatly.

• Improvement of the UI to provide an easily selectable turn-off and turn-back-on option, to be used when the user is just typing or reading. The user could then use the interface along with other input interfaces in an easier fashion.

• Evaluation of the interface for physically challenged users, and customization of some procedures depending on special user needs. We believe that users whose only way to control the computer is through head motion would benefit greatly.

• Improvement of the algorithms in terms of robustness to the environment, such as the presence of skin-color tones in the background or changing lighting conditions. If a fully robust algorithm is devised, the interface could be used in any environment, e.g., airports, museums, train stations, etc.
APPENDICES

Appendix A

Results of the Gaze Monitoring Experiment

In this Appendix, the results of the gaze monitoring and fixation determination experiment are included. The experiment was conducted adjunct to the evaluation of the head-eye input device described in Chapter 7. A total of 18 subjects viewed 3 different images for 30 seconds each, and the gaze path and fixation locations were determined automatically using the algorithm in Figure 5.6. The first photograph used is from "50 Favorite Rooms by Frank Lloyd Wright" by Diane Maddex, and the remaining two photographs are from the WWW site http://www.photo.net/. Table A.1 includes the number of fixations and the average, minimum and maximum fixation duration for each of the 18 subjects. Figures A.1 through A.9 give the gaze path and fixation points for each of the three images. The gaze path and fixations are superimposed on the image that the subject viewed. The white line represents the gaze path. Gray dots show fixation locations, and the dot size indicates fixation duration. A black line connects consecutive fixation locations.

Subject    Image 0                      Image 1                      Image 2
ID         #fix  avg   min   max        #fix  avg   min   max        #fix  avg   min   max
2000        53   864   112   3800        57   734   115   2401        54   789   140   2991
2001        23   558   123   1690        27   447   115   1255        26   310   102    851
2002        23   361   117   1544        16   580   117   2669        17   341   107    939
2003        31   547   116   2394        27   300   121    623        31   316   115    839
2004        31   398   109   2097        44   433   108   1438        40   411   111   1306
2005        18   218   100    438        24   284   104   1083        31   252    30    761
2006        30   441   102   1386        11   239   109    565        31   291   105    882
2007        34   455    79   2022        31   378   117   1150        13   268   116    876
2008        39   490   102   1729        36   286   101    662        38   430   126   1426
2009        37   322   101    963        33   345   100   1181        27   238    97    595
2010        19   964   236   3118        28   783   220   4073        20   571   123   1535
2011        46   476   109   1251        45   544   110   1504        50   413    72   1727
2012        19   294   110    885        33   234    41    573        23   250   100    660
2013        18   364   117    816        35   311   104   1042        27   339   102    922
2015        41   510   102   1529        24   390   122   1015        23   356   125   1471
2016        40   509   100   1376        31   354   115   1167        35   469   117   1905
2017        24   257   110    835        13   226   104    593        41   312   101    943
2018        32   761   108   2711        43   461   105   1305        35   465   102   1670

(#fix = number of fixations; avg, min, max = average, minimum and maximum fixation duration in msec)

Table A.1: Number and durations of fixations measured by the gaze-determination algorithm
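The fixation counts and durations in Table A.1 come from the algorithm in Figure 5.6, which is not reproduced in this appendix. For orientation only, the following is a generic dispersion-threshold sketch of how fixations can be segmented from a sequence of time-stamped gaze points; the 100 ms minimum duration and the dispersion threshold are illustrative assumptions, not the parameters of the dissertation's algorithm.

    def detect_fixations(samples, max_dispersion=0.04, min_duration_ms=100):
        """Dispersion-threshold fixation detection.
        samples is a list of (t_ms, x, y) gaze points, with x and y normalized to 0-1.
        Returns a list of (start_ms, duration_ms, cx, cy) fixations."""
        fixations, i, n = [], 0, len(samples)
        while i < n:
            j = i
            # Grow the window while the points stay within the dispersion threshold.
            while j + 1 < n:
                xs = [s[1] for s in samples[i:j + 2]]
                ys = [s[2] for s in samples[i:j + 2]]
                if (max(xs) - min(xs)) + (max(ys) - min(ys)) > max_dispersion:
                    break
                j += 1
            duration = samples[j][0] - samples[i][0]
            if duration >= min_duration_ms and j > i:
                cx = sum(s[1] for s in samples[i:j + 1]) / (j - i + 1)
                cy = sum(s[2] for s in samples[i:j + 1]) / (j - i + 1)
                fixations.append((samples[i][0], duration, cx, cy))
                i = j + 1
            else:
                i += 1
        return fixations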
Figure A.1: Gaze path and fixation points, subj-2000 and subj-2001, images 0, 1, 2
Figure A.2: Gaze path and fixation points, subj-2002 and subj-2003, images 0, 1, 2
Figure A.3: Gaze path and fixation points, subj-2004 and subj-2005, images 0, 1, 2
Figure A.4: Gaze path and fixation points, subj-2006 and subj-2007, images 0, 1, 2
Figure A.5: Gaze path and fixation points, subj-2008 and subj-2009, images 0, 1, 2
Figure A.6: Gaze path and fixation points, subj-2010 and subj-2011, images 0, 1, 2
Figure A.7: Gaze path and fixation points, subj-2012 and subj-2013, images 0, 1, 2
Figure A.8: Gaze path and fixation points, subj-2015 and subj-2016, images 0, 1, 2
Figure A.9: Gaze path and fixation points, subj-2017 and subj-2018, images 0, 1, 2

Appendix B

Task T1 Complete Results

Table B.1 gives the squared error from the target curve for each subject and curve, for the two sessions done one week apart. The screen coordinates were normalized to the 0-1 range. Figures B.1 through B.19 give the comparison of the target curve and the cursor path shapes in the two sessions, for each subject.

Subj.   Session 1, averaged squared error               Session 2, averaged squared error
ID      c0       c1       c2       sum      #pnt        c0       c1       c2       sum      #pnt
00      0.0787   0.0447   0.0578   0.1812    3563       0.0325   0.0266   0.0569   0.1160    4139
01      0.0579   0.0349   0.0594   0.1522    3213       0.0360   0.0456   0.0588   0.1404    2778
02      0.1053   0.1248   0.0708   0.3009    2986       0.0886   0.0767   0.0776   0.2429    3257
03      0.0620   0.0577   0.0596   0.1793    4279       0.0781   0.0622   0.0541   0.1944    3420
04      0.0491   0.0321   0.0770   0.1582    3571       0.0705   0.0587   0.0639   0.1931    2884
05      0.1000   0.1514   0.0557   0.3071‡   3064       0.1081   0.0783   0.0603   0.2467    2197
06      0.0627   0.0493   0.0840   0.1960    3809       0.0414   0.0766   0.0575   0.1755    2644
07      0.0695   0.1337   0.0679   0.2729    3205       0.0679   0.1226   0.0603   0.2508    1546
08      0.0778   0.0691   0.0795   0.2264    3314       0.0616   0.0574   0.0637   0.1827    3786
09      0.0344   0.0467   0.0553   0.1364†   4356       0.0345   0.0311   0.0443   0.1099    3411
10      0.1501   0.0831   0.0683   0.3015    2674       0.0872   0.0689   0.0588   0.2149    2722
11      0.0718   0.0818   0.0690   0.2226    3204       0.0495   0.0422   0.0601   0.1518    3980
12      0.0902   0.1366   0.0553   0.2821    3098       0.0254   0.0263   0.0566   0.1083†   3200
13      0.0612   0.0477   0.0565   0.1654    3924       0.0670   0.1054   0.0724   0.2448    2444
15      0.0363   0.0377   0.0661   0.1401    3520       0.0420   0.0412   0.0474   0.1306    2346
16      0.0563   0.0439   0.0705   0.1707    4908       0.1313   0.0825   0.0719   0.2857‡   2615
17      0.0672   0.0712   0.0622   0.2006    4159       0.0485   0.0994   0.0574   0.2053    2755
18      0.0715   0.0926   0.0596   0.2237    3242       0.0377   0.0700   0.0586   0.1663    3073
20*                                                     0.0281   0.0213   0.0584   0.1078    3807
Sum     1.3020   1.3390   1.1745   3.8173   64089       1.1078   1.1717   1.0806   3.3601   53197
Avg.    0.0723   0.0744   0.0653   0.2121    3560       0.0615   0.0651   0.0600   0.1867    2955

Percentage of improvement from session 1 to session 2: c0 15%, c1 12%, c2 8%, sum 12%
* subject 20 is the author (second session only)
† minimal overall squared error value in each session
‡ maximal overall squared error value in each session

Table B.1: Task T1: average squared error for each subject
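The "averaged squared error" entries in Table B.1 summarize how far the recorded cursor path strayed from the target curve, with both expressed in normalized 0-1 screen coordinates. The exact error definition is given in Chapter 7 and is not restated here; the sketch below uses one plausible definition (squared distance from each cursor sample to the nearest point of a densely sampled target curve, averaged over the samples) purely as an illustration.

    import numpy as np

    def avg_squared_error(cursor_path, target_curve):
        """Average, over cursor samples, of the squared distance to the nearest
        target-curve point.  Both arguments are N x 2 and M x 2 arrays of (x, y)
        coordinates normalized to the 0-1 range."""
        cursor = np.asarray(cursor_path, dtype=float)
        target = np.asarray(target_curve, dtype=float)
        diff = cursor[:, None, :] - target[None, :, :]   # pairwise differences (N, M, 2)
        sq_dist = (diff ** 2).sum(axis=2)                # squared distances (N, M)
        return float(sq_dist.min(axis=1).mean())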
Figure B.1: Task T1, subject 00, cursor paths for all three curves
Figure B.2: Task T1, subject 01, cursor paths for all three curves
Figure B.3: Task T1, subject 02, cursor paths for all three curves
Figure B.4: Task T1, subject 03, cursor paths for all three curves
Figure B.5: Task T1, subject 04, cursor paths for all three curves
Figure B.6: Task T1, subject 05, cursor paths for all three curves
Figure B.7: Task T1, subject 06, cursor paths for all three curves
Figure B.8: Task T1, subject 07, cursor paths for all three curves
Figure B.9: Task T1, subject 08, cursor paths for all three curves
Figure B.10: Task T1, subject 09, cursor paths for all three curves
Figure B.11: Task T1, subject 10, cursor paths for all three curves
Figure B.12: Task T1, subject 11, cursor paths for all three curves
Figure B.13: Task T1, subject 12, cursor paths for all three curves
Figure B.14: Task T1, subject 13, cursor paths for all three curves
Figure B.15: Task T1, subject 15, cursor paths for all three curves
Figure B.16: Task T1, subject 16, cursor paths for all three curves
Figure B.17: Task T1, subject 17, cursor paths for all three curves
Figure B.18: Task T1, subject 18, cursor paths for all three curves
Figure B.19: Task T1, subject 20, cursor paths for all three curves

Appendix C

Task T2 Complete Results

Table C.1 gives the button selection accuracy. For each session, the first three columns correspond to the first, second and third correct selection. The fourth column gives the number of buttons still selected incorrectly on the third attempt. The remaining three columns give the average squared error from the linear regression line by Fitts' law. Table C.2 gives the selection time statistics. The first four columns for each session give the average total time needed to select a button in the first, second, and third attempt, and the time when the selection was still incorrect after the third attempt. The average time generally increases with the number of attempts, which is expected, since the user needs to make more than one selection. Also shown in the table are the total elapsed times for each trial and for all three trials. As the trials progressed, the elapsed times did not decrease. However, when compared across the sessions, both the individual trial times and the total time decreased significantly. Figures C.1 through C.19 give the selection times vs. button distance by Fitts' law and the cursor paths for each target button, for all 18 subjects and the author.
              Session 1                                     Session 2
Subj    Selection correctness    Avg.sq.err of lin.         Selection correctness    Avg.sq.err of lin.
ID      1    2    3    not       regression line            1    2    3    not       regression line
                                 E0     E1-2    M                                    E0     E1-2    M
00       4    3    2    6        1035    951     -          23    4    3    0         521    340    32
01      18    6    3    3         628    661    57          23    6    0    1         391    296    33
02       7    6    2   15        1108    574    36          23    1    1    5         825    216    49
03      20    7    2    1         408    443   105          27    2    1    0        1292   1474   318
04      13    5    5    7         812    558    39          19    2    2    7         315    335    37
05      23    1    4    2        1460    561   110           8    5    5   12         798    518    38
06       9    5    6   10         318    794   216          18    7    4    1        1527    452    47
07       8   12    5    5         364    477   468          18    6    4    2         363    364   108
08       8   12    5    5        1095    588    31          22    3    4    1         192    875    25
09      18    4    4    3        2738   1916   244          24    6    0    0         314    784    68
10       9    6    5   10         384    353    79          15   10    3    2        1036   1794    60
11      21    7    1    1         235    353    21          24    5    1    0         563    262    17
12       2    7    8   13         397    490   219          25    3    2    0         562    220    80
13       2    5    3   20         850   1089    52           4    5    1   20         170    711   112
15      20    6    2    2         666    303    27          23    1    1    5         148   1462    68
16       7    5    8   10         190    383    53          15    6    4    5         370    324    55
17       9   11    6    4         797    200    41           7    4    2   17          97   1206   302
18      19    7    3    1         400   1277    36          20    2    2    6        1001    217    31
20*                                                         28    0    1    1         128    435    34
Avg   12.1  6.4  4.1  6.6                                 18.8  4.3  2.2  4.7
%       41   22   14   23                                   63   14    7   16
        77% correct, 23% wrong                              84% correct, 16% wrong

* subject 20 is the author (second session only)

Table C.1: Task T2: button selection accuracy statistics

              Session 1                                                Session 2
        Avg. time for selection (sec)   Trial time (sec)   Tot.        Avg. time for selection (sec)   Trial time (sec)   Tot.
Subj ID 1     2     3     not           T0   T1   T2       (sec)       1     2     3     not           T0   T1   T2       (sec)
00      6.6  12.1  14.9  14.3            76   68   33       178        4.4   8.0   4.4    -             52   54   39       145
01      6.3   8.0   6.6   5.2            71   79   46       197        3.4   6.5    -    5.9            45   36   42       124
02      8.5  14.4  11.1  11.9           111  142   92       346        4.5  11.0   4.1   1.5            58   33   33       125
03      4.6   9.3  14.3   6.3            63   63   64       190        6.4   5.6   5.7    -             45   56   86       189
04      7.5  11.1  11.1   7.9            85   73  104       264        5.8   7.5   8.9   2.6            41   60   59       161
05      7.1  10.7  11.7   8.2           102   46   87       236        3.7   5.5  10.8  10.0            76  110   44       231
06      4.1   7.3  13.7   8.6            40   87  114       242        5.8  11.9  16.2   8.0           116   79   64       260
07      5.0   7.3   7.8   5.7            58   72   65       195        4.6   6.8   9.1   9.9            59   42   77       180
08      4.1   4.9   8.9  23.7           118   87   49       254        4.7  14.8  11.3   1.5            40   56   96       194
09     13.6  17.8  42.6   6.4           196  169  139       505        4.1   4.2    -     -             32   35   55       123
10      3.3   5.7   5.3   9.1            53   43   83       181        5.7   7.4  41.8  30.3            80  174   91       346
11      2.9   5.9   5.7   5.1            31   35   46       112        3.6   8.1   7.7    -             53   47   34       135
12      3.5   5.7  12.5   6.0            80   88   56       225        3.8   6.2   4.9    -             51   34   38       124
13     30.4  13.2  31.7  11.8           176  224   55       457        6.1   7.0  39.5   2.3            28   95   20       145
15      4.4   5.4   6.3  11.4            65   44   45       155        7.3   4.6   2.2   3.4            32   98   60       190
16      3.2   4.1   6.1   7.2            33   43   86       163        4.4   3.8   6.2   3.0            39   58   30       128
17      4.2   5.3   5.7   6.3            79   35   40       155        5.1   3.1   5.5   5.0            20   20  103       144
18      7.4   7.3  10.6   3.3            46   83   97       227        5.4   3.1   4.4   2.1            53   47   34       135
20*                                                                    3.2   0.0   3.2   2.1            22   37   34        94
Avg     7.0   8.6  12.6   8.8            82   82   72       238        4.9   6.9  10.1   4.7            51   59   58       164

* subject 20 is the author (second session only)

Table C.2: Task T2: button selection timing statistics
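The regression-line error columns in Table C.1 refer to a straight line fitted to selection time as a function of the Fitts-law index of difficulty; the figures that follow plot time against log2(D/S + 1/2), where D is the distance to the target button and S is its size. As an illustration only of how such a fit and its average squared error can be computed (the units and data handling of the actual experiment are not restated here), a minimal sketch:

    import numpy as np

    def fitts_fit(distances, sizes, times):
        """Fit time = a + b * log2(D/S + 1/2) by least squares and return
        (a, b, avg_sq_err).  distances, sizes and times are 1-D sequences."""
        D = np.asarray(distances, dtype=float)
        S = np.asarray(sizes, dtype=float)
        T = np.asarray(times, dtype=float)
        idx = np.log2(D / S + 0.5)                      # index of difficulty
        A = np.column_stack([np.ones_like(idx), idx])
        coeffs, *_ = np.linalg.lstsq(A, T, rcond=None)
        a, b = coeffs
        residuals = T - (a + b * idx)
        return a, b, float(np.mean(residuals ** 2))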
Figure C.1: Task T2, subject 00, Fitts' law time-distance plots and cursor paths for each button
Figure C.2: Task T2, subject 01, Fitts' law time-distance plots and cursor paths for each button
Figure C.3: Task T2, subject 02, Fitts' law time-distance plots and cursor paths for each button
Figure C.4: Task T2, subject 03, Fitts' law time-distance plots and cursor paths for each button
Figure C.5: Task T2, subject 04, Fitts' law time-distance plots and cursor paths for each button
Figure C.6: Task T2, subject 05, Fitts' law time-distance plots and cursor paths for each button
Figure C.7: Task T2, subject 06, Fitts' law time-distance plots and cursor paths for each button
Figure C.8: Task T2, subject 07, Fitts' law time-distance plots and cursor paths for each button
Figure C.9: Task T2, subject 08, Fitts' law time-distance plots and cursor paths for each button
Figure C.10: Task T2, subject 09, Fitts' law time-distance plots and cursor paths for each button
Figure C.11: Task T2, subject 10, Fitts' law time-distance plots and cursor paths for each button
Figure C.12: Task T2, subject 11, Fitts' law time-distance plots and cursor paths for each button
Figure C.13: Task T2, subject 12, Fitts' law time-distance plots and cursor paths for each button
Figure C.14: Task T2, subject 13, Fitts' law time-distance plots and cursor paths for each button
Figure C.15: Task T2, subject 15, Fitts' law time-distance plots and cursor paths for each button
Figure C.16: Task T2, subject 16, Fitts' law time-distance plots and cursor paths for each button
Figure C.17: Task T2, subject 17, Fitts' law time-distance plots and cursor paths for each button
Figure C.18: Task T2, subject 18, Fitts' law time-distance plots and cursor paths for each button

Figure C.19: Task T2, subject 20, Fitts' law time-distance plots and cursor paths for each button

Appendix D

Task T3 Complete Results

Table D.1 gives the statistics of the button selection accuracy and the average squared error from the linear regression line of the data plotted by Fitts' law. The first ten columns give the attempt number on which the correct selection was made. The general trend from the first to the second session was that the selections were made mainly in the first two attempts and that the number of selections requiring more than ten attempts was lower. Also, in the second session, most subjects were able to complete most draggings in one or two steps, unlike the first session, where the number of steps varied greatly.

Table D.2 shows the selection time statistics. The first ten columns for each session give the average total time needed to make a correct selection in the first, second, ..., tenth attempt. The average time generally increases with the number of attempts, which is expected, since the user needs to make more than one selection. Also shown in the table are the total elapsed times for each trial and for all three trials. As the trials progressed, the elapsed times did not decrease. However, when compared across the sessions, both the individual trial times and the total time decreased significantly. This means that the users were able to make faster selections in the second session.

Figures D.1 through D.19 give the selection times vs. button distance by Fitts' law and the cursor paths for each pair of dragging locations, for all 18 subjects and the author.
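The time-distance plots throughout Appendices C and D, and the regression-line error reported in Table D.1, use the same Fitts' law formulation: selection time is regressed against the index of difficulty log_2(D/S + 1/2), where D is the distance between selection points and S is the button size. The sketch below is a minimal illustration of how such a fit and its average squared error can be computed; the function names are mine, and the distances and times in the usage example are made-up placeholders rather than data from the tables.

```python
import numpy as np

def index_of_difficulty(distance, size):
    # Welford-style Fitts' law index of difficulty used on the x-axes
    # of the plots: log_2(D/S + 1/2).
    return np.log2(distance / size + 0.5)

def fit_fitts(distances, sizes, times):
    """Least-squares fit of time = a + b * ID, plus the average squared
    error of the regression line (the quantity reported in Table D.1)."""
    ids = index_of_difficulty(np.asarray(distances, float),
                              np.asarray(sizes, float))
    times = np.asarray(times, float)
    b, a = np.polyfit(ids, times, 1)          # slope, intercept
    residuals = times - (a + b * ids)
    avg_sq_err = float(np.mean(residuals ** 2))
    return a, b, avg_sq_err

# Placeholder example: three selections at different distances (pixels)
# to a 100x100-pixel target, with hypothetical selection times (seconds).
a, b, err = fit_fitts(distances=[150, 400, 800],
                      sizes=[100, 100, 100],
                      times=[2.1, 3.0, 4.2])
print(a, b, err)
```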
Across subject avg. number of steps: min 35, max 131, avg 84

[Table D.1 lists, for each subject and session: the number of correct selections made on the first through tenth attempt, the number of draggings completed in one through ten steps, and the average squared error of the regression line by Fitts' law (the REGLIN columns).]

Across subject avg. number of steps: min 31, max 139, avg 67

† - Average squared error of the regression line by Fitts' law

Table D.1: Task T3: button selection accuracy statistics
              Avg. time for selection (sec)                              Trial time (sec)    Total time
Subj Sess   1    2     3     4     5     6     7     8     9    10       T0    T1    T2        (sec)
00    1    5.4  4.9   0.0  15.1   0.0   0.0   0.0   0.0   0.0  23.7     110   100    91         303
      2    4.3  2.8   4.3   3.8   5.7   0.0   0.0   2.9   0.0   7.9      88   126    88         303
01    1    3.5  4.6   4.4   0.0   8.9   0.0   4.4   8.6   4.0   5.8     105   101   111         318
      2    4.2  4.0   4.8   4.5  15.5   6.9   8.6   4.4   0.0   6.6     102   120   120         344
02    1    1.7  1.8   7.5   7.4   4.2   0.0   7.2   4.3   0.0  11.9     321     -     -         321
      2    7.9  4.7  11.3  12.7   6.3  11.5   0.0   0.0   5.4  18.5     290   462     -         753
03    1    3.3  4.0   6.1  10.8   7.9  11.0  32.8   8.9  14.5   6.6     314   323     -         637
      2    3.8  4.6   4.8  15.7   9.7   5.1   6.7   0.0   5.6   7.9     295   510   133         939
04    1    3.5  1.7   5.6  14.4   4.2   0.0   7.3   3.5   4.5   9.4     183   480   181         845
      2    4.8  2.9   9.0  12.9   5.6   8.7   5.2   0.0   8.1   9.7     133   234   148         515
05    1    3.9  1.9  11.4   7.8   5.0  13.1   7.6  34.8   4.6  16.0     355   281     -         636
      2    5.3  4.3  13.9  10.1  13.8  23.7   8.7   4.4  11.3  15.9     439   417   156        1014
06    1    3.4  4.4   3.6   5.5   5.9   5.4   9.2  72.4  13.5  12.8     467   156   185         809
      2    3.6  4.9  12.9   5.5   3.7  22.9  13.7   0.0   3.9  12.9     125   171   262         558
07    1    3.9  2.7   4.6   3.1   4.4   5.6   6.0   3.5   0.0   8.8     331   218   218         769
      2    4.7  4.2   9.5  15.2  13.4   4.4  12.3  10.7  13.5  29.5     885   336   391        1613
08    1    3.6  3.0  10.2  10.5   4.5   5.0   8.5   3.7   5.1   5.3     250   162   119         533
      2    3.9  3.7   5.3   5.0   5.4   4.7   0.0   0.0   7.3   9.5     142   107   106         356
09    1    3.3  3.9   5.1   2.7   2.9  19.3   0.0   1.1  15.4  25.7     343   136   275         755
      2    4.1  4.3   4.9   8.7   7.9  12.8   3.2   3.9   8.6  10.0     172   230    98         501
10    1    2.6  2.7   0.0   4.6   3.8   2.8   8.4   4.0   4.9   9.3     297   178   332         807
      2    3.8  3.1   8.0   0.0   7.2   0.0   3.7   0.0   3.3   8.9     113    91   103         308
11    1    3.6  1.1   6.2  10.8   0.0   6.9   4.0   5.6   8.9   8.0     190   299    87         577
      2    3.6  3.4   2.2   3.4   1.7   0.0   4.2   0.0   0.0  19.1      79   127    86         293
12    1    5.1  2.3   5.4   4.1   0.0   7.5   6.1   3.7   8.8  10.0     421   252   190         864
      2    5.7  2.5   0.0   5.0   0.0   0.0   0.0   0.0   0.0   6.4     105   140    98         344
13    1    1.1  4.4  16.6  16.6  16.4   5.7  18.5   1.9   5.5  27.0    1470     -     -        1470
      2    4.4  6.2  11.4   4.0   8.9  12.7  11.6  10.6  16.7   8.9     184   182   178         545
15    1    4.3  3.2   0.0   0.0   0.0   5.9   0.0   0.0   0.0   7.0     108   117    84         309
      2    4.7  3.1  19.8   0.0   0.0   0.0   0.0   0.0   0.0   6.0     108   127   104         340
16    1    3.0  2.6   6.4   5.1   4.8   5.4   6.4   4.4   5.2   8.4     247   249   277         774
      2    3.5  3.1  11.9   8.4   0.0   4.9   5.1   5.0   5.9  13.7     512   206   253         971
17    1    4.9  3.8  10.1   5.2   8.0  43.2  16.0  13.7   8.1  12.7     537   458   201        1198
      2    6.6  8.1   7.2   0.0   2.0   6.1  12.7   0.0   6.4  18.1     252   225   167         645
18    1    4.6  3.7   9.5   7.5   5.9   5.1  12.1   7.1   7.6   4.8     160   287   224         673
      2    4.3  3.2   7.4  12.5   6.0   4.6   4.0  27.6   8.5   7.5     151   324   197         673
20    -    3.6  4.5   5.6   0.0   0.0   6.0   0.0   0.0   6.2   0.0     102    98    95         295
Avg   1    3.6  3.1   6.3   7.3   4.8   7.9   8.6  10.1   6.1  11.8     345   237   184         700
      2    4.6  4.1   8.3   7.1   6.3   7.1   5.5   3.9   5.8  12.1     232   230   158         612

Table D.2: Task T3: button selection timing statistics

[Each of Figures D.1 through D.19 shows, for one subject, the selection time in seconds plotted against the Fitts' law index of difficulty log_2(D/S + 1/2) for 100x100-pixel icons, for the Head-Eye-Mouth condition (trial 0 and trials 1-2) and the Mouse condition (trials 0-2), together with the pointer paths from button to button in normalized X and Y screen coordinates for each trial.]

Figure D.1: Task T3, subject 00, Fitts' law time-distance plots and cursor paths for each target
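Table D.2 above reports, for each subject and session, the average total selection time grouped by the attempt on which the correct selection was finally made, plus the per-trial and total elapsed times. The sketch below is a minimal, hypothetical illustration of that grouping; the record format (attempt number, elapsed seconds) and the function name are my assumptions, and the sample log is invented so that its output happens to match the subject 00, session 1 row.

```python
from collections import defaultdict

def per_attempt_averages(selections, max_attempts=10):
    """Average total selection time grouped by the attempt on which the
    correct selection was made (the first ten columns of Table D.2).
    `selections` is a list of (attempt_number, elapsed_seconds) pairs;
    attempts with no data are reported as 0.0, as in the table."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for attempt, seconds in selections:
        sums[attempt] += seconds
        counts[attempt] += 1
    return [round(sums[k] / counts[k], 1) if counts[k] else 0.0
            for k in range(1, max_attempts + 1)]

# Hypothetical log for one subject and session: (attempt, seconds).
log = [(1, 5.2), (1, 5.6), (2, 4.9), (4, 15.1), (10, 23.7)]
print(per_attempt_averages(log))
# [5.4, 4.9, 0.0, 15.1, 0.0, 0.0, 0.0, 0.0, 0.0, 23.7]
```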
Figure D.2: Task T3, subject 01, Fitts' law time-distance plots and cursor paths for each target

Figure D.3: Task T3, subject 02, Fitts' law time-distance plots and cursor paths for each target

Figure D.4: Task T3, subject 03, Fitts' law time-distance plots and cursor paths for each target

Figure D.5: Task T3, subject 04, Fitts' law time-distance plots and cursor paths for each target
Figure D.6: Task T3, subject 05, Fitts' law time-distance plots and cursor paths for each target

Figure D.7: Task T3, subject 06, Fitts' law time-distance plots and cursor paths for each target

Figure D.8: Task T3, subject 07, Fitts' law time-distance plots and cursor paths for each target

Figure D.9: Task T3, subject 08, Fitts' law time-distance plots and cursor paths for each target
Figure D.10: Task T3, subject 09, Fitts' law time-distance plots and cursor paths for each target

Figure D.11: Task T3, subject 10, Fitts' law time-distance plots and cursor paths for each target

Figure D.12: Task T3, subject 11, Fitts' law time-distance plots and cursor paths for each target

Figure D.13: Task T3, subject 12, Fitts' law time-distance plots and cursor paths for each target
Figure D.14: Task T3, subject 13, Fitts' law time-distance plots and cursor paths for each target

Figure D.15: Task T3, subject 15, Fitts' law time-distance plots and cursor paths for each target

Figure D.16: Task T3, subject 16, Fitts' law time-distance plots and cursor paths for each target
Figure D.17: Task T3, subject 17, Fitts' law time-distance plots and cursor paths for each target

Figure D.18: Task T3, subject 18, Fitts' law time-distance plots and cursor paths for each target
Figure D.19: Task T3, subject 20, Fitts' law time-distance plots and cursor paths for each target

Appendix E

Task T5 Complete Results

Table E.1 gives the sizes of the buttons used in the experiment. The sizes are given in pixel units, inches and visual angle. Table E.2 gives these data:

• number of erroneous selections (column 3),
• the number of times the experimenter had to turn off the camera (column 4),
• the number of times the experimenter had to type (column 5),
• time needed to complete the task (column 6),
• percentage of the task completed (column 7), and
• the sub-tasks that were skipped (column 8).

                         x by y measures in:                            minimal
Button identification    pixels        inches          degrees angle    size (deg)
Netscape browser         994 x 910     11.93 x 10.27   27.48 x 24.33    24.33
URL entry                590 x 25       7.08 x 0.28    16.31 x 0.67      0.67
URL arrow                 15 x 25       0.18 x 0.28     0.41 x 0.67      0.41
Back/send buttons         50 x 50       0.60 x 0.56     1.38 x 1.34      1.34
Close buttons panel       10 x 50       0.12 x 0.56     0.27 x 1.34      0.27
Scroll bar                13 x 100+     0.16 x 1.13     0.36 x 2.67      0.36
Mail button               30 x 25       0.36 x 0.28     0.83 x 0.67      0.67
MSNBC menu button         90 x 20       1.08 x 0.22     2.49 x 0.53      0.53
Story title              200+ x 20      2.40 x 0.22     5.53 x 0.53      0.53
ZIP code entry            80 x 30       0.96 x 0.34     2.21 x 0.80      0.80
GO button                 30 x 30       0.36 x 0.34     0.83 x 0.80      0.80
Mail window              740 x 740      8.88 x 8.35    20.46 x 19.78    19.78
"To:" field              560 x 20       6.72 x 0.22    15.48 x 0.53      0.53
"Subject:" field         480 x 30       5.76 x 0.34    13.27 x 0.80      0.80
text entry area          700 x 460      8.40 x 5.19    19.35 x 12.30    12.30
Select panel              20 x 20       0.24 x 0.22     0.54 x 0.53      0.53

Table E.1: Sizes of typical buttons in the Netscape browser and mail windows, and a typical news WWW page

               num.   num.       exper.   time     %        skipped
Subj  Sess     err    turn off   types    mm:ss    finish   sub-tasks
00    1        1      0          0        7:00     80       not done 2,3,4,5†
      2        0      0          0        3:34     100
01    1        2      0          0        2:50     84       not done 2,3,4†
      2        4      0          0        3:24     100
02    1        10     0          1        13:52    100
      2        2      0          0        7:25     90       not done 2,4,5†
03    1        9      0          1        12:39    100
      2        4      0          0        8:47     100
04    1        10     0          0        7:23     100
      2        14     1          0        9:02     100
05    1        11     2          0        7:36     82       skip 5,6
      2        5      0          1        12:11    70       skip 3,4,5,6
06    1        18     2          0        12:27    82       skip 2,6
      2        10     0          0        9:50     100
07    1        10     1          1        15:27    88       skip 3,4
      2        3      0          0        11:10    100
08    1        2      0          0        9:34     100
      2        6      0          0        5:24     100
09    1        7      0          0        7:47     100
      2        1      0          0        5:20     100
10    1        19     0          1        12:16    86       skip 6
      2        4      0          0        9:49     100
11    1        1      0          0        3:28     100
      2        4      0          0        4:41     100
12    1        18     0          1        13:00    40       skip 6,7,8,10
      2        5      1          0        7:58     100
13    1        12     1          0        10:16    20       skip 1,6,7,8,9,10
      2        8      1          1        17:45    86       skip 6‡
15    1        0      0          0        3:58     100
      2        2      0          0        8:46     86       skip 6‡
16    1        11     1          0        11:03    82       skip 5,6
      2        3      1          0        6:54     100
17    1        12     1          0        10:33    86       skip 6
      2        18     0          1        13:08    100
18    1        1      0          0        4:26     100
      2        0      0          0        9:28     100
20    -        0      0          0        2:40     100
      with the mouse                      1:15     100
Sum   1        154    8          5
      2        93     4          3
Avg   1        8.6    0.4        0.3      9:12     85
      2        5.2    0.2        0.2      8:35     96

Percentage improvement: # of errors 40.6%, elapsed time 6.6%

† - these subjects' task did not include some sub-tasks due to time constraints, but they completed 100% of their current task.
‡ - these subjects skipped the task due to network problems, but would otherwise have attempted it.

Table E.2: Task T5: number of errors and timing statistics
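Table E.1 above expresses each target size in pixels, inches and degrees of visual angle. Converting between these only requires the display resolution and the viewing distance, neither of which is stated in this appendix; the sketch below uses values inferred from the table itself (about 83 pixels per inch and a viewing distance of roughly 24 inches), so both constants and the function names should be read as assumptions, not figures from the text.

```python
import math

PIXELS_PER_INCH = 994 / 11.93    # ~83.3, implied by the Netscape browser row
VIEWING_DISTANCE_IN = 24.0       # inches; approximate value implied by Table E.1

def pixels_to_inches(pixels):
    return pixels / PIXELS_PER_INCH

def inches_to_visual_angle(inches, distance=VIEWING_DISTANCE_IN):
    # Full subtended angle in degrees: 2 * atan(size / (2 * distance)).
    return math.degrees(2.0 * math.atan(inches / (2.0 * distance)))

# Example: the 50-pixel width of the Back/send buttons from Table E.1.
w_in = pixels_to_inches(50)            # ~0.60 in, as listed in the table
angle = inches_to_visual_angle(w_in)   # ~1.4 degrees; the table lists 1.38
print(round(w_in, 2), round(angle, 2))
```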