THESI': ZOCI This is to certify that the dissertation entitled AN INTERFACE FOR HUMAN-COMPUTER INTERACTION BASED ON FACE FEATURE TRACKING IN 2D presented by VERA BAKIC has been accepted towards fulfillment of the requirements for PhD degree in Computer Science & Engineering Major professor Date 23/91/277 2000 MS U is an Affirmative Action/Equal Opportunity Institution 0-12771 LIBRARY Michigan State University PLACE IN RETURN BOX to remove this checkout from your record. TO AVOID FINES return on or before date due. MAY BERECALLED with earlier due date if requested. DATE DUE DATE DUE DATE DUE 03,910 1’1 5 23033 ix lNTERHC BASED 0 AN INTERFACE FOR HUMAN-COMPUTER INTERACTION BASED ON FACE FEATURE TRACKING IN 2D By Vera B akic’ A DISSERTATION Submitted to Michigan State University in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY Department of Computer Science and Engineering 2000 it ixtzirirr Ix. . ‘ ‘ 'r|_ M . . ‘ L‘ \\»'4‘ Ni ‘.4 . _' . A ' o . i‘ . ' V i“ .A ._ ‘ ‘ ".. o. . It I\ A v “ ' .... 4|. ‘ . ‘l “ I h ‘ A a. . l . ' ‘ '0 ‘ “N“NI‘G b . ' ‘ A”. ‘ 'u. 0‘ ‘ . ' ‘ ' r O'Vq. ‘ ' ‘I “h, .\ l:'\, . _ L ". _ ‘ ‘ ‘r.> ‘ \"s ,’~- ’ .' ‘I‘ J l‘ . ' A t‘ .\ V7. ‘ \ . . " . i v " 4. '. ‘ ‘4 u“ .‘. . .‘ '. ._ m. ‘ ‘. 4'4} ' «“V. d 1‘, ' .“I. . ‘ ' . . ‘~I ‘ i i ‘T ‘ v~, .. “ L i. _ . ~ I I‘M “l.l 0 '- t . . I . C ll .. § . \ ‘. \,V’|> ‘v F .4" II,» \ ~‘fln Q‘. A ABSTRACT AN INTERFACE FOR HUMAN-COMPUTER INTERACTION BASED ON FACE FEATURE TRACKING IN 2D By Vera Bakic’ In this dissertation, a new approach to control a computer using head movements, gaze direction and face expressions is discussed. A workstation user is observed in a non-intrusive fashion. Algorithms to find a user’s face and where the user is looking at the workstation display are described. The user can issue a command to the computer by making a certain facial expression. The following modules are described: (2') face detection that is based on a skin-color model and motion, (it) face feature location that is based on the knowledge of the face geometry, (III) gaze location detection that iS based on an Artificial Neural Network mapping of face feature location to display location and a mapping of mouse movement to display movement, and (to) selection that is done through face expressions. In a computer implementation, real—time rates are achieved, and between 10 and 30 frames per second can be processed. Essentially, a handless Human-Computer Interaction (HCI) is provided that can have various applications. Disabled people who cannot use their hands can greatly ' II ,. '0 v. y 1 V .lili ..‘ ‘A ,. u. “"' ‘T' f"‘\.’, 4 : Lu. u. \nll in. . I ’r 1-: .. ‘._’.” In .u. ll . ”I “l .. v I v x" " P \o “’ t" in . l. A ‘ z W " "'4 I'I 0. "’ I ‘ b.- _ I “fly i‘ 1“ u, 0 v‘. . 9| ‘ : l . '. Y . ‘K .IA. i» ‘1‘ A; . H '. " ' v e, 1 ‘1‘ _' . ‘2'- AI. .I i. ‘Au...’ """-wf ‘ ' "I a 4. ‘V ‘ yll.'.:‘.. \“ ~-..A __ f“ 1. I." ‘ h K “ ‘.u-' \"era Bakic benefit from such an HCI. Other users can use the interface as a replacement for a conventional mouse. Finally, the interface can be adapted to monitor where the user is looking in time, and the results could be used for various analysis, e.g., in cognitive science for visual perception analysis, in marketing for display exploration analysis, etc. The results Of a study that evaluated the usability Of the head-eye input. interface are presented in the final Chapter. 
These results show that the users were able to adjust to the new interface and perform a range of tasks common for current application programs. Among other tasks, the users successfully controlled Netscape: they browsed several WWW pages with an average of 5 erroneous selections and 8:35 minutes of time. © Copyright 2000 by Vera Bakié All Rights Reserved TO My Family M . ' . r _' X ‘ ‘ ~ ~\\ ‘, . . “ ..‘ ‘ ‘ 7 A ,_ .- I‘D l '3‘ ‘. . . ”n ' . ‘ .‘.‘ a. J . _ In“ V b. 7 ,_ ' . ‘ ‘r P‘II.‘ ‘ \I ~ «s.- “‘>.‘ ‘ v y ‘ A. l. .‘ ' r.‘ ~ n, Y ‘ .1. . A! ’ ‘ .. .uq“ u. ‘1 R~.‘r. 4 v ‘ . ‘ A . ‘ . \«O- 'A‘.‘ -“‘ N. l . “ - . a H. 7 ~' l' \7‘ 1. v-nh .1“ HA . -. ..‘ l ._ h, . . . ‘I-v ' s41:."\ _-H\ J \ t“ h F l I b |‘ . r“- F: s... :‘ ‘b—I ‘V..' A A4 ‘ ‘.. . . u l‘ 'l V H s., ‘|' ,‘ :j ‘ ‘14! . A L-‘. ‘3' . I." .' T l I ‘I . \‘k‘ “J; :T n ‘. u s .. .. r; a u . ' Iv -l~.|> r. _‘ \ "‘l l ._ .‘ I‘- a l' , ‘l . I .‘AAI ‘l . \ ~ ‘ . ACKNOWLEDGMENTS This thesis would not be possible without the help and support of many people: Dr. George Stockman offered excellent guidance, advice and support throughout my PhD research. I am thankful to him for the inspiration to pursue the degree in the computer vision field. Dr. Frank Biocca from Telecommunication Department served in my guidance committee and gave many valuable advices for my work. Dr. Anil Jain, Dr. John Weng and Dr. Sridhar Mahadevan served in my guidance committee and made many valuable suggestions and comments. Dr. Charles Owen helped with his comments. Dr. John Henderson from the Psychology Department helped with his expertise in the eye-tracking research. Dr. Richard Enbody was my advisor during the Masters degree program and I thank him for guidance and support. I would like to thank Dr. Alvin Smucker from the Plant and Soil Science Department and Dr. Theresa Bernardo from the School Of Veterinary Medicine for their support during various research projects. vi I ‘l ,I- 9 ' l. I m‘ t. I i,.‘st.»- 5“". .ul 5 I ‘V' '_'. v V "' l ‘ 4 ‘ .‘ l4 Anat— :;-.-~ “" . o l . , .0 .a l "A "' Viv m". ..: rm - .,__-,.,.a. . ,, . - ' l “11,". o O. l . ‘I ‘ u “t. “It. \- ‘- _o I . V ‘ , "J HY I‘lr ,0 '__ _ ‘3 . .I It - -lru~ : u u . - o . .. p _' T} ‘J ' . " .u“ “0| a . on at Ac 7. V". V ‘1v ‘ . u, . . 'V 4 - .4. \ .g. .uJA .h - ‘ l . f 1. ' "“ ' V ' v" .. s- A...” .I - _,.. " 'r ‘r I - ._ ; l v - .1, w .‘ . .‘ . \l 1Q. “i, H‘ ‘ I I) I - _ ' v. ,‘ ‘ ~_ "‘1 ' “ ‘r 9...; I . - IQ . v H I l . . . ... _ ‘WA..'. A fit , A. 9. ”..'.' . . .' ", —.. -I‘ . ‘ ,v‘ a ”‘ SKA. . I. q. \ v 4‘ ‘. I would like to thank all the other professors at the CSE department for excelent teaching and providing research facilities. I would like to thank all the CSE staff, Linda Moore in particular, for all their help. I would like to thank CSE undergraduate students Kristina Johnson and Brian Langner for their help during CSE Open House demos; fellow students from the PRIP lab for all their help: Aditya Vailaya, Salil Prabhaka, Vt’ey Shuan Huang, Arun Ross, Paul Albee, Scott Connell, Nicolae Duta, Friederike Griess, Karissa I\~’Iiller, Dan Gutchess, Erin McGarrity, and many others. Jay Newell and Jerry Roll from the Telecommunication Department helped with the “eyes-on-screen” experiments. Special thanks to all the people who participated in the experiments. Without their help and cooperation this work would not be possible. 
Finally, I would like to thank my family for their continuing support and help, my mother in particular for helping out in taking care of my daughter while I was finishing the dissertation. vii Jl OF TABLES LEI OF MUN I Introduction .1 ‘ ' ‘ 7v v, 04 0 I ~ ' \- It nahkot a... ~- I: ,. " rl ‘ .. . v 9 I r ,p" \ . ,4 so A. A. Ill ». A - .u. TV [HI-n r i v. 0.. u...uuu- ' ... .. s”. . I to». \.';..A I". ‘5‘ \l '. .utt 4 u.. l ‘1 "I . r‘v 1' v? v ‘ .-. . .., "5 CH". ' A l ‘9 1 . ., 'r ' .7 7". . ' .~-La...&u . 'h ' 3' Background 3 7" 1‘.- Y _ '. a. ., f“ Inv.. ..J ‘,_v. " 'L.‘ f . "AA _\ 5] r ‘ Q ~ .5. P]... I ‘a I‘ ~.. ~. ;‘ L”... t; r‘. ‘ . ,, I k!” Q A. p ‘ .. "aiu,’ ‘.l‘ '~ " . .' 5.. Qt“. ,‘ , ‘ \ .‘ . 3‘ r H t I1 I I u .. ~ If, ’ r'r .' 3' l“‘ i .‘l ‘v ' .b- 4 ft "(L -l . . 4 I 7 h v. ‘ -., .I g. y I \"3 ." Ix AG. . L". L. TABLE OF CONTENTS LIST OF TABLES xi LIST OF FIGURES xiii 1 Introduction 1 1.1 Applications of Computer Vision ...................... 2 1.2 Problem Statement and Motivation ..................... 2 1.2.1 Human-Computer Interaction ....................... 3 1.2.2 Cognitive Aspects ............................. 5 1.2.3 Computing Challenges ........................... 7 1.3 Proposed Gaze Tracking System Organization ............... 10 1.4 Organization of the Dissertation ....................... 10 2 Background and Related Work 14 2.1 Face Localization ............................... 14 2.1.1 Clustering, PCA and Rule Based Approaches .............. 15 2.1.2 Artificial Neural Network Based Approach ................ 16 2.1.3 Skin Color Based Approach ........................ 18 2.2 Facial Feature Finding ............................ 19 2.3 Facial Expression Discrimination ...................... 22 2.4 Gaze Direction Detection ........................... 24 2.4.1 Mathematical Solution ........................... 24 2.4.2 Pupil Reflections .............................. 25 2.4.3 Limbus Tracking .............................. 27 2.4.4 Artificial Neural Network Estimation ................... 27 2.4.5 Electro-Oculography ............................ 27 2.5 Kalman Filter ................................. 28 2-6 HCI-Related Issues .............................. 30 2.6.1 Basic 2D Input Devices .......................... 30 26.2 Interaction Tasks .............................. 35 26.3 Mental Models in Human-Computer Interaction ............. 36 2.6.4 Fitts’ Law .................................. 38 2.6.5 Evaluation of Eye-Tracker as Input Device ................ 40 2.6.6 Eye-Controlled Systems .......................... 41 2.6.7 Problems with Eye-Tracking Systems ................... 44 26-8 Head-Controlled Systems ......................... 46 2-7 Visual Perception Studies .......................... 47 viii i “‘ 7,, {FVI\\ ' I 131*? v ..,~ 9 : , a ‘ ' . r I fistt-M-ill'fli. 3 late and facial l‘ ' P' "'1 V 'fl‘t“ " - u .. e‘ u ( ‘ “I ' . \L‘; l ‘ll'lb J. .‘ 31.. v . . I . . n I‘ :14 I“ I“ I 'l v '- ' ‘ 1‘ .I n o l n. . ‘ V ." nt'l.t"l'. '.IL . I Alan} 6 ~\ Ill AA \ I I II‘ F. !~ 0 o . , kt‘i '. L‘.\ .'a .u- I“ 9 7‘ “‘ ' l, v - L- " . .u ‘ J. Jab oa...g .- n '~ ‘I l 'v q u 5 ‘ D ‘ y I I . " ‘ “ ‘ a s In at ‘ \""‘n 1"- ~ 1...... \L. V i lacking the Fe; '- . a ' ' V ’ \I o y. n l , ‘- Lulu.“ _. .\ I -.. . ,. - ».,.. ll . r 1. .‘ - , I ( . ..u.~ , . u u ‘ F “I . :‘ ' ' ‘ It. I . 4. ‘ -. . \ \ ‘ . ».4 \ - .m. .k: .‘.I n P ' .J H 1‘ I.' .T h I ‘ ' .A,‘9 .6. .. Aao ' I ' - . ' - .. . ‘1 O‘ i . . " ~L3.I‘;_‘ .I‘ ‘ .- h‘ H' "d d " ‘kAAN A‘ I Gaze Direction "o..“-H . a“ ‘ '9. 
' Hm.“ “-1.12“ It ., . 4 A III IV”. ' " um. _. N l'i rv .5. i . ~J ‘lr'r ‘vk .. ' vw. ~l“ 1'“ ' N .. \w- .M' ‘ :, . V A.._m~ L p ., _ 'n - I ~“. ‘ . ‘Iw ' ‘ nus _ .,' g ‘ L'“ a. "1 in .' r ‘ ‘n‘. \ I, L. ... 1., .. f. ,_v-. ’v. \I ‘u -I_' ._‘ '-.- f"\ , \l ‘ \T‘ I I ‘p '. ‘r J. \"n,.' :‘h .. T‘“--:.\ :I.‘ :23 co "8? a o :1 I. “ . ‘r-a-o , \r . ~ p I . ' . ‘ Q. \ ~ ‘ at I ' 5‘ a. \Lgo. v ' V . ‘ 'I_"‘ I. ‘ . . N ' I ‘5 ELM... ' I . ' ‘I ‘-.-A -I. ,1. {_ . . i a u v . 4', I’D? “ ‘. .._‘ "(1“ ‘l 0' _ . "F 'v v .. . "-1 Ll‘fi :v T . .. 1“ . . I, v C . D‘,' t" 4 .I \u, 4 'L. “[-.4‘ . I ..‘ ‘..__ -.-.4"f.-,‘ A“ o;‘\ 'l u -, .5 »~. \ ‘4 2.7.1 Types of Eye Movement .......................... 48 2.7.2 State—of-the—art Eye Movement Tracking Technology .......... 50 3 Face and Facial Feature Detection 51 3.1 Face Detection ................................ 52 3.1.1 Skin Color Model in RGB Color Space and F ace Localization ..... 52 3.1.2 Skin Color Model in HSI Color Space ................... 55 3.1.3 Problems with Skin Color Model ..................... 56 3.1.4 Problems with Dark Skin Color ...................... 58 3.2 Improvement of Dark Skin Color Detection ................. 59 3.3 Eyes, Eyebrows and Nose Location ..................... 62 3.3.1 Detecting Eyes ............................... 62 3.3.2 Matching Eyes with Eyebrows and Nose ................. 65 3.4 Accuracy of Feature Detection ........................ 69 3.5 Summary ................................... 72 4 Tracking the Features 74 4.1 Tracking System State Diagram ....................... 75 4.2 Using Motion Detection to Improve Tracking Results ........... 78 4.3 Results Using Kalman Filter ......................... 81 4.3.1 How far from the camera can the user be? ................ 85 4.4 Analysis of Movement Data ......................... 86 4.5 Summary ................................... 88 5 Gaze Direction Detection 89 5.1 Introduction and Terminology ........................ 89 5.2 Gaze Direction Determination Using ANN ................. 91 5.3 Gaze Direction Determination Using Movement Vector Scaling ...... 94 5.4 Smoothing of Gaze Point ........................... 97 5.5 Results ..................................... 98 5.5.1 Fixations Determination .......................... 98 5.5.2 Attention Measurement .......................... 103 5.5.3 Cursor Movement .............................. 112 5.6 Summary ................................... 115 6 HCI Based on the Head-Eye Input 116 6.1 Selection Mechanism ............................. 117 6.1.1 Selection by Head Motion ......................... 118 6.1.2 Selection by Facial Expression ....................... 121 6.1.3 Other Ways to Make a Selection ..................... 123 6.2 GUI Design Issues .............................. 124 6.2.1 Level of Feedback .............................. 124 6.2.2 Button Sizes ................................ 126 6'3 Advantages and Disadvantages of the Head-Eye Input Interface ..... 130 6-4 Summary ................................... 133 ix T imitation of ill! Th G I..~ Hi '13 O.‘ ‘\ .. . \-"ql P4!“ .I; . . -i El'ai'titxnl. l‘: . . '4 3: ~. . 3:3 3m“ ‘33 ‘Iii 3i: TI 32' ii '.'_ 3 '3. G' I_‘ T ' " v m In. "I: I’. Q 0.: i uln A I u ‘ .8, ‘ ‘ if 2M” f. . .I “.1qu F ‘ ' L ft'l‘e ii r. IPPENDICES flanks of the Biask 11 C om; flask 12 Com] [111313 Com] l lair 15 C 0m; 7 Evaluation of the Head-Eye Input Interface 7.1 The Goals of the Study .......................... 7.2 Subject Pool ............................... 
7.3 Evaluation Procedure ........................... 7.4 Results ................................... 7.4.1 Task T1: Moving the Cursor Along a Path on the Display ..... 7.4.2 Task T2: Guided Button Selection .................. 7.4.3 Task T3: Guided Dragging of Icons .................. 7.4.4 Task T5: Application ControlmNet-surfing and E—mail Writing 7.4.5 Summary of Questionnaires ...................... 7.5 Discussion of T1, T2, T3, T5 Results .................. 7.5.1 Learning curve ............................. 7.6 Summary ................................. 8 Conclusion and Future Work 8.1 Research Contributions .......................... 8.2 Future Work ............................... APPENDICES A Results of the Gaze Monitoring Experiment B Task T1 Complete Results C Task T2 Complete Results D Task T3 Complete Results E Task T5 Complete Results BIBLIOGRAPHY 134 135 136 137 143 144 147 156 164 172 174 177 181 183 184 186 188 189 200 221 243 266 269 . .. [7... . V . I I ‘ hu.-I"I II“ . \ :."I ..1 ..,. i H... ,_, . .1 l I ‘l p 4» um I ”_ ' 1 I I t“ ‘D I' hU'I-l I“. A A. ' i; .. 9", . ..~ " {N l.\‘... l - . 1' .6 ,. I ..’ ‘ AA 0‘. .J tal‘ 8... ‘- r e, w ‘ ' . ,. . “u.“H ... I "\5« I '-‘ _ \ P ' .‘_ .4. 'J- ' . D . '“"~V":. I . .o u 34“ w. \‘A ‘ H e.‘ c "- \’ . . . |.,'V.. I ' . ' A ‘Ia...’.‘j. I." “ .‘ i""‘< l' Mu-“ IKA. -. ' !., ~ y. a"... h“ A A .‘A ‘4‘ ‘ .. l I n 3" ' a‘. ‘s ' 1 Us“) . . “I, . ‘ ’- ‘L‘ 3.1 i. J‘II'VG.‘ y u '3’“.- ‘9‘{¢:0._' M'A ‘ . N\ ‘ 4., I. ‘1; L-"/‘.. Q‘ ‘i." ‘ r“... _‘ ' La u o o “. ‘v 6| I ‘ c 5 J G ‘ l I . s V - . .«L 1') ’V ’ d. ?--‘,' “.3: f- ‘. ‘ 3"“ l r -. . h ~ ..‘ l r _.1 l'a . .7 2.1 3.1 4.1 4.2 4.3 5.1 5.2 6.1 7.1 7.2 7.3 7.4 7.5 7.6 LIST OF TABLES Resolution and size characteristics of common display devices ...... Accuracy of the feature location. Percentage of found matches and average Euclidean distance of program-found features from hand—labeled loca- tion. Comparative results for normal environment vs.controlled (plain green) background are given ........................ Accuracy of the motion detection and dark face detection given as the Euclidean distance between feature points and hand-labeled locations. Euclidean distance is given in pixel units, and the number of matches is in frame units. ............................. Accuracy of the tracked feature location .................. Execution times for six runs (two subjects). The program was run for 1000 frames of video, on an SGI Indy 2 workstation with the input image size 320 x 240 ................................ Percentage of agreement in viewing positions and Kappa statistics for all subjects .................................. Number of observations, their mean and standard deviation in seconds for manual coding, and automatic coding .................. Features and requirements for the head-eye input, eye-tracking based in- put, mouse, trackball and touch-screen ................. Subject pool: details of the variations in facial complexities ........ Number of erroneous selections at the previous target location: “y” entries mean that there were more than three selections at the previous target. location, “11” entries mean that there were less than or equal to three selections at the previous target ...................... Percentage of correct selections and the number of erroneous selections per each correct target selection: data from the 8 x 8 grid is collapsed into a 4 x 4 grid. ............................. Task T5: Assigned weights for each sub-task ................ 
Task T5: average number of errors, elapsed time and percentage of the task completed .............................. Distribution of errors across the sub-tasks ................. xi 71 110 111 131 137 155 155 168 . .4 .- l ,. I‘ \ufiir! Aliii 43131-- .4iq 7i? I "I “1"“3‘IIH f1 7‘ ' T . . .- . ‘ .Y‘ 1‘ I ._ ~ I _ ,‘ . \w‘ ”-7 ~ I. I. y .. .J'Ti'l IM’J I31 .‘. I‘, .1 mi-‘ .. I" " 33.? I1; 33’“? \\ \3\ " ' I . r v ‘- -~ "h , ' -1 .551 I}. 11““ 'I | O A1 8.1 C.1 C.2 D.1 D.2 E1 E2 Number and durations of fixations measured by the gaze-determination algorithm ................................. 190 Task T1: average squared error for each subject .............. 201 Task T2: button selection accuracy statistics ................ 222 Task T2: button selection timing statistics ................. 223 Task T3: button selection accuracy statistics ................ 245 Task T3: button selection timing statistics ................. 246 Sizes of typical buttons in the Netscape browser and mail windows, and a typical news WWW page ......................... 267 Task T5: number of errors and timing statistics .............. 268 xii u. n";“ .. L ‘ p ,w-~u.h 'ju" 1 43h.“ k J, .n J . I ‘ ' ‘ v r r \ J13. a. . ‘ Q P.Qp " 3"..2 gnu-y It! _. 0 a N i 7 'V '1 - . .I‘ r r g ‘ a... nu ”Us- | .n I‘ II . . .' , ' \ 5 ‘I In; .‘3. .. " " Y" . ’._ I .h' .I‘ . ‘h ‘ -“ l u...” I | ’.' U '. .,' ~.. I. . . ‘h ‘ 0.. _" . b ... '4 u ,\ 'g‘ ' .. 3‘ ‘u and. y. . l r " m" Q- . . “dd...“ A ' .O‘ I Q l .. l , \ f. 1‘; ‘V . Q lv 9. ‘ u.‘ \v, . ‘ .,r '. , “uh“ ‘v. 1.1 1.2 2.1 2.2 2.3 2.4 3.1 3.2 3.3 3.4 3.5 3.6 3.7 3.8 3.9 LIST or FIGURES Purkinje pupil tracker, an illustration (Courtesy of John M. Hen- derson, Michigan State University Eyetracking Laboratory, http://eyelab.msu.edu/) ........................ Gaze tracking system block diagram .................... The illustration of the system described in Ohmura et al. [58] ...... The four Purkinje images are formed as reflections of incoming light rays on the boundaries of the lens and cornea ................ Fitts’ law: analysis of the movement of user’s hand (from [16], page 52) . Example of eye movement during face learning and recognition. (Courtesy of John M. Henderson, Michigan State University Eyetracking Labo- ratory, http : //eye1ab . msu . edu/) ................... Skin color clusters: the horizontal axis is Rnom and the vertical axis is Gum". Out of six clusters, three represent Skin color. Cluster la- beled Face is the primary skin color, and clusters labeled Shadow_1 and Shadow_2 are areas of shaved beard and shadows around eyes. . . Isolated face regions on sample images ................... Skin color clusters in HSI color space: the horizontal axis is hue and the vertical axis is saturation. Skin color is represented with three clusters. Cluster labeled Face is the primary Skin color, and clusters labeled Shadow_1 and Shadow-2 are the areas of shaved beard and shadows around eyes ................................. Comparison of RGB and HSI models for skin color classification ..... Erroneous face regions due to similar background and face colors. Localization of dark skin color subjects: two failed and one successful] classification ................................ Red, green and blue intensity histograms for dark and light skin tones . . Color-altered dark images, and resulting skin-color classification ..... Red, green and blue components of two images: gradual thresholding in search for eye blobs ............................ 
3-10 Gradual thresholding in search for eye pupils ................ 3-11 Matching eye blobs with nose ........................ 3-12 Results of eyes-nose detection for various roll, pan and tilt angles. The white rectangle shows face boundaries, and the white dots show the locations of the eyes, eyebrows and tip of the nose. .......... 3-13 Sample images used for accuracy of feature detection calculation ..... xiii 49 53 54 56 57 58 59 61 62 64 65 66 70 x ‘ \ ’. '4 " \sl'". ' 3'» u. i ”315..“- I-I ‘ ‘ ' I”. H mummfl ll. ... ‘I ‘Llut \|‘IOI. ‘u - , ‘ or . linl'ti Iii “' i .v'..\ . a l "".t""'.'. '3 "5-"';ir g ‘I “Ah-lb . n _ o f ' ‘ "I.'"v‘ "I ” 3'4“ u I ' VI I F ' ‘ | t" ‘I l‘ ' I‘ "y I; .I 33? I" " {I l-l. I o v . ' ' ‘ “LIILA ‘II |. A .I I a» .I ~ .I “(I .A-V It I- - 4 -'. ’ [~10 ‘! ‘ KI4LAD ‘\ ~~sAI u I '1 ’- - v o . .I \‘n I". II..IJ 3- . ~WAn‘yg ‘ . , ._| . , . I". \ | “I .‘ $‘fi- CI . . . .‘1 ' Dy. " . do“: A.‘.|..... It A - l "K V' - M-iMILvII.‘ u . . h I 'J \ “\IQ.‘ .. q ' O . .l . v ‘71\ . I. .l.‘ ‘. “Ig'q‘. .‘ . l I. H P. .. .‘4. o \ 'I.' .\ ‘5 tun-l . _ .. . .. r. ,l‘ ' .. 4.| ‘ m I.'.L ' 7“ u. 6‘ ‘ 4 . . . 4... ' ._ ‘ v47 T. I I 'Jut’.’ “(Ln ‘1“ . . f. 'l‘;.'.‘ £' ‘ rut“. \‘A(|‘ 1 Y‘ I",.’ v “W '.I c. ‘ . I“ 09 y... . . a Y "4 H .I h... J u‘ I.- fr 1. ' 99.. ‘: ‘ r- ‘ V . A ...,.s.u.h ”A ' O., A . ‘Q. I ‘ll .“ ’ . PI \ Y I vuL. ‘v "o-.‘ C. I t- . _' .b-s- | .. ‘k‘ V A. - i" n ‘ '. i'TH "V 1 ' . Jx. 3“,] ‘.|‘. 9' ‘ 3 \. l ‘ P .. ... .1... ,‘c “Q 4., 2‘ ~. I . _ -_ ‘ u. 5““ '1. Iv r)‘, 'i ."“ t,‘ . .' U...'| ‘_‘\‘II"1 a r P4 w . .. ~.H~,‘“"‘: 4r . unr‘ I). -". ‘1 7‘. '3'; ”2.3; ‘_ 444- A‘:“\. ;_ . \b— Hi I! q 4 d e" 4. ,1 . "v n ‘k“b.v ‘ v . u ‘ "I ._ ‘I . . “H. .4 . ' “a. L- " V M -.‘ : ., IA :“ " it, . " ‘er , . 4 4., a.“ \ c “ l‘ . . \Y‘. . I ' ,- T‘Z,‘ I. ‘ O ‘1 .4. r I >Ph‘"l 3' u‘ | “ 1.4 'I \' " .ifi‘ 6’1" ' y . . ,. I \' -, \‘ _, . _, _ . “53‘4”. :Q I .i 7. “L A - 'I. .' ‘. d.. 2 - . 43“. T: AI. ‘. ("V t i c I. 8..“ 4 L» . 54. a y\‘ 7‘ . k .l \‘ f\ “L ‘ I. I? 3‘ If 4.1 Tracking System State Diagram ....................... 4.2 Comparison of the X and Y coordinate tracking without (top) and with (bottom) the use of Kalman filter for smoothing and prediction. 4.3 Sample time-space statistics for two movie files: X and Y coordinates of feature points in time, and X vs.Y image coordinate .......... 5.1 Relations between the eyes and nose used as the input to the neural net- work to determine the gaze direction. Dotted lines show the relations we use as the ANN input, dashed lines show the relations that had high weight in the resulting ANN ..................... 5.2 Sample input data for ANN trained to determine gaze directionDashed lines show the target grid points, rectangles represent face boundaries and triangles represent the corresponding eyes and nose locations. 5.3 Comparison of gaze tracking results using the neural network only and using the scaling of movements in the picture to display movements. . 5.4 Comparison of gaze point determined by movement scaling, Kalman filter estimation, and weighted averaging ................... 5.5 F ixations measurement experiment setting and viewing angles ...... 5.6 Fixation determination algorithm ...................... 5.7 Gaze path and fixations for a sample image. White line indicates the gaze path. Gray dots are fixation locations, the dot size indicates fixation duration. Black line connects consecutive fixation locations. ..... 5.8 Attention measurement experiment setting and viewing angles ...... 
5.9 Attention measurement calculation algorithm ............... 5.10 Gaze coordinate and viewing positions (TV vs. computer) over time for three subjects 800, 801, $04 ....................... 5.11 Gaze and eye coordinate changes magnitudes histograms for two subjects 5.12 Sample results of cursor control using gaze tracking ............ 6.1 Selection by head motion—state diagram for “yes” and “no” motion de- tection ................................... 6.2 Predefined order of viewing the cards .................... 6.3 Paths to selected locations and YES/NO response paths ......... 6.4 Sample mouth expression .......................... 6.5 Mouth expressions for a dark-skin person .................. 6-6 Mouth states transition diagram ....................... 6.7 Snapshot of the face tracking demo GUI. Red dot shows where on the screen the user is looking ......................... 6-8 Netscape browser and mail windows with selected buttons highlighted . . 6.9 Netscape button sizes relative to the display size .............. 7.1 Task T 1: average squared error ....................... 7-2 Task T1: cursor path for the best and worst squared error results in the first session ................................ 7-3 Task T2: button selection accuracy and selection times .......... xiv 76 82 87 92 93 99 100 102 103 104 106 107 109 114 119 120 120 122 122 123 125 128 129 145 147 I -'-- I" ‘ .l -.l I I; T6351... ‘t‘t H . .p _I .. ~_ 16:th 5.371 . Ix. {rm 3 £111 6 ‘l ., T _l_ T1) "‘ OysI r ‘ ' ‘ .. ll .L‘L‘h ‘.~ ‘ I . Ivfll' WWI ""‘l'. ', 1&5 1.). Eu... ' "Iv-fly? '1‘ I“ ‘U-al- I ‘ “ .. v v ‘V‘ ‘ . . 11:51.1 w . .- 7'“; 3 "H . 3 .21. -. ‘- . 3|: . lirld. - ."r (i ‘ 5711113. l>:~'.'.‘ o In - I Tq 1:) .' . 1 p - ~ A ,\ no A L l') n :1 I“ ‘ Y ' ' I '* ‘ ‘ "' .r Dal. 11.1. r.’ ; Ic' ' . u . . I '.'w~.‘ - ‘r... ‘ . DA , ~ u‘ ..u......:. .‘ ‘ 1. I . . . ‘ 7". «A Jar? mini l . ,A ._ n l . 1 ' '. '2‘ "H v I 17. Jar pa,“ .1.“ A l N n.1 1'," r. I 11 IJ u 7' J1. “ .‘h‘ ‘ H F‘ .. ,‘ . ‘¢ v .- "' . I t I ‘itr JG... 1.... . I '..— 1:. J l '.“r y 1 l 4 ~ Uun. dry; . l I ' ‘ v . O . . A _| 7, v 4 A- J l'.‘ dd L4 6.1 ‘ ‘ I ' 0 {7' ’ r . . A.» vh" ”la.“ dr.“ ‘ I ' : 7” . ., ...‘ ‘Jatr I, 7 y 1 fur. 6...”. l . ' v'fi'V‘rO'. _' V a.‘ iJ‘Ir‘ I}; “"3on du‘. * i . _ -b - m ll. .- :. .. ' m. ‘ .. .Ql. l; \iL' .. U . 1" ‘r . Il' .T', I '1‘ .‘\I II, “l“ 1“ .“J. ' P, ~ ‘ T " r... 1, 4’. :‘N yup , ““ ‘5. ‘4 ’I .. . ‘ ~ -- . .1. ‘7'. . 8- ‘ 1 MC}. 1‘ \‘A' , 1“ . I a T ' ‘ \ .. «at. 11. ‘J‘e‘.’ - . « o.‘ '96.’ T1 I . 1“.“ A :‘ir‘l ' " a a . 17.- . .M‘ I1 ‘ . A‘“ ‘10 . ‘ ‘,'.'l :I-~. ‘ J ,1, 9“,? . _ .‘L Li. 1‘» .. l"-> -' a l ”n. lq‘l' T'I "'1 AM i ‘ >'J./ ,. ‘ v .’ f“ ' "- "i ' ‘kd 1‘. \‘r P ‘- v : I "Id Q‘JT] I ‘Jl. ‘1. "no ._..I v 4- s".- v a *mavT‘ ‘51 ll \“7 v... .4 'VA . ‘ UU¢1"' TQ -..L . ‘~ 11.x“ .. ."5 D ‘1 ' .‘ A4"T «hi ' ‘4‘ l 4:; "‘ .‘"~ ‘ n . .c; T1 0.}, ‘1. " 1'41} ' .‘ ‘\ _ J I l. l T‘ ". I" . “A . ‘ A A“ \‘i‘ .1,‘ 7.6 7.7 7.8 7.9 7.10 7.11 7.12 7.13 A.1 A.2 A.3 A4 A5 A6 Task T2: selection times vs. button distance by Fitts’ law ........ 148 Task T2: selection times vs.distance by Fitts’ law, and cursor paths for trial 2 for a subject whose performance increased in the second session 151 Task T2: Distribution of the erroneous selections for each target button . 153 Task T3: number of correct selections by attempt and their timing, and number of dragging steps ......................... 157 Task T3: selection times vs. distance by Fitts’ law ............ 
160 Task T3: selection times vs.distance by F itts’ law and cursor paths for trial 2 for a subject whose performance increased in the second session 162 Task T3: Distribution of the erroneous selections for each target locations pair .................................... 165 Task T3: Distribution of the correct selections for each target locations pair166 Spatial distribution of the erroneous selections for each target locations pair: darker lines represent more errors for that target locations pair. 167 Learning levels for 18 subjects ........................ 180 Gaze path and fixation points, subj-2000 and subj—2001, images 0, 1, 191 2 . Gaze path and fixation points, subj-2002 and subj-2003, images 0, 1, 2 . 192 Gaze path and fixation points, subj-2004 and subj-2005, images 0, 1, 2 . 193 Gaze path and fixation points, subj-2006 and subj—2007, images 0, 1, 2 . 194 Gaze path and fixation points, subj-2008 and subj-2009, images 0, 1, 2 . 195 Gaze path and fixation points, subj-2010 and subj-2011, images 0, 1, 2 . 196 A.7 Gaze path and fixation points, subj-2012 and subj-2013, images 0, 1, 2 . 197 A8 Gaze path and fixation points, subj-2015 and subj-2016, images 0, 1, 2 . 198 A9 Gaze path and fixation points, subj-2017 and subj-2018, rnages O, 1, 2 . . 199 B.1 Task T1, subject 00, cursor paths for all three curves ........... 202 8.2 Task T1, subject 01, cursor paths for all three curves ........... 203 3.3 Task T1, subject 02, cursor paths for all three curves ........... 204 BA Task T1, subject 03, cursor paths for all three curves ........... 205 B5 Task T1, subject 04, cursor paths for all three curves ........... 206 8.6 Task T1, subject 05, cursor paths for all three curves ........... 207 8.7 Task T1, subject 06, cursor paths for all three curves ........... 208 3.8 Task T1, subject 07, cursor paths for all three curves ........... 209 3.9 Task T1, subject 08, cursor paths for all three curves ........... 210 13.10 Task T1, subject 09, cursor paths for all three curves ........... 211 B.11 Task T1, subject 10, cursor paths for all three curves ........... 212 B.12 Task T1, subject 11, cursor paths for all three curves ........... 213 3.13 Task T1, subject 12, cursor paths for all three curves ........... 214 3.14 Task T1, subject 13, cursor paths for all three curves ........... 215 315 Task T1, subject 15, cursor paths for all three curves ........... 216 316 Task T1, subject 16, cursor paths for all three curves ........... 217 317 Task T1, subject 17, cursor paths for all three curves ........... 218 I318 Task T1, subject 18, cursor paths for all three curves ........... 219 319 Task T1, subject 20, cursor paths for all three curves ........... 220 XV I p. .l. a n1. £512.51: Hill. a: ‘1' ,a | ”0? 1'- -T“, su'.’ 7" a u; .- I A 7, It ‘ p .00 Uln'll .l'l .u 9'1 l' 1_‘~.. u -'- f’dl C.1 Task T2, subject 00, F itts’ law time-distance plots and cursor paths for each button ................................ C.2 Task T2, subject 01, Fitts’ law time-distance plots and cursor paths for each button ................................ C.3 Task T2, subject 02, Fitts’ law time-distance plots and cursor paths for each button ................................ C.4 Task T2, subject 03, Fitts’ law time-distance plots and cursor paths for each button ................................ C.5 Task T2, subject 04, Fitts’ law time-distance plots and cursor paths for each button ................................ 
C.6 Task T2, subject 05, Fitts’ law time-distance plots and cursor paths for each button ................................ C.7 Task T2, subject 06, Fitts’ law time-distance plots and cursor paths for each button ................................ C.8 Task T2, subject 07, Fitts’ law time—distance plots and cursor paths for each button ................................ C.9 Task T2, subject 08, Fitts’ law time-distance plots and cursor paths for each button ................................ C.10 Task T2, subject 09, Fitts’ law time-distance plots and cursor paths for each button ................................ C.11 Task T2, subject 10, Fitts’ law time—distance plots and cursor paths for each button ................................ C.12 Task T2, subject 11, Fitts’ law time-distance plots and cursor paths for each button ................................ C.13 Task T2, subject 12, Fitts’ law time-distance plots and cursor paths for each button ................................ C.14 Task T2, subject 13, Fitts’ law time-distance plots and cursor paths for each button ................................ C.15 Task T2, subject 15, Fitts’ law time-distance plots and cursor paths for each button ................................ C.16 Task T2, subject 16, Fitts’ law time-distance plots and cursor paths for each button ................................ C.17 Task T2, subject 17, Fitts’ law time-distance plots and cursor paths for each button . ................................ C.18 Task T2, subject 18, Fitts’ law time—distance plots and cursor paths for each button ................................ C.19 Task T2, subject 20, F itts’ law time-distance plots and cursor paths for each button ................................ D.1 Task T3, subject 00, Fitts’ law time-distance plots and cursor paths for each target ................................. D.2 Task T3, subject 01, Fitts’ law time-distance plots and cursor paths for each target ................................. D-3 Task T3, subject 02, Fitts’ law time—distance plots and cursor paths for each target ................................. xvi 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 247 248 249 41. .. \ | (1‘ t. i’ .»p . . .. ‘ 4‘ I t. ..\.I.J . ~L ... ‘u I!“ P .\ . u “23‘ A'J. - . V n u an .\. o . . ‘01:) i'. s I IA Jr) .V... V..-h «x *1. ‘ l <- D.4 Task T3, subject 03, Fitts’ law time-distance plots and cursor paths for each target ................................. D.5 Task T3, subject 04, Fitts’ law time-distance plots and cursor paths for each target ................................. D.6 Task T3, subject 05, Fitts’ law time-distance plots and cursor paths for each target ................................. D.7 Task T3, subject 06, Fitts’ law time-distance plots and cursor paths for each target ................................. D.8 Task T3, subject 07, Fitts’ law time-distance plots and cursor paths for each target ................................. D.9 Task T3, subject 08, Fitts’ law time-distance plots and cursor paths for each target ................................. D.10 Task T3, subject 09, Fitts’ law time-distance plots and cursor paths for each target ................................. D.11 Task T3, subject 10, Fitts’ law time-distance plots and cursor paths for each target ................................. D.12 Task T3, subject 11, Fitts’ law time-distance plots and cursor paths for each target ................................. 
D.13 Task T3, subject 12, Fitts’ law time-distance plots and cursor paths for each target ................................. D.14 Task T3, subject 13, Fitts’ law time-distance plots and cursor paths for each target ................................. D.15 Task T3, subject 15, Fitts’ law time-distance plots and cursor paths for each target ................................. D.16 Task T3, subject 16, Fitts’ law time-distance plots and cursor paths for each target ................................. D.17 Task T3, subject 17, Fitts’ law time-distance plots and cursor paths for each target ................................. D.18 Task T3, subject 18, Fitts’ law time—distance plots and cursor paths for each target ................................. D.19 Task T3, subject 20, Fitts’ law time-distance plots and cursor paths for each target ................................. xvii 250 251 252 258 259 260 261 262 263 264 265 Chapter 1 roduct u.“ l "'I.\ . '1 '1 ..‘.r _‘ F314,)“. “:1 “9“; AV" . -; 9‘ V Ill 1' ‘1 . «A no . k4.‘ ‘ I. l V "I «J "-.'-’I -_r..., v 'v ... A ,M‘AA t...” 'lu‘t '1‘ l K. 4,. I , n .9... ,. A. L‘AH‘WA‘ --._ . h- ~ ‘ ‘ _ ,‘vo, . 0} ~ ¢u . ’. . 7 Q ' t fr ' _ k. N»- y . ‘ u, _I 1 w... A .~ .y ' r. k)‘. F\ TI “'7“: h t 0‘ ‘_ ~ I} A V v._ ‘ ‘Aluk‘ o V'u.‘ ‘5‘," . '- . 4‘: ., ' a.” I? ' “.4“? I. ”I... "s Chapter 1 Introduction In this thesis, a computer vision based interface for Human-Computer Interaction (HCI) is presented. The interface is based on the tracking of a human face and facial features in 2D and offers handless, non-intrusive and inexpensive HCI in a. moderately Controlled environment. We present the computer vision algorithms used to locate and track a human face and facial features, we propose a gaze determination scheme Used to control the cursor and HCI based on the interface, and, finally, we present the results of an evaluation study of the usability and performance of the interface. In the proposed interface, the human face is tracked and based on the face motion the Cursor is moved. We do not determine the cursor position based on the absolute head position, but rather the user is driving the cursor position with head movements. AS With all novel interfaces, the use of our interface needs to be learned since most users do not think of their head and eyes as an input interface, and the users need to l . . ear“ to perform controlled head movements to achleve the desrred cursor movements. . App it; 1 in. s-—-'" '» L I. urn, U "ll!" 1, . » ~u'otlu “Ara “H "’t! l ‘ 4 .1 , '- \l v‘\ \\ w u». .luu A l l n " r". o , v ' I‘ I ‘\ ' -. ...~ ..I ., M ”I V. I ‘A D ...,' V J P.vd . I. fin- “*I‘ K_ A)" A v- i 'LMLI‘I ‘ o I ~.' ‘ H 'C -. "yv ‘_. ‘ . “~ : v “e“- C Q N, - ’0 t. I”; k V- ‘a ‘\ .._ fl “ -\ “3r ‘~ ["7 ‘ G ,I. -"~4 '? \a u 1.1 Applications of Computer Vision Computers are becoming part of our daily life, and interaction with them has become a common and natural task for people. The field of Human-Computer Interaction (HCI) studies issues that have emerged with the use of computers, and how to make the interaction as seamless as possible. Mainly, HCI deals with how to make better mechanical input devices, how to organize displays, how to organize information and dialogs with the user, and similar issues. Some effort has been made in using natural means of communication (e.g., vision and /or speech) to enhance HCI. This was used mainly for interfaces for handicapped users, and these input devices were cumbersome and hard to use. 
Recently, more attention has been devoted to the use of vision in the interaction With computers. The goal is to be able to understand the user better and to offer a more friendly and natural working environment. The use of vision offers the ability to See the user’s face, understand the user’s mood, enable the user to look somewhere and issue a command using a facial expression, etc. 1.2 Problem Statement and Motivation Problem Statement: The goal of this research has been to develop a vision-based HCI that observes the user and understands the user’s gaze and facial expressions as a means of communication with the computer. The computer vision side of the Work involves the general capability to detect and track a human face as it moves In a 3D workspace. The use of such a system as an HCI would be twofold: (i) to V . D V ,Iv'j‘ 1' l‘ ‘ V ILJIA‘I Ihi ‘ L | 1 a I’flrl'h-HWI '3' i. “QL‘IrdH “‘1 ‘ A“ . . .. ._,. 3', ”flu. ' l '1 3' "V“. 'k‘ ‘l' ,‘I 1.2;...“ ml ~~ -- ' I -. ,n v,‘ 0 r r " ‘ ‘ J. .hd A rt Y‘l -... . '1 ‘ Y .V w’.‘ ‘Yl A‘.‘..rA.\ d‘ I-.I "‘W'h -v y. - ~ \ A, ‘ l .1...u‘.1..uit A. . , ‘ b.» , 3 ,\\,v'r,«' “ H.“ -.uu.. “-8 ‘V-v n, ‘1 i P I y .‘ , ‘1. jar 1} 5,1 ." a ‘ \ ‘ I \ I “i“ ' ' c m. ,(‘r‘ )f: I u- ‘ \ \‘vI-V' I o‘ V . -. I“. .L \ I) L ‘1 ’I , . .g ,,_ L'.‘ \ «a, a ‘ . -¥L.I I‘ “~ 4G; i‘ . ”Rn“. . \ I' I F" ‘u;‘ v s . \‘u I i.‘ b V'" “L v-. , ‘ . “.1“. ‘v , ”H K . .0 b' 1.. V " b'r ' ‘..F v . provide an input interface, and (ii) to enable an observation of user’s movements in a 3D workspace and an evaluation of how humans explore computer displays or virtual environments. Imagine the following scenario: user enters a room, walks to a computer display, the computer recognizes who the user is and automatically opens the user’s workspace, the user looks at an icon, signals with a facial expression to open it and the desired program starts up. This scenario has all the actions from today’s state-of—the-art systems, with a twist that there is no username/ password typing, and no mouse pointing and clicking. All this would be done through the use of computer vision systems. The key point in such systems is that they should be non-intrusive and not require a special set-up for the user. 1 - 2 . 1 Human-Computer Interaction In the early days of computing, computers were “magical” machines, “don’t touch” Systems. They were operated by a few trained people in white coats and kept in a StriCt laboratory environment. There was not much room for any friendly interaction With the machines, and these issues were not on the agenda at all. Even though people Were developing artificial intelligence programs, these were not used to enhance the lutefaction with the machine, but rather to create a machine that will reason and understand scenes as humans do. a" . R u . Hr :El’ll‘tt‘i'lx it" 7""1r‘I'V‘ “ NV." I'uu .n. ‘ I r Ia- .. null «l. "4; ’ ',".‘ ll! :’ i «v v' ', oY‘n '1‘ ' s. \‘ . ,m‘r . u: l ‘ l ,. p ‘ ‘ ’ i '7. ‘i' A . I A ‘ L mu; . - A.» ..‘ a. A V ' N’s . I i‘.‘ v” _ 3;: a, .l? .'......... v». 9 1;. » .L‘ t: a . . -r‘ _A ‘h~u ¢Aau A A. l A v‘y-rv.“ ,‘ l 0‘ I. . ." h“, ‘l-l I.‘ 5.0 A b . V“ r. ' v \ 1 r. r >~ AM M” L ‘ . l" - l'i “' " 1. v I i ll‘ ..NJ‘ ,‘- A... t K. .' _‘ y. , " 7'7" ,-' i...» l w"' 1 “V“... ..‘ ‘ . Wyn _ ' v '7 v ‘ A 'I “'f‘ .. ,“F \H. ‘ t r '. , ’ .1,“ ‘ P y l‘ T“ ‘ H ‘ n~ ; _ . . £ ‘ 3pr i _n- w; . A“ -’AA. ‘l l ‘ ‘. - ‘ RV‘,‘ G, ‘r.. ‘4 ;¢_‘. A n; ”'1 t .. ( “are"wv » I ' ‘§‘ “ I r ' l I l '. . ‘i H." . ‘u w ‘V'v‘0. J an ‘. ‘71» '. “av" \( 'V -' v " ”.41" ._A J' . 
1 , ‘4“ 4'L I . a. Ipf,.' ' 1.;“l The increased use of personal computers (PCs) brought more personal interaction and the machines were more accessible to the individual user. Since the users were not Only trained experts, but also non-experts and even little childern, the interfaces had to be made simpler and easier to use. Still, the users had to spend time in learning how to use certain input devices. People had a lot of problems in coordinating mouse usage at the beginning. Moreover, in a typical computer, two devices are needed for input of data and the results are displayed on the third device: keyboard and mouse for input, monitor for display. This leads to a divided attention of the user, and a constant need to switch from one to another. Ubiquitous computing [88, 87] tries to make computers unnoticeable and part of the environment. However, the input is still done through writing on various pads. Computers are smaller, and the communication with host computers is unnoticeable. Still, there is no effort to observe the user using computer vision techniques. Apart from ubiquitous computing, new input interfaces have been developed. The use of voice recognition for input [85] is now commonplace. Gesture recognition is also used as an input interface [64, 29, 20]. Finally, head and eye movements are used as a replacement for the conventional mouse [48, 91, 75, 79, 2, 100, 73, 46, 81, 49, 6’ 3, 4, 1]. Many such systems were developed to be used primarily by handicapped users [40, 49, 6, 3, 4, 1]. The advantage of handless HCI for such users is obvious. kiost of those systems require special set-up and / or special hardware, which can be both cumbersome and expensive. The use of infrared cameras is typical. However, there are health concerns with their usage, such as whether infrared light beamed at 4 c'»;v’ '}' “dunkih “H" ‘ n I V . w . 4* ~r' r‘ p ' ‘ [ -1 AM‘ “A .A-A I l ‘ 7. n o o . v r ' r . ..... y .u .u . ' l N Y‘ tn 0 7 F L ‘r ‘ .. ~ 4 .1. PVV I. , «,.. ““‘l ‘.\ .- . x [ v'. ..H ‘~.;‘ - - .. . \ ,1 . 9—? _ . . P". 7 I A‘. the eyes damages eye tissue. The use of various helmets might be rather unpleasant. Mostly, the user’s movements are constrained to a very limited space. Recently, more emphasis has been placed on developing a handless HCI for a general population [75, 79, 2]. This work is directed toward developing a more user- friendly interface that would involve interaction with computers that is very different from the interaction we know today, and very similar to the scenario at the beginning of the Section. Extensive use of computer vision algorithms would lead to healthier, less cumbersome and cheaper systems that could have many practical uses. This kind of interface would offer more freedom to the user—there would be no need to con- stantly move one hand from keyboard to the mouse, thus all the attention is devoted to the keyboard. Additionally, we would exploit the user’s natural behaviour—the user looks where he or she wants to work and that should be naturally captured by an HCI system. All the above systems rely heavily on the use of various computer vision algo- r ithms for analysis of faces. Individual applications like face recognition [82, 83, 78], gESture recognition [26, 92], face-spotting in images [69, 59, 42, 96], gaze detection and HCI [58, 10, 8], are all used to create the new HCI. 1 ~ 2 .2 Cognitive Aspects In Cognitive science the field of visual perception has been studied extensively. 
Vi— sual perception is considered to have a key role in our cognition. By studying eye [Image is presented in colon] Figure 1.1: Purkinje pupil tracker, an illustration (Courtesy of John M. Henderson, Michigan State University Eyetracking Laboratory, http : i/eyelab.msu.edu/) movements, we can conclude which parts of a scene are viewed first, to which parts We devote more attention, what we notice and what we do not notice [38, 37]. To measure a subject’s eye movement, a “classic” Purkinje pupil tracker is used. This device is very accurate and allows excellent measurements to be taken. The P1" Oblem with it is that the measuring device is intrusive and very hard to use, and the user does not behave naturally. During measurements, the user’s head is fixed within a. Wire frame and the user needs a bite bar to reduce head movements (see Figure 1.1 for illustration). Calibration of the system is needed before each measurement starts, and Often also during the measurement session. A great advantage would be to have a completely non-intrusive eye tracker that would allow measurements to be taken while the user is naturally working in a 3D 6 £33,179. llu‘ ln'ml'ti' lush-v . l : n . .~ 33 .hl‘ \ml’.’ in.“ If A” Lu. J“ ‘ l ‘ ' . . 1" v ‘p' 3". Ali {R 141'”) “I‘M ‘. . l “r " 197w: 12.3 Computing ( ' v I' U‘ “y ‘,_ Inviuvy' v . i .|.. -n v M.“ ~.\ “1‘“!H‘“ ’l' ... I . ' n l\r‘ ‘ ’:'.. y . Y . «A ". l . . . _ “ “A“ H.‘ ‘ L ”O ‘ 'P t' I! V J ' P v «.p ; l> ' _1 . A r.,\.. :1. A“‘!“ II‘ in ‘ i _ q _ ‘ ‘P "4".“ vy l Iv . . l H 4 I .4 .. .U.|" (l‘. l u.~ n . >yo. ‘J‘N «x:, N :I-‘WPrQ,¢‘- : ' 5"“w|”h U a '; -4, ' ‘ .2“ n..l,: ,. \. . I at ' lzsvlp , a pp... 5" workspace. The head-eye input interface described in this dissertation would facili- tate this. More importantly, the interface could be “hidden” from the user and the user would behave naturally, without the burden of knowing that an experiment is underway. 1.2.3 Computing Challenges From the computing point of view, the task of detecting and tracking a human face has many challenges. The aim is not to analyze a general scene, but many aspects of that problem need to be covered. Many forms and variations in human faces need to be covered. Finally, all the algorithms need to work fast to keep pace with the user’s movements. Segmentation of a face: Distinguishing a face in an arbitrary scene is such an easy task for humans. For a computer vision algorithm, though, this is not trivial. Issues such as skin color, face orientation, and scale play a very important role in complicating the process of face segmentation. Algorithms for face recognition usually aISSume fixed scale and orientation, and they deal with the problem of recognizing the face from a grayscale image only [82, 62]. Algorithms for face spotting in an arbitrary image try to deal with the above problems, and to locate a face in a single image. Some algorithms use grayscale input images [69], while others work with color images [96, 59, 42]. Problems that persist are finding dark faces and covering all the POSSible face orientations. In all cases, the execution time of these algorithms is not 7 V 1 . , . 5;" to {viii-Hill" ‘ iii \ - , I ~ ‘ ii : [if] \\]\ ll] ‘1. ‘ “A" .Ui in. . . . - . q . , . v r-- .n _ "1""? Mail in" [ii ' u ' A ‘ ' ' .... n t 1,, , “Jr 1 ‘ . [glue .3?!” i‘ “ .... u. r in ‘ .1, , H 25»... i ‘u u' 5\ l. “:1. r l . _I I H ‘n' v t 4 o ‘ iizl¢l4 "g‘ MI, .. r - . i i . 0.*~ Iv ‘Qv ‘vLH r~- v ‘ .. ,, . .».u-.‘ l. Fiii.1.;i.~1'i. ‘\ |:.' 
‘r-v‘ 9~‘:-~y4‘ ‘: , ‘ '3‘ b ‘ ‘ '7 " ' ._:.t. .L. arm: ., .t .1 l “we 0 ‘- _ ‘ P n“ ’I l iy v - V “1’" mm “ALL. tildi v. . ' v I». ’ ‘ p, _ \ ~ . “blink im‘n ll ]5 is". ;r -...‘ l - [l 1 °‘.V‘v_v .. ‘ I... ~31“ ‘ruip'ld f‘\ 9 I f 0 I ‘A. A o -_ “WK .1 iv. . ]‘ ..' ‘1 ’i' . ““KW‘ olLi“ ‘I nrl‘ ‘ «‘4‘». g‘..“ ‘1 T :‘V‘ “U' ' r A ‘ x ‘ u ‘ 7. O . ‘“ d‘H‘u“ .\ l‘ a ‘__.r . . n. ... .1 f t. q . “A.“E EV“— ii ‘10 V. u' N- .‘ fi‘ M ‘ '- 3“".~ Iri’i- ,. - hi...‘ dwi'u .'_' ILL" . -. “4.3-‘1'”? pk; _] “‘7 Li 5]. [t h .‘ Uf.v"rr‘ 'i 41‘1"“ i. '7‘ | *‘Iv ‘ . , I 9. :2“ r“- ‘ ' Bikn i'rQ: 7'? ‘Q‘ A.‘ ‘ t v .H v. 7* KY 3... LrA/‘ »; .‘ ‘44.. .~ [‘7 , . a“: ‘U {”1 ' I~G‘["‘ . , ‘l(;.“ v. I - ‘ L‘l Np." «an; i. 5"] r.” '9 i “d.. 4' 9 aDI' ‘ '4“"‘ 41H (I - ' “We, t n‘,‘..,v,.‘. ‘ ’Hk_ even close to real-time (e.g., 30 processed frames per second), thus making them not suitable for analysis of live video. To overcome this problem for analysis of live video streams, the most common technique used is segmentation using a skin color model [59, 42, 96, 8]. A significant assumption here is that there is a face in the input image. Finding selected face features: Locating the eyes, nose, and mouth in a face is relatively straightforward to achieve once the face has been extracted. We can use templates for desired features (e.g., average eye, nose and mouth), perform brute—force template matching and obtain our feature locations. The major problem with this approach is that it is very time-consuming, it would assume fixed—scale input images, and fixed templates that might not cover all the variance in the appearances of the features. Additionally, a change in lighting conditions could result in poor results. By using a slightly different approach—not looking for each feature in isolation, but finding these features together since they have fixed natural relationships would help greatly. In this work, the geometry of the face and various heuristics are exploited to overcome the slow template-matching approach. Determining user’s gaze: When we see a face, we can easily determine the person’s gaze. Even infants can distinguish whether someone is looking them in the eyes or not. For computer Vision algorithms, many assumptions need to be fulfilled in order to Calculate the gaze. Pupil trackers based on infrared lighting use the Purkinje reflections to calculate the gaze (Figure 2.2). As was discussed in Section 1.2.2, they need extensive calibration. Another possibility is to calculate the gaze from face features’ positions. Algorithms like Pose-From-Three-Points (P3P) [58, 10] proposed 8 + wf‘jIITb‘. Till.‘ ("W V I _ l. .501]! [~‘ 'iV’FT‘iliJ (”in ‘1‘”? " U ‘ i . t [ ‘ H ~ 0 v r, . s mugs ,t, ...,. .i‘ A. A‘ u‘ . . . . . n \.... 7"371 l ‘9v1'1V I v ..' h ‘l' “Uh Au ~~.A.J\AAA A |l D . . . “713: Us] I“ ‘6 4! n ' .41.». y ' p .....;;,.5, “t" lJX;1i"“ Hi]: to t I" ‘ I ‘ 7‘ i ‘7 h V ., l‘ I l J [ ‘ ’ sl..,'f....,dl\\.|,? ”I“, . ‘ ‘ 7rr~~~ v- ~- -gmiu Adi.‘.\1".... ' H... L *1 fi‘ 0‘ l") . 7v .0 - »L. m hi, [if I"“‘ . ’- . ~. ‘ '1 at L;."“ -..,. "l~'1“y\‘.-[‘]'d ‘. 9“],‘.r‘ [.1 ‘~A.A _ 4‘“ ‘1.- l ”‘3.“ wail": TI. .. , A AA.J~ ‘."I ..‘I If ' h E a ‘\ 5‘” L. A‘” .‘t L .V i o ‘ ' ‘~ if' r. ~ . w. 1‘.“ f’\\ L.» a mathematical solution, however, we need to know the user’s dimensions and camera characteristics. This could lead to problems with the portability of the system, and the need to ask different users to enter their dimensions prior to the use of the system. 
A solution to this problem is to use a non-linear approach based on an Artificial Neural Network in conjunction with an approach used in the conventional mouse. Recognizing face erpressions: A wide variety of facial expressions is present in humans. We express our feelings through them and they are key attributes in human- to-human interaction. For a computer system to recognize all possible expressions would be impossible since it is hard to enumerate all possibilities and variations. Thus, in an artificial system, we need to concentrate on a few expressions and work only with them. The recognition can be based on neural networks [7], Markov models [59], or nearest-neighbours classification. In this work, an approach derived from our facial features finding method is used, as will be described in Chapter 7. Fast execution: An important issue in designing an input interface is ability to have fast updates. The algorithms described above need to analyze a video stream, at a rate of at least 15 frames per second. That leaves about 66 milliseconds on average for processing of one image. Necessary operations are face segmentation and facial feature finding. A segmentation algorithm is time consuming due to the need to Classify each pixel and to use a connected components algorithm for the grouping of pixels. Feature finding using template matching would be too expensive for the real- time needs. A solution is to use either smaller images, thus reducing the necessary Work, but also the accuracy of feature detection, or to use the most time consuming algorithm sparsely. 1.3 Preposee ‘ '\‘ . ' ' v v . "v .. -L ‘ » "w ‘- l- ”I .‘jl.i(‘n......- u! “I ' . . ' ..2, [1m .. ""wyr. ' ; - '.F...J..r.tz.i;'.1, l . ‘ ~ ‘ ’ V I .. » A "T 1' ‘ ' Ii “(L‘It‘ ‘11“ [h ~ x. P | . . . . ‘ ¢,."4F ‘0 v.“ . “‘ . bun-5‘" . ‘. . . . ' H ' 'P' n o ‘ v ‘ . ‘ u. n. .J’ .ulf 1“ .‘ U. Q 2 ' v A‘ ‘ ‘. '1 .. e we a"... I » iil'..'.,[ . N in l'l-('\ .J‘ 1'”: i‘lrl -‘ . ,A - I “A“ "' i. . 5m”. I“. ‘i- H \ ‘5 l 1., in... .~ ~ ‘;_f‘] 'v m; 1.3 Proposed Gaze Tracking System Organization In the remaining Chapters, the methods for face tracking, facial feature location, gaze determination, HCI building and evaluation will be presented. They are all parts of or based on the gaze tracking system we developed. Figure 1.2 depicts the block diagram of the system. The input is the color image stream from a video camera. Then, the face is located, and face coordinates are used to extract the face sub-image and further analyze it and locate facial features. Currently located features are eyes, eyebrows and nose. In addition, based on the eyes and nose location, we determine whether the mouth is opened or closed. Finally, we transform the facial features location and their movement into the gaze point coordinates on the screen. Gaze point, as we will use this term, is defined in Chapter 5. The gaze coordinates can be used in several ways: determining fixation points on the screen and their duration, measuring user attention to the display, controlling the cursor location and using the mouth state to send selection signals (clicks) to the underlying windows interface. 1.4 Organization of the Dissertation The remainder of this dissertation is organized as follows. Chapter 2 reviews background information and related work. The work on face segmentation and facial feature location is reviewed. The Kalman filter is introduced and its applications are briefly discussed. Several approaches to gaze direction de- tection are discussed. 
The Purkinje pupil tracker is introduced and its application 10 CO'C' 33:2 TRACKING SYS’ —.Q '31 Video camera color image stream —-----------—-—----—----—-—-—-—--------—--—q : GAZE TRACKING SYSTEM Face Iocahzafion face boundaries Facial features locaflon I I I I I I I l l I I I I I I | I I I coordinates of eyes, : eyebrows and nose Mouth state I I I I I I I I I I I I I I I I I I I l I I determination Gaze point determination mouth state: opened or closed gaze point: (x,y) coord. normalized to O...1 V Attention measurement Cursor control & object selection Fixation determination fixation locations time intervals of the screen cursor (x,y) coords. on the screen and attention to the screen and selection commands their duration and away of the screen (eg. like mouse clicks) Figure 1.2: Gaze tracking system block diagram 11 '~ 'rr dim-um .].-: "Vi 1‘ i " ' 1 - i vrr‘i ‘l “\"'I twill“; did; in“ -.. 1' I‘m 4 . .-A h- } l _ , .. 'i‘- f ‘nl‘ 5. L...,»i..l'ii. cm“ 1 9.: at: di~l"‘.\\l“l p- la m .1 ' ' , I,. ‘7 4 ~rI. . lif‘fllu‘ HI ') l‘f“l 5" H ‘ ‘1 \ I , . a | ".‘Nv' O - i: .I‘ .Jv..;..'d‘.4."§. 1" (IIA‘HH .,., [-n 'Y‘I‘ ,. n . , , ; - \r‘. ‘ .J‘. . hAaaroALlj\ i] d} li‘ i ' ' \ "W“VV'I f; ‘ '... . ¥;~T...Gu‘.ln [11"‘5 \ h g"! s . 'il l.;,. . f IJ mu [hull It. ‘I.'._‘.‘,,_ .,.._ i ‘..h ....' V H ¥'-' ' ‘ s r W»! «h .1 . 5 r .. '>_ ”i t .f ‘ x. .J‘ [“I :9. " ..'I n. . ..-1 , :‘ II ‘b \g-a l . "' . “VJ. :[p\_ll ,\5. ' . . . A ‘ '1. I“ 9-,. . A r‘ ‘ “at. t’ 4 l ' f " 1i;t\l"“.g.\ l ‘ A.1_ .4 r.; . L [N‘h- a.“ '.‘ 'Rhio. ' “A .I.A’J,,:\ ]v “" i.’ . .w s; " ' ' -. .1; us .. D v. . H 94"} 9 ] ‘ .. q .Q; r- . t ., x .H [Jr V), Ni“- .. ‘ L . .i' o i q‘ f‘- “. “fialg :'v.. V _ ‘J \Fr'f‘ 7' A A‘ ‘. l" .4333}? T l‘ .I J , U.“\l-' s , \ ."fTJ‘" h“§.('“}.b\‘ ' 4: ‘ ”192m ' V - [1“]: I 12‘ I l 't.‘ ' . ZJ‘ ' ‘G' , . m I“ . 5] v to gaze direction detection are discussed. Several commercial eye-mouse systems are reviewed and discussed. Finally, HCI-related issues in the creation of user interfaces are introduced, and findings of several studies that compare conventional and eye mouse are discussed. Chapter 3 describes a skin color model for face detection and a method for locating facial features. Its advantages and drawbacks are discussed. A number of successful face segmentations is presented, as well as a number of unsuccessful and problematic segmentation cases. A method for detecting a dark face and enhancing the colors so that the skin color model can be used is presented. Finally, some suggestions on how to overcome the drawbacks are made. Once the face is located, a method for locating eyes, eyebrows and nose on a face is presented. The method is based on the knowledge of the geometry of the face and uses several heuristics. The parameters for evaluation of eyes-nose matches are introduced. Finally, evaluations of the accuracy of the feature location are presented. Chapter 4 describes how the features are tracked in time and discusses the accuracy and timing results. The system state diagram is presented and it is discussed when certain parts of the overall algorithm are executed. The usage of a motion detection algorithm is presented and it is discussed how it can be used to overcome some DFOblems in the tracking. It is described how the Kalman filter is used to smooth out the tracking. Several results of successful smoothing using the filter are presented. Chapter 5 presents a method for detecting user’s gaze direction based on the COordinates of eyes and nose in the image. 
A neural network used to map user’s facial feature location to screen coordinates is described, as well as a method used for its 12 I ' ' ' . l 1: v v [11; _ 'Y."‘.'L illf null-\[iwhx "ain‘t-'3‘ A L. I ] ~. i9< 'f | 9! ‘~'Ir.’“§il..5 U ‘l..'"" . l r in gn. ‘v‘. '1‘ l '0‘ “if in.“ iii. A“, nlii ‘ n c I] ‘ - in »n ‘r' ] r , ., , ‘> ] . a‘ \ Qr‘rlcl Punt.“ ~ li‘ .' ‘, [m [a J P 7'] 74"; 1' O ‘t-l1,.'.f_',l"\t,“\l‘l , ‘ s ' r . "‘ " 0 o , vw 'l r. ._ . .A 4....f‘lfljf‘ [5 1]] [1‘] m. I a. r . .. - : I . . . . 4 ‘ “my-ff . \‘L5i ‘il“' -‘ ‘. . 'V‘Nv . "F ‘ “4 y" g “1‘ W' 7"“, . . - ‘N‘. ‘ it \ F]. "1!" ”Apr: "r ' W ‘ I i :J- A““‘a(:‘f‘. lI‘s b 55:97?! T F )2)- > fl 0'» r. . "A "e 1.,“ ib‘ ‘[ 4' P', " t" V’Q_. ”*- l’w . e ‘ “~ L5H} I i, r .u. ‘u‘ k v. . i "r I .. ,,. ' ‘ ‘1‘“? ("v ..I ‘ ' - .‘JH. _ . c g A ' u;' ‘1 PW"? . ..A A‘ff {117;‘;.,. h . .3. If] \ ' C training. The mapping of motion of features in the image to the motion of the screen cursor is described. The joint use of the above two mappings is discussed, as well as the results of smoothing of the output, which provides a framework for further development. The gaze tracking is applied to fixation determination and attention measurement. The results of preliminary studies are presented. Chapter 6 presents development of an HCI based on the head-eye input interface and gaze tracking. Ways to make a. handless selection are discussed along with some issues related to GUI design. Finally, the evaluation procedure for a head-eye input based interface is preposed. Chapter 7 discusses several applications of the system described in this dissertation and presents the results of an evaluation and usability study of our proposed head- eye input interface. Tasks such as moving the cursor along a defined path, button selection, dragging of icons, and the use of general applications (e.g., Netscape and mail) were tested. The results for two user sessions are compared and some concluding remarks on the GUI design and interface usability and learnability are made. Chapter 8 concludes the dissertation with a summary of the major contributions and future directions towards the improvement of the preposed interface. 13 Chapter 2 Backgroul‘ ' I ‘ n'hv" V . -r ”-7 " " " ‘ J. u.-i~ Lllu,- “A. \‘1*‘ “ . ' F P ,‘-;-.y« o v- ' ' [I] “r $.le \ ‘. _I‘a ’ I . I ~ ”LN-p “at“? or v' ‘ “in-Kw. D. .d‘. .04 P)-bi‘. “ A , 2-, r. ‘ h . , . {L1 «at. 1C" ' (1:.‘i .. _ ' ' ‘ . "M l I ' ‘ . ]‘ ‘ “ . y. '- . a . u... ‘R I; h (‘5 .w - ’ ‘. 5v . v . . TI Ff)“ ' ‘ ‘ . V‘l" ‘ V y r ‘ . Ltwfdiii “ ‘ U ‘u “v.4. ° ".4 :‘ A..- k”, .3! f’\ ‘n ' T. ‘ » ‘ COX (1" ‘I , .\.. ‘ i‘ \ h, '1.“ W‘ a] 4 I \J _u" (’4 l'. ‘ ~0q..= LEN-Jr ‘4“ :D‘;9;‘ ‘4 l i ’N ' V i .L ' I. drd. ‘- la 1;".- "1‘." :J'tv- . * u.) '{V‘ ‘ , U. Jeff Chapter 2 Background and Related Work In this Chapter, various background materials are discussed. The first three Sections review the topics of face localization in images, locating facial features, and discrimi- nation of facial expressions. In Section 2.4, gaze direction determination is discussed. Section 2.5 introduces the Kalman filter and reviews the tepic of tracking of objects, in particular, faces and facial features. Human-Computer Interaction (HCI) issues relevant to this work are presented in Section 2.6. Finally, in Section 2.7, visual perception research is discussed. 2.1 Face Localization Finding faces in an arbitrary image has been studied extensively. 
The problem is very difficult due to the complexity of the human face, within-class variations in faces, and possible rotations and size of the face. To cover all races, skin colors and shapes in a single approach is very hard. In this Section we review three paths that can be 14 p‘: ..| .n o n! ,1. i l du>uufdg. I g I i ' " ’ a ‘ w 3;an 3.10;" at I. dim l... 2.1.1 C lusterinv. ¢~r~ cw . . ‘. ~\n ,q,uo,‘r..f. . .. «u. ,._ . l l .,... .,..[].. . ~ .“u. ‘L itch“ . ”I “'1. . 1 Y ‘ "vw .. ‘ . ' -1 ‘ v , and}? basin, 5“ ,"‘ ‘ . H \l‘."' ~11 1 [ll . ‘ —- .‘...'T Lin" 1 “. I I I , "y‘r? -. ‘ .2 . I' «fit .f’if‘“ [Lt it 1“ o. v ‘ Q "raj”; I :'\I. ‘1" .. .- “fix“? K.‘ 1' .II 1?. -.._‘ . . -'.“\ ,r. , ] ‘~ “ *d‘h r!;~'.r 1“ ‘i-wlg _ ~,_ m, . ' ..,nn,a;r P1..,_' ' ‘ ..\ ' . ’r ,3 ‘ .,, . ""H, art 1'. ‘1' -- ' I A.l‘A ] - ‘1 ‘df" ‘ l- ];N““' i k «WV '1’ (3,,” ‘n 4 I. ~.cQ‘>‘rV T)! I.' . \ . I .H 'i-(rr. ~.. H], _ “u3r€: :1»:r.7’ 1 “1 ,yv fr In ., ] \‘e. s. h- |‘-_ ' V . “[7] G e] , ‘ ‘As lei“ (3“? .. I‘m-«f a] \* ; ‘ . “in 3’ ‘a a ] r v- . ‘ . i“ ‘, . 5... s.“ on! .‘k M . “" lf‘TY-r, . .‘Akjtfl .A\ u g . x. N ~Id .] ,.. - ‘ ut le. ‘-’11"c t “~~ I) O .0 GEHv , . A“, QPF’I -]4 i?‘ a LN: / Hi I . ‘1... a’Ja._ .‘ L1,] ;" ..,r] H‘. ' ', r14..‘( 9,} :' L» i \_.P - «~sz ‘4 I I J V lka‘ .P ' followed: clustering, PCA-, and rules-based approaches, artificial neural network- based approach and skin color-based approach. 2.1.1 Clustering, PCA and Rule Based Approaches In this Section we describe several face location systems based on “traditional” pattern recognition and AI techniques like clustering, principal components analysis (PCA), and rule-based systems. Sung and Peggio [77] designed a face location system based on clustering. In the first level, they determine whether a face exists in each subwindow by classifying the subwindows into one of six face or six non-face clusters. They use two distance metrics to each cluster: “partial” distance between test pattern and the cluster’s 75 most significant eigenvectors, and Euclidean distance between the test pattern and its projection in the 75-dimensional subspace. In the second level, a neural network is used to classify points using the two distances to each of the clusters. They used 10 times more negative than positive face examples, collected in a “bootstrap” manner, thus reducing the explicit number of negative examples. Sakai et al. [72] proposed a system based on line extraction and matching lines to predefined templates. They had templates for the face, eyes, nose and mouth, and allowed the scaling of templates to be able to locate faces of unknown sizes. Govindaraju et al. [33] proposed a method for face location based on face boundary detection and semantic networks. From an edge-image they locate curves using the generalized Hough transform. Curves that are likely to be face sides, hair line, or 15 ;,'.,. Fm, an) ElliIIPHi ‘1‘ ,,‘..I ml. -’,3 ‘ "1‘ ,- - .0 ‘ h" ennui d >uu....: .. 'l . . , .. , {£12 f'ftili “flair fun .I'. .§“T* 7 " t“. .fli, I “1,1,. . ".InAh J... . "', r rTTI'rf.I.'a.lll"> i..' u: ' 0 \.'If '21:} HHj' .«v I" h us...m .4 Ant ‘1‘ H .3 h I]? r““"‘ f. \ Rt ‘. .' ' . ‘. H, 4. .a... Mimi. {,1 ’Fi'Mu‘MVr .;. z .. ...-...ul..l_)ul \\lui"".l,\ , 'wdii‘flnftlr 7‘ — . ' 4‘. ,: ) .~ 31] Hf [fl ' ' I‘}\I ' ‘ ”‘4 ”is . i I ~u I ~. . Lrif‘f‘ ti] 7“.” f?“ X".\- "K H - \ ‘ ‘ , ‘1'“; . ‘ " "‘l'if‘r ‘ ,I ‘-.‘.." ",3 “r“ “J ' h. .| ‘. .zid.‘ \lll'YP _ ' ‘ I "A! .; ”‘- 4 mil I]; . ‘ a y‘i]§ TV; “ , T "' ‘.I. - O_ .t. "f”. ‘.' A “:l \ ‘e. « ~v-v. 
Inf /]-‘,~V;T; Hui (I Y , A. a. (A [1 U1... r ‘. rJ du'i p: 1‘ . “carp; 1. .. . I" - J , ‘9'. . I: (“K‘ ‘7“ §.] ? 1 4.1:] T,“ . i ‘ ' 'I.;|" . d :'l;\\ “‘1 . N-Cfd' r v‘1(o\,.. m I 1 L‘ \"|‘ i‘ d. chin line are grouped using weighted voting. In a follow-up work, Govindaraju et al. [34] applied a similar method. They found the edge—image and used thinning to obtain candidate curvature lines. The lines were grouped in a sparse graph, where the weight of the connections between the nodes represented the degree of similarity between features. The face was found as the biggest clique in the graph. Yang and Huang [93] manually coded rules and feature detectors for face detection. In the first phase, simple rules are applied to the high-resolution windows in order to find candidate regions. In the second phase, more complex rules are used in the lower-resolution windows. Finally, in the third phase, edge-based features are used to classify a full-resolution window as face or non-face. 2.1.2 Artificial Neural Network Based Approach The introduction of Rosenblatt’s perceptron [68] in late 1950’s resulted in a lot of research that tried to mimic human visual perception through Artificial Neural Net- works (ANN). However, Minsky and Papert’s proof [52], in the late 1960’s, that per- ceptrons could solve only limited problems, resulted in a pause in the use of ANNs. In the mid 1980’s, the backprOpagation training algorithm [71] revived the use of ANNS in Pattern recognition and artificial intelligence (AI) problems. Burel and Carel [15] designed a 2-layer neural network that classified input as face or background. The input to the network was a 15 x 20 window, and the output was a ClaSSification result. Subwindows of an input image were presented at various scales 16 t'.L' ., 'fi“ » w‘p Ill uo—o 11:30.7de KM" T f 1 ~- u azzzr 1m ' -‘ I . ”I {F Yl‘ I .A \t t. h v . 1- ‘ _ . mahlf; a'i d I n 7 \ . In t f , ‘ ‘ [0..1.“ ' t ‘1; v . ,' “Ill H‘ v < .[. aal.‘ 1“! (I l "" r , l. p . .J ".- ,2? 7‘ in .‘ t u L. ‘ .. a ‘ rir r ‘ '5» (3. { ‘. ‘ “3 J1 l.‘ luxl' .. 3.13;, . .tn '7- 444 _‘ d‘ ‘1’“! : ‘ b . "1"”; "1"». . ... 51‘4 . ' ‘ «Mum. V. v. Q. ‘>~ -\‘_‘v. ““34 . r.“ Giiglhfa? '1'. h. ‘ ‘ u, A. lvu. A“ x 'p, P ‘s' fl”... '9 Ik ra. . r d”‘\‘, I» . ,‘ r‘ 75.7. _ ‘ v 'n. ‘3“ “in, . «1‘..}“ Li a r it .l‘ ‘ ,._ ‘ in: .L; r ~ l4 , l‘“ “If“:‘T' v 9‘ :‘ ‘o', , .3 . 4.... a” :3 ‘n v'. t {UF’ ._ ‘ lir,‘ ‘ A I .._ ‘ _. V‘ If? 9.. r ‘ “l..‘ri.’ ‘ i to recognize faces of different sizes. Training and testing data were obtained from the video recording of a conference room. Vaillant et al. [84] designed a system that used two layers of networks. The first layer produces a response for face images, while the second layer reacts to centered face images only, thus locating the face more precisely. One possible problem with the system is that it was trained mainly with positive face examples, and only a small number of non-face images was used. Important results were obtained by Rowley et al. [69] at CMU. They achieved 80—90% accuracy in face location using ANNS. Their network was trained to classify a 20 X 20 window as face or non-face. Each 20 x 20 subwindow is presented to the network, at the original scale and at reduced scales. A key point in the training of the network is to add non-face examples as the training progresses [77]. This reduces the size of the training data set, which can grow very quickly if negative examples are added at will. The network is trained to accept a certain degree of head rotation. 
The execution time of the program is not suitable for real-time applications since it is about 24 seconds per 320 x 240 input image. In their follow-up work, Rowley et al. [70] designed a rotation-invariant face detection system. The first stage of their system is a router network that detects whether the input window represents a rotated face. The router network assumes that the candidate window is a face, and whether the inPUt window is a face or not is resolved at a later stage of processing. If the face is rotated, the network detects the rotation angle, and the window is de-rotated so that the face is in the upright position, and the network from [69] is used to classify it as face 0T non-face. The accuracy of the new system is 79.6%, and the processing time 17 [7' at in m 17H ,_ ,. c—AU L‘ . 18.5: r. 12h 111;]: {Hz li‘. ., f "N. {'n‘) of (1.. .1 l . :.1 ”,i , .'.‘|1‘:.Indn Ill 'Jithl u I]! ihl. 2.1.3 Skin Colt - .u‘ . . ‘ >,h fr rvy [I1], ".1 ‘v .. t “42““ t]. _ k.1 .‘i ‘\ A , ,' ’l ‘- ‘. .m 3‘, .,, ,, ..... ".4. . A. 1.” (A... ' D .. . _, l_ ‘ T 7r 4f 5. '1‘ ' -~. ma. rutait'dfi. ll. i’ I y 5.. .5 ~ «.«h. In aw --<‘W v Q‘ig4\‘r.' -l.' ' u.-» ' 4 4 hi" Yuri my ' I “tonal [‘ I: l" 1.. ,.. (1 5]“! t I i ..- . I." v a). a“? I' 1. (I? ”I, . 'I,‘ - ‘ I «tram... ll" '44. A. ,"r. h J? I: ~] ‘ W-..,. 1,.. , "‘ A I \ : ;; . _.‘ No. ' \l ‘ .. .~\ - 7 I "‘ * 1H 'Jll“ f " .14" '~."\ ‘7" ' 7‘, .a “v -M 11, iV-rv ‘ . w ... f - 4]).“_ :6 ., .. 3“ .- ‘ .‘ r _ -\ 3 K? «1:64 ll]. ‘ . i “P ‘ :‘l f I T‘— u,’ 1?". -.i.' r . . 5 M : ,.] v‘z~ \rdl‘E’v .7 a“; \‘ ‘ e- 1. . 3 .‘ .‘ Ljr I" , .Q‘ r IA for an 160 x 120 input image is about 6 seconds, measured on an SGI O2 workstation with an 174 MHz RIOOOO processor. Osuna et al. [60] trained a support vector machine (SVM) in a similar fashion as Rowley, and achieved comparable results. Their system provides new way of training polynomial, neural network or Radial Basis classifiers in a much faster manner. 2.1.3 Skin Color Based Approach The main problem with approaches described in Sections 2.1.1 and 2.1.2 is the execu— tion time. All the algorithms need to inspect many subimages in order to determine the face locations. In addition, these algorithms all work with grayscale images as the input. In the case of color images of faces, one important piece of information is added—the skin color. In the late 1990’s, tracking of faces in video streams is done mainly using a skin color model. In 1996, Yang and Waibel [96] presented a system for face tracking based on a skin color model. Human skin can be clustered into easily discriminable clusters, that can be modeled by one or more Gaussian distributions. A wide variety of skin colors easily fits in one cluster. The method for tracking is based on the classification of pixelS, grouping them into connected components or regions, and determining which 1regions are likely to be face. These results are further used by Stiefelhagen et al. [75] for a gaze tracking system, Oliver et al. [59] for tracking and facial expressions, Jebara 8t al. [42] for tracking and modeling, and Bala et al. [9] for face and eye tracking, to 18 reziu‘ri [115] a M." I" '3 'g" ‘. Ll” Ill lad...» 1., ‘ .[rn ‘u ~ln ‘ ‘ ,I ll \[ml TLD U « I " ‘ ..' I“ cl 9‘ 1:».2wilxbl3- 3 1' red. The no . l v'. ."r '1‘ v '. ‘. .i['l.f."I:L;(11rlr.'. .' V t l L". Hewitt \ , . .‘o --~;yyn1 (.L' V? ., ’ ~~ mu . u tfitd‘. .a....l1.. ,.,_l, . .' ‘ ' ‘ arr: with rich; al.: . I . . ._ . 1 . y " ”r '4 ‘ F ‘ \1 . y o , '_ ... _l¢AL‘.d‘ \IJ 2‘ 9 1w Al: O. ""flr-I ‘. 
'~ “ ”T ‘|.‘ “mac my a: ' ‘ A. O :31; m‘ .. .. * my ht} Id/llfi'w ' A-|‘ .“ ”.5. ~"ll,",‘["v:_ . “‘ J'" 3' "l ‘Q 15'}? , > I‘lldr.'\ 29.,» h" 5"“. I ‘ " ‘.>- , a. , .quf1\hl,n I G|“ mention just a few. In all these systems, real-time rates are achieved using low-end computers, in many cases PCs. Most skin color models are based on either RGB or HSI chrominance models. In the case of RGB, the plot of normalized red and green components (Rnorm = —R+g+B vs. Guam = 72%) is used, and in the HSI model, hue and saturation components are used. The model can be adapted to a particular skin, and Yang and Waibel [94] discussed this issue. A potential problem with a skin color model is that dark skin is not well covered by the model. This was noted in Terrillon and Akamatsu [80], who did a study of different chrominance models applied to the skin color clustering problem. The problem with dark skin is that low values of R, G and B lead to large fluctuations in their relative values, which results in significant noise. Another problem with all skin color models is that background pixels might be of color similar to the face color. For example, the walls could be beige, which is similar to Caucasian skin, or the subject could be wearng clothes that are of skin-like color. In such cases, a face detection algorithm may not be able to discriminate easily the face and the adjacent regions, and more processing may be needed to analyze the regions. 2.2 Facial Feature Finding Locating facial features in the face images has been studied extensively. A wide vari- ety 0f methods have been applied, and all of them have advantages and disadvantages 19 ‘ ‘ ~ or. _', ‘ \ cm {be m u'. m , . L"; l;[..‘ 'Q'VI . l. l ‘HM l ...,... a 1‘ r .1.;..‘il..l..L. | -uurTfl, y‘vlv'.l...l L1; Cm; Ll‘l'lul .II\ u. .u [“2" ‘v- .1» . ' .-“|n.>[)t.‘ nil t .,‘ Lama .y' , . 51’ "3 ‘f 1 O-wv L1,][a'hiont I’lau‘uu". m :3}. i y .. h [v "[“*""d (1.. ‘. U '. .. ' 1 W V; l . , ., I! .| A ,so. .‘ [Ekf\.“ ;‘ f.‘ . r - l.| 1" ‘ . .1 .‘.f_]‘ m“V‘* 'v A.. o T 9 ‘ “‘5‘ CW TIMI; .~:. . J . l "W“ -‘ _» . >r,J _ 4.”td‘.li““‘Vv " . u...‘,“ g ‘9 .29. .g .L. ‘ 7“ ‘3. a ,4 . In; (1.11 ‘ L‘. ~. v 4.5 > ‘u;«' f’ A 1“} n'w .‘ C '1“ " ;., ”‘9 1‘ , H Z’-“'Nr. : a y. I): (mainly the execution times were rather high for some real-time applications). The algorithms like template matching, applying an eigen-feature approach, or match- ing using geometrical constraints, could be applied on typical still images. More application-specific approaches are based on detecting infrared reflections of the eyes, or detecting blinking in video stream, or in modeling mouth color in video stream data. Yuille et al. [99] find eyes and nose in a face image using deformable templates. They defined an eye template as a combination of a dark circle for iris, two parabolic sections of the eyelids for the boundary, and white regions of eyes. The preprocessing stage of their system includes morphological filters to find peaks and valleys in the image. They then deform the templates to minimize the energy function. Pentland et al. [62] proposed a method for face recognition based on finding facial features and comparing them to the database. They defined eigeneyes, eigennose and eigenmouth and were able to successfully locate eyes, nose and mouth using template matching with eigen-features. The locations of eyes were easily found, while nose and mouth locations had higher variation. 
Overall, the erroneous matches were defined as ones more than 5 pixels away from the true location, and the recognition rate was 94% with 6% false alarms rate. Talmi and Liu [79] used a PCA-based approach to find eyes. They created eigeneyes, and did subwindow search for the eyes. Additionally, they used stereo matching to find the precise eye position in 3D space, so that they could focus a Camera to zoom in on the eye. Kwon and da Vitoria Lobo [50] use snakelets to locate eyes, nose and mouth. 20 . v . \{w an 11 ‘ . 01‘ I? a [pH 0! i‘.. gt "'3 ffl'flgt‘fl 5' [31.) an ere-refit- l. , . 12w 71:? al. ll' :6. I my ~13 7" u RAH}: y. 'hr’ I Vv A .J. ' c ‘4‘ . ... I all‘ . 'A u 7" ._;-« Ii. .1 \ ‘an “ “ . | .‘ " - " V~§ wuihkfi (1 ‘I‘i‘l a ['li, . . l'x' V ‘ , "lflf‘l ”I :2] ;-.n . ‘ ’V , 4..“ Lack 1'? 7. “* LA. Y, I ] v. P-n. V‘ . ‘1"-U.(L 'T‘v h, ~-c~l.:1|‘ ~, .,‘A F \ . ‘ I‘VE." , , J“; a f‘ Al l 1 7;»- . V“! -.?‘v‘ “an: Xi.» ‘ a. 31:1 .filql .,V c ‘o ,- , :4 ,‘P ‘ [Au-“'1 8' n .‘l “ A WI “‘6. [3.1 ‘ .‘,.‘[_J. g .. ‘ n I \" - )3, :‘I § 1 ‘7' t 'v ,‘ . l H“: r.‘, '1 ’L. 8.; K \ 5‘ \ . ‘4. '11‘4. , \ ‘E O? f x‘» I» Stiefelhagen et al. [75] use geometrical constraints to locate eyes, nostrils and mouth corners. Starting from a region that is likely to be a face, they separate it. into an eye-region and a mouth—region. Assuming a near-frontal view of the face, they use an iterative thresholding in the eye region to find eye pupils; horizontal integral projection in the mouth region to find mouth location; and finally, iterative thresholding for nostrils. Bala et al. [9] find the eyes in a video stream based on blinking. Humans in- voluntarily blink to moisten the eyes, and the authors analyze luminance differences of successive images, which indicate motion. The eyes are located by tracking a luminance-adapted block matching technique. Oliver et al. [59] model mouth color similarly to the skin color modeling, and find and track the mouth based on a classification into mouth color and connectivity. Jebara et al. [42] find eyes in a face image by computing dark symmetry transfor- mation, computed from a phase and edge map by wave propagation. Mouth detection is done from a dark axial symmetry map, where the longest limb is selected as mouth. A coarse estimate of nose’s vertical location is done by searching for the strongest vertical gradient in an intensity image bounded by the eyes and mouth. Ballard and Stockman [10] used two methods to find eyes and nose. The first method uses deformable templates for the eye, similarly to ones described in [99]. To Speed Up the computation, simple ellipsoidal templates are used. The second method finds pupils and nose from light reflection. In their experiment, light is projected toWards the user’s face, and pupils can be easily found by finding bright reflections 21 I 11' . V ‘ .«vyt '[lnNinjltli‘L‘ 23.2: ‘1‘ ‘ v ‘ ' ,[, I. 1‘"; “8.531 Inf in" ‘a‘ J Llwi'il't‘2ll fl '1' I J A u 2.3 Facial ] - . _ .‘ ‘ £.>‘M'"’[V ‘, C v I I‘ ~~~ .uuulu. .61 ."1 \v 1 “'-W' my. I \ a, , ,7 \M» A I1 1 x” A soA "é?" '4' oil] . h" iv: [‘1 ~‘~¢n1 " IA‘I' . ‘. ' t « ' ....»-. .rrrzi. 11.1 _ "a ' I” l.‘ 1 ~~m...'3h Ir: 1' A ug‘ ' u' H? . 3‘ NC him- ,.-' J“ “;A . ls b ‘ h I.» ‘jc‘\’4r‘ ()7. l -4 - AA k,‘ t . 'p ”-51 8 . A .1 .;r ,. '. ' ‘ L‘ “Ii V f v., I. _ “.v . ”'“d (1"? \ AL ‘ (1" using thresholding. A problem with their system was that the lighting conditions were harsh for the user. Hutchinson et al. [40] and Morimoto et al. 
[54] used an infrared light source to illuminate the eye. Using a simple thresholding, in images obtained from an IR camera they can easily find a pupil and light reflection off the cornea and retina. 2.3 Facial Expression Discrimination Determining facial expressions could be done either from a still image, or from video stream. A number of different algorithms have been proposed for both cases. In the case of still images, algorithms like shape analysis, PCA-based classification of salient regions, and neural networks-based classification are used. In the case of video stream analysis, motion vectors are computed, and expressions are determined based on them. In fact, in the later case, typically the change in facial expression is determined (e.g., change from neutral to smile), while in the former case, the static image is analyzed and classified into an expression category. Yacoob and Davis [92] use optical flow computation to identify the direction of rigid and non-rigid motion of the face muscles when changing expressions. Their mOdel is based on linguistic and psychological considerations. They distinguish six expressions: disgust, sadness, happiness, fear, anger and surprise. The success rate raI‘lged from 80% for sadness to 94% for surprise. Kimura and Yachida [47] used the PCA analysis to classify face expression and its degree. Potential nets were calculated from the motion vectors of the face and four 22 1 “W“ ‘ KN “Mr W 'l,.mt]ssl- ‘ p :r: dfiffr u! i . , . I 2} cm: ELM c v . 4-~-i-.. , . . lJJ‘."Jm(‘Ai ‘K 1.3' \\ I‘.‘ .J i... . I ' .Y .. . ,vy v .2}. “41. ul: .i..:.. 2 1.. .41 . \ ~- 4h... r . i ' .r‘v. ”'7'?!) ‘ ' . . , T ‘kfllLertnf-f‘ It. . un’:" ‘ , u ”3.12“” d '1 w - I. “ . ‘0‘ * . "our‘l l ,, L]. a ‘ 4‘. ~. 44C“ ‘V‘G‘T-nl 1;, . {1‘1“ "rd ‘ ',“ ‘ ANA ‘ ‘ ‘Al. .lr’v-~.f ‘ ‘\‘.V\’ .w 1 ‘>._\. . ‘ ‘ [-. t \a u. d. (2.” .u..A JD “‘Q\ "(1" Y‘.‘. ‘ V ‘ 4 I “- f 1'» G. "r ‘u r l. ‘ ’ u .4 b“(- I‘ II .1 YV‘P' ' 7. cl“. “GLHZ r. :‘o. 5 v... . . ‘47: I’;~,, ‘ ‘ “HLQ‘J PL .1 ‘ ‘F‘bl :5 ‘v ' J. W“‘[ ‘ Ix .. " in (If, ”g ”l. ‘v. ‘ AJI' ‘ . e. T t .it‘ “‘1‘ ’4 .5- .”fi'k, , ., I .‘_ ‘ M. . ’Qgi‘Jr r expressions were determined: neutral, anger, happiness and surprise. They showed that the degree of an expression can be estimated as the person changes from neutral to the other three expressions. An evaluation of their system was done on a single individual who was used for training also. The performance of the system was not as good when an unknown test person used it. Oliver et al. [59] used mouth shape to discriminate four expressions: neutral, open mouth, smile, and open mouth smile. Their primary feature is the mouth shape that is characterized by the XY eigenvalues of the mouth region and the XY position of four extrema points. They used these features in a Hidden Markov Model (Hh‘lM) for final classification, and achieved 96% accuracy. Bakié and Miller [7] used an artificial neural network to discriminate the same expressions as Oliver. The overall accuracy achieved was 84%. The open mouth expression was easier for discrimination than neutral, smile and open smile, and the network had the most confusion between the latter two expressions. Colmenarez et al. [21] proposed a classification based on several features. Six expressions were considered: neutral, sadness, happiness, surprise, disgust and anger. They define regions for facial features and use the position of extreme points (e.g., eye or mouth corners) within the regions for classification, as well as the images of regions. 
The features they use are the end points of the eyebrows, eye and mouth Corners and the tip of the nose. For each user they design a separate facial expression Classification based on the PCA, and using their face recognition algorithm, the user’s model is retrieved and used. Using the feature points only, the error rates are 19—46%, and using both the feature points and feature images, the error rates are 6—20%. 23 .’ “AW—’ ‘41 W”... ' a. '1' . ‘4'. f 1".0 . "w ‘ A ‘, v [‘ylixv‘tl‘ir‘ v ?~ 5 . T.“ D. . .L“ .- ‘rI “Ad'c. (. ‘J 1" «1,. _, ‘N N v L ‘d I ' u ‘3‘}; . 1‘ J [15" W Image Plane Coordinate systems and perspective transformation and three feature points on face. Figure 2.1: The illustration of the system described in Ohmura et al. [58] 2.4 Gaze Direction Detection In this Section, several methods for gaze estimation are presented: based on mathe- matical equations, based on the detection of pupil reflections, based on limbus track- ing, based on teaching an artificial neural network to map eye image to gaze direction, and based on electro-oculography. 2.4.1 Mathematical Solution In 1988 Ohmura et al. [58] presented an algorithm for calculation of gaze direction from the image coordinates of three points, distances of the three points on the face, and camera focal length. The gaze was defined as perpendicular to the plane that Passes through the three face points. They used three cosmetic marks near eyes 24 ' .. I]. i If“ 'J‘fy), ‘\ till -I at - IIv to l“... o 0‘ ‘ h ‘5‘ [AI 1 .A] .. i i I'll iv ‘ i a him.“ A uu'A u ‘ u . , 9' s ,l' v v ‘ - W" . p] .u.‘ l1 ,A- A u i #5 "~ 1 y ‘ .fq— t ' V » ( If” A \ nl , . l ' v Q a lio" o “Luna v]; .H 'r:. ‘L ' l I '-.L F i" - ‘ .G., ...; " D V . ' ‘1 V H ‘ ,. Y ‘ ”-4; ‘ . b. i" ' 11 V v , . , ‘L' d[]u [ .‘V'v ‘ I‘ I ‘tl ‘11 'l‘ I" \N‘. 5,.‘4 , u“ “in .1 J ,L‘d .j‘ '- I 4', .; 1.. N 41" F. ' ‘_4\3’ “4',; and nose, which were tracked in real-time using specialized hardware. Figure 2.4.1 illustrates their system. Ballard and Stockman ’[10] used the algorithm from [58] to calculate user’s gaze. They applied their results to the HCI task of menu selection. Gee and Cipolla [31] presented a solution based on the knowledge of relative distances of facial features, and assumption that the imaging process is weak per- spective [67]. Using simple calculations based on the geometry of the face and imag- ing process, and knowing the distances between eyes, between eye line and mouth, nose and mouth, and nose depth, they can estimate the face normal. Two methods were proposed—one based on the 3D geometry, and another based on the skew— symmetry [55] of the facial plane. They calculated the accuracy of their model using a generated face model, which was rotated in all possible directions. They showed that the calculated normal is within 6" of the true facial normal. Stiefelhagen et al. [75] posed the gaze determination problem as the pose esti- mation problem, and used iterative POSIT algorithm developed by DeMenthon and Davis [24]. In addition to head-pose estimation, they estimate eye-gaze using an Artificial Neural Network in the same manner as described in [11]. 2.4.2 Pupil Reflections Hlltchinson et al. [40] designed an eye-controlled HCI for disabled persons. They illuminated the user’s eye with infrared light, and the gaze was inferred from the rElative position of the bright-eye (the reflection off the retina) and the glint (the 25 —’ T‘ lrfiD' '1‘. T NU ---i ; K“ r. i“... i. ’. x‘ :.V .‘qu , ’ ‘f'l-‘u II . 
__.‘ Cornea —————————-————————-—_—___—_.—_—._— Purkinje images: lst 2nd 3rd 4th Figure 2.2: The four Purkinje images are formed as reflections of incoming light rays on the boundaries of the lens and cornea reflection from the cornea, the first Purkinje image). Figure 2.2 illustrates the four Purkinje images. The system needs to be calibrated before each session. For the calibration to be valid, the user’s head should be still throughout the session. In [48] they extended the system to allow some head movements. Talmi and Liu [79] obtained a. zoomed eye image by focusing a camera to the cur- rent eye position in 3D space. They found the pupil center and the bright reflection from a light source, and used their relative position to determine user’s gaze. The sys- tem needs to be calibrated before each session, and is sensitive to head motion. Using Stereo eye tracking, they were able to compensate for some head motion. However, the time lag needed to re-calibrate and re-focus the eye camera upon head motion Was one second, which may be problematic in a real-time application. 26 y- .4 ~, ’p. p». 1‘ ”\f ‘ . u ‘l . h. J I u n 3 mg ‘n_ ‘u _r ‘A \ 2.4.3 Limbus Tracking The limbus tracking method tracks the boundary between the dark iris and white sclera (the white of the eye). By measuring the proportion of sclera left and right of the iris, we could determine horizontal eye motion and transform that into gaze. This method, however, is not suitable for vertical motion detection. Similar to the systems described in Section 2.4.2, this system would require calibration before each session. Applied Science Laboratories [1] has developed a commercial eye-tracker based on this method, that requires the user to wear glasses with a tracking equipment. 2.4.4 Artificial Neural Network Estimation Baluja and Pomerleau [11] trained an Artificial Neural Network to estimate gaze direction. The input to the ANN was an image of an eye, and the output was a grid of 50x50 units organized as X and Y coordinates of the gaze. The network was trained to have Gaussian output representation, similarly to what was used in the ALVIN N System [63]. For each user a separate ANN has been trained. Achieved accuracy was 1.50. However, it is not clear whether significant head motion was allowed, and what the testing procedure was. 2.4.5 Electro-Oculography Kaufman et al. [44] presented a gaze tracking system based on electro—oculography. They used neurophysiological methods for eye movement measurement, which place Electrodes close to the eyes, and measure the potentials. These signals could then be 27 r O y\ 1 ' ["1 t on. fln.‘4 .2. . "' Iv- n In ...‘..;3. ii. I “’1‘ 34.3.3. . V- ‘ 'r ' ‘4' l d - ' . . ‘9‘.a..\..‘ -r“ .‘u -~.' a I I <4 . ‘-. {V \ ]‘h ‘Iqu-‘ ~. ...w. ‘ \ a. r “I" ‘1 L, [RV ‘.}"t I“. “A. l‘h‘ ‘ '."Al;i used as an indication of user’s gaze. Their system needs to be calibrated before each session, and head movement is not tolerated by the system. 2.5 Kalman Filter The problem of parameter estimation can easily be solved if we know all the past data points. In many practical applications, saving all the data points is not feasible. In 1960, Kalman [43] pr0posed a recursive solution to the discrete—data linear filtering problem. The Kalman filter is a set of mathematical equations that provide an efficient computational (recursive) solution of the least-squares method. An introduction to the Kalman filter is available in [89]. In this Section, just a brief description of the Kalman filter equations is presented. 
The goal of the process is to estimate the state :1: 6 ER" of a discrete-time controlled process. The variable a: to be estimated is described by the linear stochastic difference equation: xk+1 = Akxk + Bku,c + wk. The measurement 2 E W" is used to estimate the value of x: 2,, = Hkx]c + vk. The matrix A relates the state x at time step It to the state at step k + 1, Matrix B relates the control input 11 E 32’ to the state x, matrix H relates the state x to the measurement 2. Random variables w and v represent the process and measurement HOise, respectively. They are assumed to be independent of each other, white and With normal probability distribution. 28 11“: ’7‘ 'i'lV‘ "4 'I ..iu.. ... '. ' n - Lord, ,1 .4.— C The estimation algorithm has two steps: 0 Measurement Update, or “Correct” step, updates the values of Kalman gain matrix K, estimate x with measurement 2, and error covariance matrix P. 0 Time Update, or “Predict” step, projects ahead the state x and error covariance matrix P. The steps are combined as follows: 1. Initial estimation of x and P; 2. Take measurement 2 from the world, and perform Measurement Update step; 3. Perform Time Update step and use the state estimate x as needed; 4. Go to step 2. The equations for the update steps are as follows: Time update: X,:_H = Aka + Bkuk 2+1 = AkPkAf + Qk Measurement update: 2k : i; + K,,(z,c — Hkxg) Kk = P;H{(HkP;Hf + erl Pk : (I — Kka)P; With the following definitions: XE: xk Pita Pk Kk R]: Q]: 2;; - Hki; a priori, a posteriori state estimate a priori, a posteriori error covariance estimate Kalman gain or blending factor, minimize the a posteriori error covariance measurement error covariance matrix process noise measurement innovation or the residual 29 v'; || CRT ‘ LCD | ELD resolution 1280 x 1024 up to 70 addressable or 1024 x 800 800 x 1000 dots per inch size 15 to 21 inches up to 14 inches 6 by 8 inches up to diameter diameter 12 by 16 inches Table 2.1: Resolution and size characteristics of common display devices 2.6 HCI-Related Issues This Section discusses some Human-Computer Interaction (HCI) issues that are used to evaluate input devices. First, basic input device terminology and mental models in the HCI are presented. Fitts’ law, used to express the time to move the pointer from one location to another, is presented. A study of eye tracker as an input de- vice is presented, and, finally, several existing eye- and head-controlled interfaces are presented. 2.6.1 Basic 2D Input Devices In this section, basics of the input device terminology and interaction tasks are pre- sented. For details, see [35, 28]. The most common display technologies used nowadays are Cathode-Ray Tube (CRT) displays, Liquid-Crystal Displays (LCD) and electroluminescent displays (ELD). Table 2.1 shows basic resolution and size characteristics of the above three input devices. Conventional input devices that are used in conjunction with the above display deVices are touch screen, light pens, graphic tablets, mouse, trackballs, and joysticks. 30 Or“ 9‘! it to. - V n y ‘I\ l" - u‘i . {Y ,Vl'i W‘ 1““: -‘ .4... “If“ 7,. «II'J ‘At . I71. I 1"» Q)!" v A . r: \ U" ‘» ". i. ‘< Design details are described very briefly for each of them, and the emphasis is on the resolution and usability of each device. Touch Screen produces an input signal as a response to a touch or movement. of the finger on the display. Touch screens can be based on several technologies: conductive, capacitive, cross-wire, acoustic or infrared. 
The resolution can range from 25 x 40 for infrared, through 256 x 256 for capacitive, to 1000 x 1000 to 4000 x 4000 for conductive discrete touch points. Studies showed that cross- wire technology had the best tradeoff between display resolution, touch screen resolution and user preference. The issue with touch screens is that the user needs to press the target area, and a user’s fingers are not too good for high- resolution target selection. Thus, typical selection area sizes are in centimeters (e.g., 1 x 2cm with spacing of 1cm). The advantage of touch screen device is that the input device is output device at the same time, thus the hand-eye coordination is natural. The disadvantage of the device is that the resolution is limited by the finger size, regardless of possible device resolution, thus making it unsuitable for small object picking. This can be somewhat resolved by using a stylus, however, the hand movements are not natural in that case, and the hand may obscure the diSplay. Another disadvantage is the parallax problem: the display and the touch-sensitive screen are separated, thus the user places the finger slightly above the target on the display. This can be resolved by asking user to place fingers perpendicular to the screen, however, then the naturalness of the device is decreased. 31 llglll P911 1: par M iii film I p.115: 1' 'h“ 0 o \L‘ Lu'l Light Pen is a stylus that generates position information when pointed to the dis— play screen. The pen senses the electron beam of the CRT and based on timing of the refresh signal, the position of the pen can be calculated. The highest possible resolution obtained is 1 / 4 of a pixel on a 1000 x 1000 display. Similar to the touch screens, the usability of the light pen is more dependent on the user characteristics and pointing capabilities than on the technology itself. Graphic Tablets have a flat panel placed on a table near the display. N’Iovement of a finger or a stylus provides position information. The tablets are based on a matrix-encode, voltage-gradient, acoustic, electro-acoustic or touch-sensitive technology. The cursor control can be absolute (cursor position on the tablet is returned), or relative (the cursor movement and previous cursor position are used to calculate the new cursor position). Another issue is what the dis- play/ control gain level is. One study showed that for a 12.5in display, a gain of 0.8 to 1.0 worked the best. Tablets have a resolution problem similar to that of the touch screens, and while the display is not obscured by the hand movement and there is no parallax problem, hand-eye coordination is needed. Mouse is a small hand-held box that measures its movement on the pad surface, and the movement is translated into the graphical cursor movement. Mice have one to three buttons that are used for various selection tasks. Mouse movement can be detected mechanically or optically. In the former case, no special pad is required, while in the later case, a special pad must be used and mouse movements are restricted to it. Similar to graphic tablets, display/ control 32 , ., "r .. . Blvd ‘1. h: ,1 LJA 1' VI , r.” . ‘i l," 1 L. Li UL 11.; ‘§A , I i '7 ' ~ . l . .L‘ . . L L'L“ .gh T». . ‘4‘” Ar, \A._ "4 \ ",., M. «1.~ ; 7.5-Y _ .‘ , a. (1“l‘ _i 5%”. i', v- ‘ . “ '1 . 1“" .9- gain can be adjusted to translate mouse movements into display movements. The gain can be larger for rapid movements than for slower movements, thus enabling fast and accurate positioning. 
The mouse can provide only relative mode and hand-eye coordination is required. Up to a display pixel accuracy can be achieved with the mouse, and it can be easily used to point to small targets. However, some difficulty might arise in selecting them since the selection button is on the mouse and pressing it might cause mouse movement. Trackball is a ball in a fixed housing that can be rotated in any direction. The rotation is translated into cursor movement. The underlying technology is based on optical or shaft encoders. Similar to the tablets and mice, display/control gain must be specified and the trackball works in relative mode only. Joystick is a lever mounted vertically in a fixed base. Potentiometers sense the movement of the lever and the output signal can be transformed into cursor movements. A force or isometric joystic has a rigid lever, and the amount of force applied results in cursor movements. Similar to the mice, the display/control gain factor needs to be appropriately adjusted. The above devices are used for various tasks based on their specifications. Touch screens are best suited for menu selection tasks for data already displayed on the screen and are widely used in information kiosks. Light pens are used for menu selection and locating and moving symbols on the display. The graphic tablets are best used for drawing purposes and can be used to select menu items. The mice and trackballs are best suited for pointing and selection tasks. They could be used for 33 . I ,.... la _v ' 17:11.31. lw-l’ 1'v'tvnv n 'Y Hauuj I» 5‘ 3 It" '1 I . : .v I '. ' ‘H‘ tr. (1"! . All: v . A . U.r ‘ kl. .7“, 7r “ OBI Il '4. "v- L!- K . d",1‘r i.,L gr ‘1 if. bv—u] D‘ 'V‘ v drawing, however, the tablets are more useful in such cases. The joystics are used for continuous tracking task and pointing that does not require high precision. The above devices are all used as locator devices, and they can be absolute or relative, direct or indirect, and discrete or continuous. 0 Absolute devices have a frame of reference, and the origin and all cursor positions are calculated with respect to the reference frame. Relative devices report only the movement of the device, and the display position changes with respect to the previous position. Thus, with the relative devices, the user can specify an arbitrarily large change in position, while with the absolute devices, the range of the position change is restricted. Another advantage of relative devices is that the cursor could be positioned anywhere on the screen from an application program. The touch screen and light pen are absolute devices. Graphic tablets can be configured as both absolute and relative device. Mouse, trackball and joystick are all relative devices. 0 With direct devices, the user points directly with the finger or stylus, while with indirect devices the user moves a device that is not on the screen, which results in screen cursor motion. For indirect devices, hand-eye coordination is needed, and that typically requires some time to learn. The touch screen and light pen are direct devices. The graphic tablets, mouse, trackball and joystick are all indirect devices. 34 .-o- - HUI l- 4. ,.;. cur». if : l ' i 0‘ l~ 1.? .. g o- . ..o.~ Till-i lii u , " A “an.“ -' v. '5‘.»&I I. -' c (if . "w 5’" ii a 1 TL». TI’K‘ Brim “d l". 4‘ 5N. . j Cl! 1‘" l I . , .' {H'Dx‘rq 0 Continuous devices generate smooth cursor motion as a result of smooth hand motion, while discrete devices like cursor-control keys do not provide smooth cursor movement. 
In case of continuous devices, the speed of cursor positioning is defined by the control-to-display ratio, or C/ D ratio. The ratio defines the scaling of hand movement change to screen cursor movement change. It can be configured such that for rapid motion the ratio is large while for slow motion the ratio is small, thus allowing accurate positioning. Examples of continuous devices are the graphic tablets, joystick and mouse. 2.6.2 Interaction Tasks The two basic interaction tasks are positioning and selection tasks. A positioning task is specifying an (r, y) position to the application program. The interaction technique involved is moving a display cursor to the desired location and pushing a button. The selection task is choosing an element from a choice set (commands, attribute values, object classes or object instances). This task can be performed by pointing at a visual representation of a set element or pressing a function key for a set element. Often used interaction tasks for comparison of input devices are target acquisition, menu and text selection, text entering and editing, and continuous tracking. The target acquisition involves positioning of a cursor at or inside the target position and pressing a button to indicate selection. Experiments often have varying target area (e.g., 0.13 to 2.14 cm2), and target distance from the initial cursor position (e.g., 2, 4, 6, 8, and 16 cm). The menu selection task involves selection of a target menu 35 16123.“. a t: ‘ ‘ 'h»'1>4 'r;lf\ IL‘A.AJ| .3.» t l V . SCI ; 'v.‘ . .. H “' “"JL‘? l") ”A . “ ' \ v , . d. C“. . 1». ‘ 4: H 1" U i.‘ l.‘ \W" i" ‘ "‘-.‘_‘ .1. ‘ I’v‘r . . k. '1-_ t r‘ .k. V l' ‘1- kl ‘ 64 A‘- '1‘. ‘ {Tam : -. item in a variable length menu. Varying items are the number of menu choices and their spatial and logical organization. The text entering and editing task is typically done via keyboard, but other forms of input have been explored (e.g., using light pen, mouse, etc.) The continuous tracking task consists of the user following a cursor position with an input device. 2.6.3 Mental Models in Human-Computer Interaction What does a user know about a certain HCI system? How can a developer model user’s knowledge? These are very important questions from the standpoint of a developer, and it is very hard to give exact answers. If a software developer can model a user’s reasoning about the system, the system’s usability and efficiency will be better. From the user’s point of view, learning to use the system would be easier. In this Section, Carroll and Reitman Olson’s [18] Chapter in Hellander’s Handbook of HCI is reviewed, and basic terms and models from it are introduced. What the user knows about a software system includes (i) rules that prescribe actions to be applied (simple sequences), (ii) general methods that fit general situ- ations and (iii) a “mental model”—user’s perception of how the system works and how to use it (this should include the knowledge about system components, their interconnections, and ability to construct reasonable actions and explanations why the actions are appropriate). In the simple sequences representation, the user rote memorizes the sequences needed for certain tasks. For example, the user would memorize a command to be 36 o i, ‘ .351 [1r \ Vir‘ . L... "l 1 . '_ ' "m‘ Li .‘ W \ ‘ v ., ,‘7‘ .. lll|H 7 - .h , a, .1 - _. 4' "‘"-Vi '"LA. . n.. .\ .‘\ I r y“ i.‘ M. '«., .il' Mu“ “ \“ ‘ ‘Hil ‘b ‘ r "A .,T v .I,'\[‘ A ~‘-‘ H, f“v. A ‘k‘: l L' 3 u \ , <‘ ”I J a ‘l‘ l, . . .‘, .h \v.l‘ F. 
~ '1, C V c p 5 kn ,. typed, and might have no knowledge about the underlying system or general rules that can be applied. At the second level of representation, we can model the knowledge of methods that can be used. This is often done by modeling tasks and goals to be achieved. Card, Moran and Newell [16] proposed the GOMS model, which stands for Goals, Operators, Methods, and Selection rules. In this model, the user recognizes that a primary goal can be broken into set of subgoals, which can be further broken into subgoals... until a subgoal matches a method in the system. The user has some rules by which subgoals are created and methods chosen. A number of empirical studies showed that this model can be applied to a variety of tasks, sometimes without even changing time parameters. Another model of user’s knowledge of methods is based on command grammars: Command Language Grammar (CLG) [53], Backus Naur Form (BNF) [65, 66], and Task Action Grammars [61]. These grammars are sets of rules that can be used to perform actions in a system, and “sentences” that are acceptable in the grammar represent correct actions that the user can apply. The grammar rules have similar structure as GOMS subgoals, but show more compactly alternative ways to accom- plish a task. Finally, we can model the most complex level—“mental model”. There are several kinds of models: surrogates, metaphors, glass box machines and network models. Surrogates [98] is a conceptual analysis that mimics perfectly the target systems’s input and output. It is not assumed that the surrogate and the real system produce the output in the same way, thus real causal basis of input-output cannot be provided. 37 4 (l, .l u. ’7 "1: o L. 41,, p c. F1r‘ "‘ l‘“ ‘- sa-. 9 a. F m . ‘ a. J. _ l :n ' Y" ‘ "I? “M" .i. n. V .9 . 4 ' ‘ .u, .. H . .v[ [7‘ . ‘ n, \v- ”7"?! .,‘ i, A . v . ' ‘ ‘ .. U, ‘ u ’s T a lv,V’ ‘ f 'P u ‘4”;— fi i ”main 1 i ' M. ‘l , 7-. ,‘ , \ 4" 4. , ‘W l J k§ "r ”-1 ‘7» . u. ““¢ 1" ~\ h u, ' ‘Ak~ 4‘ n A‘. ‘1 N K r . ‘ . “V" l.- ..‘\ ‘-. \> S a 1- x 5.,“ , . 1‘ 1 \' '. ‘~." r. a“ . i. A metaphor model [19] compares directly the target system with a system already known to the user (e.g., text editor and typewriter). The user easily learns known functions, however, new functions are harder to comprehend and are a constant source of errors. Glass box models [25] are a mixture of surrogates and metaphors. They mimic the target system, and provide some semantic explanations for the internal components. They are more used in a prescriptive context rather than in a descriptive 0118. Network representations of the system have states the system can be in, and the actions that the user can take to change states [51]. Kieras and Polson [45] “Generalized Transition Network” (GTN) contains detailed description of what the system does. The states of the network represent visible states of the system (e.g., screen display), and the arcs represent commands and menu choices that can be taken from each state. In simple terms, GTN represents what can be expected of the system when the user takes some action, and that knowledge can be useful to the user while learning and recovering from errors. 2.6.4 Fitts’ Law How fast can a human position a cursor from point A to point B on a computer display? The time needed can be estimated by using Fitts’ law equation [16]. The model is based on a Model Human Processor. 
It consists of three processors: (2') perceptual system—sensors and associated buffer memories such as visual and auditory image storage; (ii) cognitive system—based on sensory information from 38 .w - ‘1 .. r i ; l- ...“ -.-) v—r.’ l .. ‘u. , . s“. w h "« Lui‘ v." . V ' .Que p, , 1' ‘v C“. ‘ I“ v A , u, l l. 5‘.‘ Av, 6',‘ g; ‘l. ‘. 7 -. ‘ I v 5'. ‘ " .‘gbc y'- NA 4 _ M n; t. e. .. ‘ \ < ------- > A = x0 X1 X7- B l l . TARGET l l 3mm D -< —————————————————————————————————— -> Figure 2.3: Fitts’ law: analysis of the movement of user’s hand (from [16], page 52) working memory and previously stored information from long term memory, a. decision is made on how to respond; ( iii ) motor system—carries out the response. The task of moving a pointer from point A to point B is illustrated in Figure 2.3. The points are D units apart, and the goal is to move the pointer within S/ 2 units from the target. In each cycle, the user’s perceptual processor observes the hand (time needed rp), cognitive processor decides on corrections (time TC), and motor processor performs the correction (time TM). The number of such cycles is 77., thus the total time needed is T,,, = n(7'p + TC + TM) T100, 2 1M log2(%) — I J . . I where I” — —L————ZTP+TC+T" [1n msec/bit] and 6 IS a constant. 1 loge ’ For low values of log2(%), the equation does not fit the data well, and Welford [90] proposed a correction that better fits the data: Tpos 2 1M10g2(% + %) The value of 1M is calculated empirically, and ranges [M = 27 ~ 122 msec/bit, and is usually about I M = 100 msec / bit. 39 Fitting data into Fitts’ law equation provides an easy comparison between dif- ferent input devices’ performance on the same task. For example, the data for the conventional mouse and for an eye-mouse could be compared, as will be discussed in the following Section. 2.6.5 Evaluation of Eye-Tracker as Input Device Ware and Mikaelian [86] studied eye tracking data and eye-movement usability as a computer input device. They used eye-tracking equipment based on infrared corneal reflection, and the system required the user’s head to be still during the experiment. In the first experiment, the user’s task was to select one of seven vertically arranged buttons on the screen. To indicate the location that the eye-tracker “thinks” the user is looking at, they showed the current cursor location, as well as outlined the button that was fixated. Three selection mechanisms were tested: 1) dwell time button—the object that is fixated for more than some interval gets selected; 2) screen button—a large rectangular area on the screen is set aside for a selection button, and when the user fixates the selection button, the object that was last looked at is selected; 3) hardware button—the user presses a physical button while fixating the item to be selected. Each trial proceeded as follows: the user would fixate the central screen point, and initiate a trial, then the target button was indicated and the user would try to select it. The selection times were all below 1 second, and the dwell button was the fastest selection mechanism, followed by the hardware and screen buttons. The 40 errors in the selection were 12%, 22% and 8.5% for the dwell, screen and hardware button, respectively. In the second experiment, the buttons were arranged into a 4 x 4 square matrix, and the user was driven to selection of targets by the computer. The sizes of screen buttons varied from 48mm to 7.2mm (at. 
the viewing distance of 90cm, the sizes were 30—0450), and the dwell time was set to 0.4 seconds. For smaller button sizes (045" and 0.750), selection times were significantly higher, while for 1.50—3" sizes, the selection time was at or below 1 second. The hardware button was faster for the selection. In terms of errors made, for smaller buttons, the errors were 20—50%, and for larger buttons, the errors were below 10%. 2.6.6 Eye-Controlled Systems Hutchinson et al. [40] designed an eye-controlled HCI for disabled persons—the quadriplegic population who retain some motor control of their eyes. The system is named Erica for eye-gaze-response interface computer aid. As mentioned in Sec- tions 2.2 and 2.4.2, they use an infrared light source to locate glint and bright eye, and based on their relative position and calibration they can determine the user’s gaze. The user’s head must remain stationary for the system to work properly. In their system, there is a 3 x 3 matrix with 9 menu buttons. To select an option, the user needs to stare at its menu box, and after some fixed time interval (2 or 3 seconds, but it can be altered) a tone sounds and the cursor appears in the menu box. If the user continues to stare at the same box, the tone will sound again and that option will 41 .I","~ iIY .1 an“ .AA- , Q lg.“ ‘ .J‘. a . .... . {1r 7 ,v ' .,l (.1 O [m \ ‘ A up] --. Le“ . 0" ‘ ti. ‘HY‘ T'L‘. D», ,. .1“ K [LC l.‘ as" " u. . be selected. A number of applications are available for Erica: control of appliances, communications programs such as word-processor and voice synthesizer, computer games, and text reading. In the case of the word processing program, only the upper two rows are used for control and the lower row is used for text display. To enter one page of text, it can take 85 minutes for an experienced user. In the case of the text reading program, the upper two rows are used for display and the lower row is used for controls. Frey et al. [30] improved Erica’s original word processor. They used a menu- tree to organize letters based on their frequency, and re-organized the menu options (letters) based on the probability of the next letter in a word. The average time to pick a character is 1 second, which leads to 80 minutes per page. White et al. [48] further improved the system by enabling spatially dynamic calibration, thus allowing for some head movement during the session. Buquet et al. [14] installed a slide viewer in a museum in Paris, that was based on eye-tracking and gaze as input. Their results were quite promising (on 153 test persons, the success rate was 83%). However, as noted in [32], the participants were not described, which leads to a conclusion that they were a highly motivated chosen target group. In a museum in Denmark, an eye-controlled exhibition system has been installed. Glenstrup and Eugell-Nielsen [32] attended an exhibition and tested the system exten- sively. The system, called EyeCatcher, introduces a new on-screen button—EyeCon. The button changes from open eye to closed eye to indicate to the user that it has been selected. The selection is made by fixating the button for a time period. The 42 \ «339:: new v v ' ' vr v yyv[,y’ .- " ' “Add“; ii iii. 4 ,.. 1' '. v ‘ “a “tip 11“" T .t..d1..::_ 1“. 9 "v ..-, , r- ,‘»‘. “Ad“: “‘43. 1 A, ' a. i', (pr. [V ? ‘ “A: “A“ . . .A ~ , , . .lJ‘ l‘ ; y , , . a; l.." q- V Q ‘r I ‘Q ‘1‘, - , Nl' 7?. J‘ LR. system needed a calibration phase, and offered text reading, looking at pictures and watching a film clip. 
The overall conclusion was that the interested users (e. g., adults) were able to use it, despite the boring calibration phase. Children were very impatient and were not able to use the system. Among other problems, they noted the need to keep the head still. Starker and Bolt [74] tried to design a non-command [57] eye-tracking based in- terface. The application is based on “The Little Prince” novel of Antoine de Saint Exupery. The user’s interest in objects is measured by the length of fixation and the interest levels “age” if they are not refreshed by fixations. The user will automatically obtain information about an object if the system “thinks” that user’s interest level for it is high. Jacob [41] conducted extensive research on eye-tracking based HCI. In his exper- iments, the Applied Science Laboratory [1] eye-tracking equipment based on infrared reflections was used. In such systems, the user’s head must remain stationary. Several improvements of the use of raw eye-tracking data were proposed. First, he discov- ered that calibration was not uniform across the screen, e.g., more imprecise at some locations. To correct this, the user could move the mouse pointer to the problematic area and stare it for a while while the system re-calibrates. Second, if the user fixates slightly off-target, the system accepts that as target fixation. To ensure that there is no mistake in the selection, the system will accept off-target fixation only if the fixated point is far away from other targets. Finally, the problem with raw eye-tracking data is that some noise could be present. The noise can come either from unconscious, jittery eye-movement, or blinking, or some system measurement error. To avoid the 43 . ,_ _ Usilét. data limi‘lll.v’.;‘ as: interval. llli‘ I: 4K), 1 7". a f 1\ din, Ib.‘ Aiifilt.)t1.1 ”x” ‘l L ' l liaufrzan it a; mini. llm IN- 1.,4 ;' .~ -. -. ' Y ewlldf) M “l .. 1'. l _ l ' LC Title 51* ‘1‘."5. l ‘ I‘ . . . 5RD \l?‘ 't ....i‘.mr ‘.. ‘ r!,‘ W! Y ‘1 RH'PIlill'. sewn" hr” I 4,“! l . “fl: ‘-lt~rluibi‘ll ill I 7.15:. A ’ “E-‘f’ are ”rail-i). (‘i .1 Ml. . e. .‘ .- \ "‘t ' Y‘all» .. «LQEL P n, .H I a: \\" - ' P Y .l~ t‘ r',: .’\ N .- . \ "fair ‘IJII. \. noise, data tokenization is used: instead of reporting all the tracking data, after 100 msec interval, the mean gaze position is reported. The resulting data is a string of tokens instead of stream of eye-tracker measurements. Kaufman et al. [44] designed an HCI based on eye-tracking using electro- oculography (described in Section 2.4.5). The selection mechanism is blinking or winking. They tested menu selection using 3 x 2 menu buttons, and the achieved accuracy was 73%. In the case of selecting corners only, the accuracy was 90%. LC Techologies, Inc. [4] offers commercial systems that use technology and offer programs similar to those of Hutchinson et al. [40]. Recently, several systems based on Artificial Neural Network gaze detection have been developed in research laboratories. The applications of such systems are not. commercial yet. Baluja and Pomerleau [11] reported that they could achieve 1.5 degree accuracy, compared to 0.75 degrees in the commercial trackers. However, they neither specify the methodology for evaluations, nor the number of subjects. Similarly, Stiefelhagen et al. [75] reported errors in rotation angle around :13, g, and z axis of 1—5 degrees. They use their system for panoramic image viewing. 
Additionally, they use their system [95] to locate the head and perform lip reading, and to locate speakers so that the signal coming from a microphone can be enhanced.

2.6.7 Problems with Eye-Tracking Systems

In this Section, several problems related to state-of-the-art eye-tracking and gaze detection systems are discussed.

Non-Intrusiveness: Jacob wrote in [41]: "The eye tracker is, strictly speaking, non-intrusive and does not touch the user in any way. Our setting is almost identical to that for a user of a conventional office computer. Nevertheless, we find it difficult to ignore the eye tracker. It is noisy; the dimmed room lighting is unusual; the dull red light, while not annoying, is a constant reminder of the equipment; and, most significantly, the action of the servo-controlled mirror, which results in the red light following the slightest motion of user's head gives one the eerie feeling of being watched." This observation illustrates the non-intrusiveness problem related to most eye-trackers. Even though the equipment does not touch the user, the user is often required to be completely still. Additionally, some question the health effects of the use of infrared lighting.

Calibration: In most eye-tracking systems, the calibration step is a must. The user is required to calibrate the system before each session, and this might result in an annoying situation if the system is to be widely used. Another problem with calibration is that if the user moves out of the focus of the tracking camera, the system needs to be re-calibrated. Thus, systems that would not require calibration before use, and that would enable free head motion, would be truly non-intrusive and user-friendly ones.

Midas Touch: The problem of the selection mechanism in eye-trackers is persistent. Often, the target is selected after a long fixation. However, how do we know that the user really wants the target to be selected? The user might just stare at a point without the intention to select an item. Since we cannot "turn off" our eyes when we do not want to control the system, we could enter a state in which wherever we look we activate some command, and the interface would have the Midas Touch problem. An eye-tracking system should be able to distinguish the user's intent to select an item from the state of mere observation. In our aim to create a better HCI, we are faced with the need to enable the user to seamlessly activate a command without having to perform some complicated and unnatural action.

2.6.8 Head-Controlled Systems

Heuvelmans et al. [39] designed a handless interface for disabled persons based on head movements. The system has a head-borne mouse replacement unit based on transducers. Ultrasonic sound waves are emitted by a control unit and are received by a head-mounted transducer that converts the sound information into electronic signals differing in phase, which are further used to calculate the position of the pointer on the screen. The selection signal is a "dwell button", that is, the user fixates a point for a predefined period of time. The system was used by two disabled persons, who used a typewriting application and achieved a typing speed of 60 characters/minute.
In their setup, the smallest button was 34 x 20 pixels, or 11.1 x 6.5 mm, and all the alphanumeric keys and the space bar are displayed in the lower portion of the screen. Additionally, the screen has several control buttons arranged in a top row, and in the middle of the screen is a text display area.

Beardsley [12] designed a system for the monitoring of drivers. The system is based on matching against synthetic images from a 3D model to determine an approximate head pose. In the initialization step, the 3D model of a head is projected onto the image and manually adjusted. The model is rotated and template images are generated, which are used for later comparison. To better discriminate whether the driver is looking up or down, the state of the eyelid is determined using a skin color model segmentation of the eye area. Based on the knowledge of the interior of the car, they can say approximately where the driver is looking (e.g., rear-view mirror or dashboard).

Bradski [13] used head motion as an input for applications like computer games and graphics. The system is based on the mean shift algorithm, and they proposed an extension of the algorithm called CAMSHIFT (Continuously Adaptive Mean Shift). Based on an HSV color histogram, the probability distribution of the head location is determined. Based on the zeroth, first and second moments, they determine the head orientation. The controls for the computer game are based on left/right, back/forth and up/down motion.

2.7 Visual Perception Studies

In studies on human visual perception, eye movement during scene exploration is very important. Cognitive scientists present various scenes to subjects and observe the pattern of exploration. The subject might be asked to explore the scene for some clues or to just examine it freely. Yarbus [97] illustrates seven records of eye movements during the exploration of one picture. In this section, first, the types of eye movement are explained; then, the equipment used in cognitive science studies is described and discussed.

2.7.1 Types of Eye Movement

When closely examined, the patterns of eye movement show that there are several types of eye movements. The structure of the human eye is such that only one small portion of the eye has densely packed receptive cells. That area is called the fovea, and it covers about 2 degrees of viewing angle. The area next to the fovea is called the parafovea and covers about 5 degrees, and everything else falls into the extrafovea, which covers about 60 degrees and is perceived as blurry by a human. The area that we can perceive in the fovea is equivalent to a word on a page at normal reading distance. Saccades are the principal method of moving the fovea to focus on different portions of a visual scene.
There can be about 250 saccades per minute. When a visual stimulus is presented, it takes about 100-300 msec to initiate a saccade, and 30-120 msec to complete it (depending on the angle traversed, which ranges from a minimum of 1 to a maximum of 40 degrees, most typically 15-20 degrees). A saccade can be voluntary, but once it starts, it cannot be suppressed, nor can its path be changed. During the movement, vision is suppressed. Some studies show that the suppression is not complete. Once the fovea focuses on the object, the examination lasts about 200-600 msec, and this period is called fixation.

Pursuit motion is a smoother, slower movement of the fovea. It follows a moving object, so that it remains foveated. This kind of movement is not voluntary, but can be induced by introducing a moving object into the visual field.

Nystagmus occurs as a response to moving one's head. The fovea is slowly moved so that the pursued object's image is followed. If the followed object disappears from the field of view and appears on the other side, the fovea rapidly moves in the opposite direction. In this way we can follow repetitive patterns.

[Image is presented in color.] Figure 2.4: Example of eye movement during face learning and recognition. (Courtesy of John M. Henderson, Michigan State University Eyetracking Laboratory, http://eyelab.msu.edu/)

2.7.2 State-of-the-art Eye Movement Tracking Technology

Figure 2.4 illustrates the exploration of a sample face image. The lines indicate where the subject was looking, and we can see clearly that most of the attention was devoted to the eyes, nose and mouth regions, while the background remained unexplored. This kind of image is typical for visual perception and eye movement studies [38, 37].

How is this data obtained? The state-of-the-art equipment used is called the "Purkinje Image Eyetracker" [22, 23], and is illustrated in Figure 1.1. The subject's head is placed within a frame, the forehead rests on the forehead rest, and the bite-bar is in the subject's mouth. All this ensures that the head stays perfectly still during an experiment. Before a session begins, a calibration procedure is done: the subject looks at several points on the display, so that the vectors defined by the Purkinje images (see Section 2.4.2) can be properly calibrated. It is common that the system becomes uncalibrated during the session, so that the calibration procedure must be repeated. Numerous studies have been done using this procedure and subjects can get used to it. However, the equipment is highly intrusive and the user has no freedom of movement as he or she would have in a natural setting. A system that would provide similar eye-tracking data, but in a more natural setting, would be of great benefit. The accuracy of the above eye-tracking equipment is high, about 0.75 degrees of arc, and the sampling rate is about 1000 Hz. The challenge for a computer vision based system would be to achieve similar accuracy and performance rates.
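To put these angular accuracies into screen terms, a small worked conversion helps. The Python sketch below assumes a viewing distance of roughly 50 cm (the sitting distance used later in this work) and a display roughly perpendicular to the line of sight; the 0.75 and 1.5 degree figures are the commercial-tracker and ANN-based accuracies cited earlier.

    import math

    def on_screen_error_mm(angular_error_deg, viewing_distance_mm):
        # Small-angle conversion of an angular tracking error into a distance
        # in the display plane at the given viewing distance.
        return math.tan(math.radians(angular_error_deg)) * viewing_distance_mm

    print(round(on_screen_error_mm(0.75, 500), 1))  # commercial trackers: about 6.5 mm
    print(round(on_screen_error_mm(1.5, 500), 1))   # ANN-based trackers: about 13.1 mm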
Chapter 3

Face and Facial Feature Detection

This Chapter describes a method for face localization in a color image and a method for the location of facial features. For the face localization, a skin color model based on the normalized values of the red and green components is used. A classification using the hue, saturation and intensity color space is also presented, and compared with the RGB one. Some experimental results are presented, on both light-colored and dark-colored subjects, and possible improvements are discussed. A novel method for locating facial features like the eyes, eyebrows and nose based on the knowledge of face geometry is presented. The parameters for the evaluation of eyes-nose matches are introduced. The method is evaluated on a number of test images of various subjects, both light and dark, and results are presented. The method works under the assumption that there is one face in the image and that it does not overlap with other skin-color-like objects. If the person wears glasses or has a beard or mustache, the method sometimes does not work well.

3.1 Face Detection

This Section describes the development of the skin color model and the detection of a face using a connected components algorithm. Problems with the skin color model applied to dark faces are addressed.

3.1.1 Skin Color Model in RGB Color Space and Face Localization

The location of the face in an image is based on the skin color model. Human skin color can be represented as a cluster in either the RGB (Red, Green, Blue) or HSI (Hue, Saturation, Intensity) space. In this work, the RGB space is used, and a discussion on the use of the HSI space is presented in Section 3.1.2. Figure 3.1 shows skin color clusters obtained from the images of 4 subjects of different skin color (Caucasian, Asian, and Indian). The clusters are based on the 2D plot of the normalized red and green component values:

    R_norm = R / (R + G + B),    G_norm = G / (R + G + B).

Figure 3.2 depicts the classification results for the sample images in the middle column. Yellow color represents the primary skin color, and blue and pink colors represent the shadow areas. The detection of the face region boundaries in an input image (Figure 3.2 (a)) can be broken into three steps. The first step labels all the pixels in the input image as Face, Shadow_1, Shadow_2 or Other (Figure 3.2 (b)). The second step applies a connected components algorithm to find the components of the first three labeled classes.

[Image is presented in color.] Figure 3.1: Skin color clusters: the horizontal axis is R_norm and the vertical axis is G_norm. Out of six clusters, three represent skin color. The cluster labeled Face is the primary skin color, and the clusters labeled Shadow_1 and Shadow_2 are areas of shaved beard and shadows around the eyes.
A row-by-row algorithm is used for computational efficiency. Objects that are too small or too big are discarded, thus reducing the number of objects from 500-700 to about 100, the majority of which are in the shadow classes. The third step finds the biggest object of the class Face and merges it with objects that border it. This step obtains a compact face region even in the case when the subject's face is in shadow or when the subject has a beard. In some cases, the subject's forehead, cheeks and beard will be separate components; thus, merging of these components is necessary in order to obtain the correct face region. The resulting bounding box is shrunk around the first-moment lines, using heuristics learned from processing many examples, to obtain the final face region bounding box (Figure 3.2 (c)). The outer box is the bounding box for the original face object, and the inner box is the target area determined from the first-moment lines.

(a) Original Image (b) Skin Color Classification (c) Face Region Boundaries
[Image is presented in color.] (Pictures are part of the author's private database.)
Figure 3.2: Isolated face regions on sample images

This three-step approach has been tested on many images of many persons, including two open-house sessions when dozens of unfamiliar people tested the program. For a very small number of subjects, the skin color model did not work well. The skin model worked well even for subjects with very dark skin, provided that the ambient lighting was strong.

3.1.2 Skin Color Model in HSI Color Space

The skin color model can also be developed using the Hue, Saturation and Intensity (HSI) color space. Translation from the RGB to the HSI space is done using the following equations:

    I = (R + G + B) / 3
    S = 1 - 3 min(R, G, B) / (R + G + B)
    H = arccos( (2R - G - B) / ( 2 sqrt((R - G)^2 + (R - B)(G - B)) ) )

Figure 3.3 shows the plot of hue vs. saturation. Three skin color clusters are identified, similar to the clusters used in the RGB space: the primary face color (labeled as Face) and the areas of beard and shadows around the eyes (labeled as Shadow-1 and Shadow-2). Figure 3.4 compares the classification results for the sample images using the RGB and HSI models. Yellow color represents the primary skin color, and blue and pink colors represent the shadow areas. As can be seen, both models classify skin color with similar success. The RGB model classifies darker skin color a bit better than the HSI model. Another advantage of the RGB model is that the values are available directly from the camera interface, while for the HSI model the values need to be calculated for each pixel, and the computation could be time consuming.

[Image is presented in color.] Figure 3.3: Skin color clusters in HSI color space: the horizontal axis is hue and the vertical axis is saturation. Skin color is represented with three clusters. The cluster labeled Face is the primary skin color, and the clusters labeled Shadow-1 and Shadow-2 are the areas of shaved beard and shadows around the eyes.

3.1.3 Problems with Skin Color Model

The processing using the skin color model gives erroneous results in some cases. The most common errors occur when the face object is close to some object of similar color, especially the subject's clothes.
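For reference, the two pixel-level color computations used by the models in Sections 3.1.1 and 3.1.2 can be written as short routines. The following Python sketch is illustrative only: the skin-cluster boundaries of Figures 3.1 and 3.3 are not listed in the text, so no membership test is shown, and the hue mirroring for B > G is the usual HSI convention rather than something stated above.

    import math

    def normalized_rg(r, g, b):
        # Normalized red/green chromaticity used by the RGB skin color model.
        total = r + g + b
        if total == 0:
            return 0.0, 0.0
        return r / total, g / total

    def rgb_to_hsi(r, g, b):
        # RGB-to-HSI conversion corresponding to the equations in Section 3.1.2.
        total = r + g + b
        i = total / 3.0
        s = 1.0 - 3.0 * min(r, g, b) / total if total > 0 else 0.0
        denom = 2.0 * math.sqrt((r - g) ** 2 + (r - b) * (g - b))
        if denom == 0:
            h = 0.0  # gray pixel: hue is undefined, return 0 by convention
        else:
            ratio = max(-1.0, min(1.0, (2 * r - g - b) / denom))  # guard rounding
            h = math.acos(ratio)
            if b > g:                # usual HSI convention, not given in the text
                h = 2 * math.pi - h
        return h, s, i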
In such cases, another object can be recognized as the face, or it can merge with the face into one component. Figure 3.5 illustrates this problem. If the other object and the face are recognized as one, the algorithm still has some chance of locating the eyes and nose. However, if the unwanted object is recognized as the face object, the real face is lost. Some control of the environment greatly reduces these problems and maintains real-time speed.

(a) Original Image (b) RGB Classification (c) HSI Classification
[Image is presented in color.] (Pictures are part of the author's private database.)
Figure 3.4: Comparison of RGB and HSI models for skin color classification

(a) Original Image (b) Skin Color Classification (c) Face Region Boundaries
[Image is presented in color.] (Pictures are part of the author's private database.)
Figure 3.5: Erroneous face regions due to similar background and face colors

3.1.4 Problems with Dark Skin Color

The skin color model described above works well for light-colored faces. In the case of dark skin color, if the room is well lit, the model will be able to segment the skin. However, if the lighting is not strong enough, or if the subject's skin is very dark, the model will not cover that skin type. As an illustration, Figure 3.6 shows images taken under the same settings as the other test images, where the skin color classification fails in two cases (first two rows) and succeeds in one case (third row). As can be seen, the first two subjects' skin color was classified as mainly shadow area, and thus the connected components algorithm could not find a big enough component. Big connected components of the shadow class are discarded by the program as possible background. In the third, successful case, the combination of the primary skin color and one shadow class enabled successful face localization.

(a) Original Image (b) Skin Color Classification (c) Face Region Boundaries
[Image is presented in color.] (Pictures are part of the author's private database.)
Figure 3.6: Localization of dark skin color subjects: two failed and one successful classification of the face color

3.2 Improvement of Dark Skin Color Detection

The main problem in detecting dark skin color is that the skin color is not well clustered in the R_norm x G_norm space. The skin points are mainly spread across the area of the cluster labeled "Other_2" in Figure 3.1. However, histograms of the red, green and blue intensities have the same shape for both dark and light skin tones. Figure 3.7 illustrates the histograms for five windows. As can be seen, the peaks of the histograms for the light skin tone are around 80, 70, and 60 (or above) for red, green and blue, respectively. If the dark skin histogram is shifted so that its peak lies at the same value as the light skin one, we can transform the image in such a way that the skin color then fits into the proposed skin color model.
The transformation function used is the following:

    I_altered(i, j, c) = a1_c * I_orig(i, j, c),         if I_orig(i, j, c) < hist_peak_c
    I_altered(i, j, c) = a2_c * I_orig(i, j, c) + b2_c,  elsewhere

where I(i, j, c) is the image intensity at row i, column j, for color c in {R, G, B}, hist_peak_c is determined from the corresponding color histogram, and

    a1_c = const_c / hist_peak_c
    a2_c = (255 - const_c) / (255 - hist_peak_c)
    b2_c = 255 (const_c - hist_peak_c) / (255 - hist_peak_c)

with const_R = 80, const_G = 70, and const_B = 60. The darkness of each window can be checked automatically by comparing the hist_peak_c value with boundary values for R, G, and B, which were empirically found to be (1/3)const_R, (1/3)const_G, and const_B, respectively. The image is then considered to contain a dark skin area if at least the three leftmost or the three rightmost windows are dark. If the above formulae are applied to the first two images from Figure 3.6 (a), we get images in which the skin is easily classified. The resulting altered input images and the skin color classification results are shown in Figure 3.8.

(a) Window labeling; (c) Red, green and blue intensity histograms: the black line is the dark-light skin boundary, and the dark red, green and blue bars are the peak values.
[Image is presented in color.] (Pictures are part of the author's private database.)
Figure 3.7: Red, green and blue intensity histograms for dark and light skin tones

Skin color classification of altered images.
[Image is presented in color.] (Pictures are part of the author's private database.)
Figure 3.8: Color-altered dark images, and resulting skin-color classification

3.3 Eyes, Eyebrows and Nose Location

Once the face target area is found, the search for the eyes, eyebrows and nose is limited to its bounding box. A greyscale image is used for this, specifically, the smoothed red component of the input color image. Finding the eyes, eyebrows and nose is based on the knowledge of the face geometry. In this Section, the algorithm and heuristics used are described in detail.

3.3.1 Detecting Eyes

The eye pupils are the darkest objects on a typical human face, regardless of the eye color. If the face and eyes are brighter, the pupils will be brighter, but still the darkest overall. The red component of a color image absorbs the most light, thus the eye pupils will be darkest there, compared to their values in the green or blue component. Figure 3.9, first column, illustrates the difference between the red, green and blue components of a color image. Since our skin is of reddish color, skin is rather bright in the red image compared to the green one.

Finding the eye blobs can be reduced to finding the two darkest objects in the face image. Thresholding at an appropriate value would produce two eye blobs. The problem is that the threshold is not the same for all eyes. Thus, gradual thresholding is applied until two dark blobs are found. Figure 3.9 shows results of thresholding the red, green and blue components at gray levels 1, 5, 10, 15, ..., 35, 40.
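A minimal sketch of this gradual-thresholding search is given below (Python, with NumPy/SciPy used for the connected components step). The size and border rules are crude placeholders for the blob-filtering criteria described in the next paragraphs, not the exact limits used by the program; in continuous processing, the threshold found for the previous frame can be passed back in as the starting value, as noted later in this section.

    import numpy as np
    from scipy import ndimage

    def find_eye_blob_candidates(red_channel, start=0, step=5, max_thresh=40):
        # Raise the threshold on the (smoothed) red component until at least two
        # dark blobs that could be eye pupils survive the filtering rules.
        h, w = red_channel.shape
        for thresh in range(start, max_thresh + 1, step):
            labels, n = ndimage.label(red_channel <= thresh)
            candidates = []
            for i in range(1, n + 1):
                ys, xs = np.nonzero(labels == i)
                if ys.size < 4 or ys.size > 0.01 * h * w:        # too small / too big
                    continue
                if ys.min() == 0 or xs.min() == 0 or ys.max() == h - 1 or xs.max() == w - 1:
                    continue                                     # touches the target-area border
                candidates.append((float(xs.mean()), float(ys.mean())))
            if len(candidates) >= 2:
                return thresh, candidates      # handed on to the eyes-nose matcher
        return None, []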
As can be seen in Figure 3.9, in the red component image the pupils are easily identified, while in the green component they become visible only at a higher threshold, and in the blue component they are not visible below the threshold level of 40. Additionally, since human eyes are often of bluish or greenish color, these components would have high intensity in the pupil area, so the pupils would not be the darkest area in a face image.

For each thresholded image, a connected components algorithm is run to find dark blobs. Blobs that are at the edges of the target area, too small, too big, or sparsely filled are rejected. Figure 3.10 depicts the results of gradual thresholding for a sample image. The threshold is increased from 0 until a successful detection of the eyes and nose is found. For faces with dark eyes, smaller thresholds result. For faces with bright eye pupils or for very bright images, higher thresholds are needed to get the eye blobs to fall below the threshold.

(Pictures are part of the author's private database.)
Figure 3.9: Red, green and blue components of two images: gradual thresholding in search for eye blobs (grayscale and thresholded images at gray levels 1, 5, 10, ..., 40)

(Pictures are part of the author's private database.)
Figure 3.10: Gradual thresholding in search for eye pupils (low threshold vs. high threshold)

When only one image is processed and there is no prior knowledge of the suitable threshold, it is necessary to try thresholds starting from the lowest value up to the value that yields a satisfactory match. However, in the case of continuous processing, prior knowledge of the threshold from the previous frame can be used. Thus, the algorithm can adapt itself to the current subject's face and to the current lighting conditions.

3.3.2 Matching Eyes with Eyebrows and Nose

As can be seen in Figure 3.11, at the best threshold we typically do not get only two dark blobs, but often more than two. For each blob that is a candidate for an eye, it is first checked whether an eyebrow is present above the eye. The existence of the eyebrow is verified by looking for an edge in the average-threshold image that is 10-30 pixels away from the eye. The blobs that do not have the edge representing an eyebrow are discarded and not considered candidates for the eyes. All the remaining blobs are assumed to be candidates for the eyes, and are matched to find the nose. The image that is used to find the nose is the red component, intensity-thresholded at the average intensity.

(Pictures are part of the author's private database.)
Figure 3.11: Matching eye blobs with nose: (a) original image, (b) best thresholding, (c) nose line, (d) feature points

When finding the nose, it is assumed that most of the lighting on the face is from above. It is also assumed that the nose line is perpendicular to the line between the eye pupils, and that it intersects the line between the eye pupils roughly at the midpoint of that line.
The precise location of the beginning of the nose line is the mid-point of the widest above-threshold region on the eye line. The estimated nose width is equal to the width of that above-threshold region. The beginning point of the nose line is on the eye pupils line, and the nose line ends at the tip of the nose. The nose line is then pursued downwards, using the estimated width, until a significant below-threshold area is found. That point is considered the tip of the nose. The shaded areas in Figure 3.11(c) represent the traced nose line.

Each eyes-nose match is evaluated based on several heuristics. The match with the highest score is selected as the eyes-nose match. The scoring rules are based on comparison with the current face region dimensions. Thus, the user's face does not have to be at a fixed distance from the camera, since the parameters that are used for match scoring automatically get re-adjusted when the user's face moves. The only constants used are the approximate relationships for the eyes and nose distances, and they were determined empirically. Each match is evaluated using the following rules:

1. If the Euclidean distance between the eyes is smaller than 25% of the target area width, or greater than 70% of the target area width, the match is rejected.

2. If the Euclidean distance between the eyes differs by more than 9% from the expected eye distance, the match is rejected.

3. Add 100 * (1 - |(eye1_Y - eye2_Y) / (eye1_X - eye2_X)|) to punish a large eye-line slope. It is assumed that the subject will not turn the head by more than 45 degrees; if the slope is more than 45 degrees, the match is automatically rejected.

4. If one of the eyebrows has not been found, but only estimated, the whole match value is multiplied by 0.9. Add the percentage of fullness of the forehead line with respect to the skin-clustered points; the forehead line is the line between the two eyebrows. Add 100 * (1 - |eyebrow1_dist - eyebrow2_dist| / max(eyebrow1_dist, eyebrow2_dist)) to reward symmetric eyebrows, where eyebrow_dist is defined as the shortest distance between an eyebrow and the eye line, calculated as the number of points from the eyebrow to the eye line along the eye-line normal.

5. If calculated_nose_width < min_nose_width, subtract 10 to punish the absence of an above-threshold area on the eye line, where min_nose_width = 0.1 * eye_distance.

6. If calculated_nose_width > max_nose_width, add 30 * (3 - calculated_nose_width / perfect_nose_width) to punish a wide above-threshold area on the eye line, where perfect_nose_width = 0.3 * eye_distance and max_nose_width = 0.5 * eye_distance.

7. Add 100 * (1 - num_black_points / total_points_on_nose_line) to reward above-threshold areas on the nose line. Add one half of the average gray value on the nose line, to reward bright areas and punish dark areas.

Problems with the above algorithm occurred when the lighting conditions were such that one side of the face was in shadow and the other side was directly illuminated. In such cases, both eyes could not be found using the same threshold. One solution to this problem would be for the subject not to work under such extreme conditions. It is likely that any subject who is working in front of a workstation would not like to be directly illuminated, since a lot of reflection is present and the screen cannot be seen clearly. Another solution to this problem is to remember which blobs are growing in size as the threshold increases, and to use their locations at a higher threshold. In that way, we can compensate for the disproportional lighting on the two sides of the face.
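A simplified sketch of the match scoring defined by the rules listed above is shown below (Python). Only the rules that can be written down directly from the list are included; rule 7 and the eyebrow-presence and forehead-fullness terms are omitted because they require image access, and the helper quantities (expected eye distance, estimated nose width, eyebrow-to-eyeline distances) are assumed to be computed elsewhere.

    import math

    def score_eyes_nose_match(eye1, eye2, nose_width, expected_eye_dist, target_width,
                              eyebrow1_dist=None, eyebrow2_dist=None):
        (x1, y1), (x2, y2) = eye1, eye2
        eye_dist = math.hypot(x2 - x1, y2 - y1)

        # Rule 1: eye distance must be 25%-70% of the target (face) area width.
        if not 0.25 * target_width <= eye_dist <= 0.70 * target_width:
            return None
        # Rule 2: reject if it differs from the expected eye distance by more than 9%.
        if abs(eye_dist - expected_eye_dist) > 0.09 * expected_eye_dist:
            return None
        # Rule 3: penalize a steep eye line; reject above 45 degrees (|slope| > 1).
        slope = abs((y2 - y1) / (x2 - x1)) if x2 != x1 else float("inf")
        if slope > 1.0:
            return None

        score = 100.0 * (1.0 - slope)
        # Rule 4 (symmetry term only): reward symmetric eyebrow-to-eyeline distances.
        if eyebrow1_dist and eyebrow2_dist:
            score += 100.0 * (1.0 - abs(eyebrow1_dist - eyebrow2_dist)
                              / max(eyebrow1_dist, eyebrow2_dist))
        # Rules 5-6: compare the estimated nose width against bounds tied to eye distance.
        min_w, perfect_w, max_w = 0.1 * eye_dist, 0.3 * eye_dist, 0.5 * eye_dist
        if nose_width < min_w:
            score -= 10.0
        elif nose_width > max_w:
            score += 30.0 * (3.0 - nose_width / perfect_w)
        return score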
Figure 3.12 shows examples of the feature detection for various pan, tilt and roll angle rotations, some of which are extreme. The program imposes a limitation of up to 45 degrees of roll angle. The eye features are lost if the head is panned completely to the left or right. As the roll and pan angles increase, so does the error in detecting the tip of the nose.

3.4 Accuracy of Feature Detection

The accuracy of the feature point detection was tested on four subjects (two with light skin color and two with dark skin color). A continuous stream of images was recorded while the subject was moving, e.g., rotating the head or moving toward or away from the camera. Two sets of images were recorded: one with a normal background (e.g., white walls, workstations in the background, etc.), and another with a plain green background. Figure 3.13 illustrates the input images used.

(Pictures are part of the author's private database.)
Figure 3.12: Results of eyes-nose detection for various roll, pan and tilt angles. The white rectangle shows the face boundaries, and the white dots show the locations of the eyes, eyebrows and tip of the nose.

[Image is presented in color.] (Pictures are part of the author's private database.)
Figure 3.13: Sample images used for the accuracy of feature detection calculation

                    Light subjects           Dark subjects
                    Norm. bg   Plain bg      Norm. bg   Plain bg
    Matches (%)       93.00      99.00         58.33      80.00
    Outliers (%)       5.00       0.00          5.00       1.67
    Not found (%)      2.00       1.00         36.67      18.33
    (a) Percentage of matches found, matches not found, and outliers.

                    Light subjects           Dark subjects
                    Norm. bg   Plain bg      Norm. bg   Plain bg
    Left Eye           1.04       1.09          1.59       1.59
    Right Eye          1.52       1.52          2.78       1.58
    Nose               4.05       3.77          3.53       2.63
    (b) Average Euclidean distance for matches, in pixels.

Table 3.1: Accuracy of the feature location. Percentage of found matches and average Euclidean distance of program-found features from hand-labeled locations. Comparative results for a normal environment vs. a controlled (plain green) background are given.

Then, a random sample of (50 or 30) video frames was selected (from 300 or 150 frames). The eyes and nose locations were hand-labeled, and the Euclidean distance between the labeled features and the program-detected features was calculated. Points that had a Euclidean distance of more than 10 pixels were considered outliers. Table 3.1 shows the average Euclidean distance over the set of images when the correct match was found. Also shown are the percentage of correct matches, the percentage of outliers, and the percentage of times when the match was not found.

For the normal background, matches were found for 58-93% of the frames; for the plain green background the success was 80-99%. The percentage of outliers for the normal background was 5%, while the percentage of outliers for the plain background was 0-1.67%. For the dark-skin subjects, the percentage of missed matches was 36% with the normal background, and 20% with the plain green background. Clearly, the plain green background enhanced the performance. The error, measured as the average Euclidean distance between the hand-labeled and the program-generated feature points, was 1-4 pixels. The error for the eyes was very small (1-2 pixels off), while the error for the tip of the nose was larger (2-4 pixels). This discrepancy is due to the lighting conditions and our assumption that the line between the eyes and the line down along the nose make a 90 degree angle. When the face is rotated significantly, the angle is smaller than 90 degrees. However, the error made is still small, and the amount of computation is reduced since only one angle is considered.
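The per-frame bookkeeping behind these percentages is simple; a sketch (assuming parallel lists of program-detected and hand-labeled (x, y) points, with None marking frames where a feature was not found) is:

    import math

    def summarize_accuracy(detected, labeled, outlier_px=10.0):
        # Match / outlier / miss percentages and mean pixel error, as used for Table 3.1.
        dists, outliers, misses = [], 0, 0
        for det, ref in zip(detected, labeled):
            if det is None:
                misses += 1
                continue
            d = math.hypot(det[0] - ref[0], det[1] - ref[1])
            if d > outlier_px:
                outliers += 1
            else:
                dists.append(d)
        n = len(labeled)
        return {"match_pct": 100.0 * len(dists) / n,
                "outlier_pct": 100.0 * outliers / n,
                "not_found_pct": 100.0 * misses / n,
                "mean_error_px": sum(dists) / len(dists) if dists else float("nan")}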
The eye blobs of the dark-skin subjects were found at very low thresholds (0 or 1 in most cases), and the threshold increment had to be modified to 1 instead of the 5 that was used for light-skin subjects. This problem can be overcome with increased ambient lighting, or with more gradual increments in the thresholding based on the knowledge that a dark-skin subject is using the system.

3.5 Summary

In this Chapter, a method for face localization in a color image was presented. The method is based on a skin color model that covers a wide range of faces, from light to dark ones. Models using both the RGB and HSI color spaces were presented and compared. Some problems with the skin color model due to backgrounds similar to the face color were presented, as well as problems in the detection of dark skin. A method of color alteration for improving dark skin detection was presented. A novel method for locating facial features based on the knowledge of the face geometry was presented. The method finds the eyes, eyebrows and nose of a subject. The evaluation of the accuracy of the feature detection showed 1-2 pixel accuracy for detecting the eyes and 2-4 pixel accuracy for detecting the tip of the nose.

Chapter 4

Tracking the Features

In Chapter 3, the algorithm for face location and facial feature location was described. The algorithm is designed to work on individual images. In this Chapter, it is described how the algorithm from Chapter 3 has been integrated into the processing of video stream data. The input to the base program is a live video stream from the camera attached at the top of the workstation monitor, and the output is the list of eyes-eyebrows-nose coordinates. The frames are processed in sequence. To take advantage of movement history, a Kalman filter is used to estimate the motion vectors of the tracked feature points and the future positions of the tracked feature points [43]. The state diagram of the system is presented, which defines how the individual steps of the algorithm are combined to achieve real-time processing rates. Time-space diagrams representing the feature coordinates in time are analyzed. Finally, numerical data are presented that show that the system can run at real-time rates with good accuracy.

4.1 Tracking System State Diagram

The system state diagram is depicted in Figure 4.1 (a). The tracking starts in state NO_FD (no face detected). Once the face object has been located (state FD_FIRST) in the input video stream, feature points are tracked using the approach described in Chapter 3 (state FD_TRACK). First, an attempt is made to find the tracked features in the location from the previous frame. If that does not succeed, a new face position is determined. Thus, time is saved by not unnecessarily running the face detection algorithm, which is the most time-consuming segment. Since the subject's motion is typically smooth, once the face region has been locked, the tracked features can be located in the same face bounding box for a number of video frames. If the tracked features cannot be found, a prediction based on the Kalman filter is used to fill in the gap (state FD_PREDICT). The prediction is done for a limited number of frames, namely at most three.
If the tracked features are found while in the prediction state, a switch is made to the recovering state FD_RECOVER, in which only measurements are taken from the environment and no prediction is made. The system stays in the recovering state as long as it was in the prediction state. After the recovery period, if the tracked features have been found, a switch is made to the tracking state. If the tracked features have not been found while in the prediction or recovering state, the face finding algorithm is started again to find a new face location (state FD_NEWPOS), since the assumption is that the state of the environment has changed too much during the prediction period. If the face has not been found, a switch is made to the initial state, whereas if the face has been found, a switch is made to the intermediate continuation state FD_CONT. In this state, an attempt is made to find the tracked features in the new location. If the features have been found, a switch is made to the recover state. If the features have not been found, a switch is made to the no-match state FD_NOMATCH. In this state, the face object is located in the image; however, no face features are found.

Figure 4.1: Tracking System State Diagram. (a) Original State Diagram; (b) Improved State Diagram.

If the face starts to move, the diagram in Figure 4.1 (a) will not respond promptly to the movement, and the tracking parameters will not be adjusted well. Thus, an improved scheme, depicted in Figure 4.1 (b), is introduced. The only difference is that in the state FD_PREDICT, an attempt is made to find the face first, and then to find the features in the new locations. In this way, if the face moves a bit, new face boundaries will be found that fit the data better, and the tracking parameters will be adjusted faster.

The motion of six variables is estimated: the X and Y coordinates of both eyes and the nose. In the Kalman filter equations, using previous and current values of the variables, we manipulate 2 x 2 matrices, and matrix operations such as inversion are not time consuming. The time update ("predict") equations for the projected state of a tracked point, x_{k+1}^-, and the error covariance matrix, P_{k+1}^-, are:

    x_{k+1}^- = A_k x_k,    with A_k = [ 2  -1 ]
                                       [ 1   0 ]

    P_{k+1}^- = A_k P_k A_k^T + Q_k

where the state vector x_k = [x_k, x_{k-1}]^T stacks the estimate at time k and the estimate at time k-1. In our case, x_k is the X or Y coordinate of the tracked point.

The measurement update equations for the Kalman gain K_k, the update of the estimate with the measurement z_k, and the update of the error covariance matrix are [43]:

    K_k = P_k^- (P_k^- + R_k)^{-1}
    x_k = x_k^- + K_k (z_k - H_k x_k^-)
    P_k = (I - K_k) P_k^-

where we set H_k = I, and empirically determine the measurement error covariance matrix R_k and the process noise Q_k.

The predicted coordinates are used in two ways: (i) to verify the position of newly found features, and (ii) if the tracked features are lost, to predict their location and thus smooth out the tracking and avoid losing the tracked features. In the former case, each match is checked against the predicted location. The simple Euclidean distance is calculated for each of the coordinates, and if the overall distance is too large, that match is discarded. It is assumed that such a match is rather far away from the real match, e.g., the eyes and ears might have been matched. In the latter case, the predicted coordinates are used as the output of the program, instead of the program-calculated ones, as described earlier.
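A minimal NumPy sketch of this per-coordinate filter, following the equations above, is given below. The Q and R values are placeholders (the text only says they were determined empirically), and six such filters are run, one for each of the X and Y coordinates of the two eyes and the nose.

    import numpy as np

    class CoordinateKalman:
        # State is [x_k, x_{k-1}], A = [[2, -1], [1, 0]], H = I, as in the equations above.
        A = np.array([[2.0, -1.0], [1.0, 0.0]])

        def __init__(self, x0, q=1.0, r=4.0):
            self.x = np.array([x0, x0], dtype=float)
            self.P = np.eye(2)
            self.Q = q * np.eye(2)   # process noise (placeholder value)
            self.R = r * np.eye(2)   # measurement noise (placeholder value)

        def predict(self):
            # Time update: project the state and error covariance one frame ahead.
            self.x = self.A @ self.x
            self.P = self.A @ self.P @ self.A.T + self.Q
            return self.x[0]

        def update(self, z_curr, z_prev):
            # Measurement update with z = [current, previous] measured coordinate (H = I).
            z = np.array([z_curr, z_prev], dtype=float)
            K = self.P @ np.linalg.inv(self.P + self.R)
            self.x = self.x + K @ (z - self.x)
            self.P = (np.eye(2) - K) @ self.P
            return self.x[0]

The value returned by predict() is what a new match is compared against, and it is also what is reported when no match is found, as described above.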
4.2 Using Motion Detection to Improve Tracking Results

In some cases, when the user's clothes or background are of a color similar to the user's skin color (e.g., reddish, pinkish, orange colors), the segmentation of the face vs. the background is not very good. To overcome that problem, we use motion detection to facilitate face-background segmentation. The area where there is some motion is likely to contain the user's face only, and the areas where there is no motion are assumed to be background. We need to focus the search for the face only on the area where there is motion; the rest of the viewing field is ignored. Thus, if the background includes colors that are similar to the skin color, they will not affect the search for the face.

The motion detection algorithm used is rather simple: for each frame at time t, the difference between the frames t and t - 1 is calculated. All pixels are flagged as changed if the pixel value difference is larger than 10, or as same otherwise. Every 30 frames, a check is made to determine which pixels have changed in the previous interval, and the motion area is calculated by finding connected components out of the moving pixels. Components that are out of range of the previously detected motion object are discarded, to reduce the noise. At times of significant motion, the motion area is updated at the global level, and when there is just slight or no motion, the previously found global-level motion area is used.

Once we have determined the area that is moving, we restrict the search for the face to that area. In this way, we can speed up the face detection algorithm and compensate for the time spent calculating the motion. Real-time processing rates are still achieved. By detecting the motion area, we can also easily check whether a dark face is present, using the algorithm described in Section 3.2. We need only check the windows that are centered around the motion area center. If we did not have the information about the possible face area, we would need to check all possible sub-windows for dark areas, or to restrict the face location to some specific image area (e.g., the center of the image). The latter approach would significantly constrain the users.
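A sketch of the frame-differencing step described above is given below (Python with NumPy/SciPy). The rejection of components that lie far from the previously detected motion object, and the decision between "significant" and "slight" motion, are left out; the face search is then restricted to the returned bounding box.

    import numpy as np
    from scipy import ndimage

    def motion_bounding_box(frames_gray, change_thresh=10):
        # Accumulate changed pixels over an interval (the program uses 30 frames)
        # and return the bounding box of the largest connected moving region.
        changed = np.zeros(frames_gray[0].shape, dtype=bool)
        for prev, curr in zip(frames_gray[:-1], frames_gray[1:]):
            diff = np.abs(curr.astype(np.int16) - prev.astype(np.int16))
            changed |= diff > change_thresh
        labels, n = ndimage.label(changed)
        if n == 0:
            return None
        sizes = ndimage.sum(changed, labels, index=range(1, n + 1))
        biggest = int(np.argmax(sizes)) + 1
        ys, xs = np.nonzero(labels == biggest)
        return ys.min(), xs.min(), ys.max(), xs.max()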
The above approach was tested on a 359-frame movie of a subject with dark skin. Table 4.1 compares the accuracy of the detection when motion detection is and is not used, and when the Kalman filter is and is not used.

    Euclidean distance        No motion detection    Motion & dark face
    from hand-labeled          no KF     use KF       no KF     use KF
    Eye 1                       2.84      3.31         1.75      2.29
    Eye 2                       8.77      8.83         4.46      4.25
    Nose                        5.11      5.47         8.17      8.88
    # Matches                    102       117          130       116
    # Outliers                     6        27           11        50
    # Misses                     227       191          194       160
    # No Matches                  25        25           25        25
    # Detected No Matches         24        25           25        19

Table 4.1: Accuracy of the motion detection and dark face detection given as the Euclidean distance between feature points and hand-labeled locations. Euclidean distance is given in pixel units, and the number of matches is in frame units.

The Euclidean distance between program-detected and hand-labeled feature points in each image frame is calculated in pixel units. The table also includes the number of video frames processed: the number of correct eyes-nose matches, number of outliers, number of frames where no eyes-nose match was found, number of no-matches in the video stream, and number of detected no-matches in the video stream. As can be seen in the table, the number of outliers increased, while the number of missed matches decreased, when the motion and dark-face detection were turned on. Also, the accuracy of finding both eyes increased. However, the nose location detection error increased. This is due to the altering of the input image, and in that process some information about the precise nose location was lost. The use of the Kalman filter resulted in fewer no-matches and more outlier matches, in both the case of motion detection and no motion detection. The increased number of outlier matches is due to the fact that the use of the Kalman filter prediction might reject a correct match if it is locked onto an incorrect match. The most important issue is that the face area was detected more accurately in the case when motion detection and image alteration were used, which leaves room for improvement of the feature finding algorithm.

4.3 Results Using Kalman Filter

Figure 4.2 shows results of the feature tracking on a sample movie file: the X and Y coordinates of the eye and nose points are plotted over time. We compare the results of the tracking without smoothing and with smoothing and prediction based on the Kalman filter. As can be seen, the number of peaks in the case when the smoothing is applied is significantly reduced compared to the case when the smoothing is not used.

The accuracy of the feature point detection is measured as the Euclidean distance in pixels between program-detected and hand-labeled coordinates. In our experiments, the camera had square pixels, so the X and Y scales are the same. Table 4.2 (a) shows the accuracy measured on three movies (two light-skin and one dark-skin subjects, with a total length of 835 frames). The program detection was typically 1-3 pixels away from the real locations. The location of the eyes was 1-3 pixels away, while the location of the nose was 2-3 pixels away from the hand-labeled location.

X and Y coordinates in time for sample movie file mv1, without smoothing (top) and with smoothing (bottom).
Figure 4.2: Comparison of the X and Y coordinate tracking without (top) and with (bottom) the use of Kalman filter for smoothing and prediction.
These results were obtained after the outlier matches were removed from the statistics. Table 4.2 (b) shows for each subject the number of correct eyes-nose matches, the number of outliers, and the number of frames where no eyes-nose match was found. The errors are similar in both cases. However, the numbers of outlier and missed matches are lower if smoothing is applied. This ensures smooth changes of the feature coordinates in time, which is important for gaze direction and menu selection, as will be discussed in Section 5.2 and Chapter 7.

The execution time of the program is such that real-time tracking rates are achieved. An image size of 320 x 240 allows the user to move freely in front of the camera and to sit far from the camera (and display). In the current set-up, if the subject sits about 50 cm away from the camera/display, the subject's face is about 64 x 64 pixels, which allows reasonable matching results. For the above image size, we can achieve a frame rate of 10-30 Hz on an SGI Indy 2 workstation.

              No smoothing              Apply smoothing
          Eye 1   Eye 2   Nose       Eye 1   Eye 2   Nose
    S1     1.44    1.20   3.14        1.60    1.38   3.54
    S2     1.30    3.34   3.21        1.18    3.11   3.22
    S3     1.21    1.23   2.33        1.29    1.34   2.19
    (a) Euclidean distance in pixels between hand-labeled eye and nose locations and computed locations.

              No smoothing                        Apply smoothing
          Matches  Outliers  Not Found       Matches  Outliers  Not Found
    S1      240       3          1             242       0          2
    S2      332       0          7             334       2          3
    S3      252       0          0             251       0          1
    (b) Number of video frames processed: number of correct eyes-nose matches, number of outliers, and number of frames where no eyes-nose match was found.

Table 4.2: Accuracy of the tracked feature location

Table 4.3 shows the execution times for six sample runs of two subjects. Three different motion patterns were tested: no significant motion, smooth motion, and sudden movement (the subject moving extremely fast). All measurements were done for 1000 frames of video on an SGI Indy 2 workstation with an input image size of 320 x 240 pixels. The first row shows the total number of matches, and the second row shows the number of frames spent in the tracking state. The execution time of the face detection algorithm (row FD_NEWPOS) is 80-90 msec, while the time needed to locate the tracked features in the face (row FD_TRACK or FD_CONT) is 10-30 msec. The overhead induced by grabbing and displaying the image is not significant, and the overall frame rate that can be achieved for smooth motion is 20-30 Hz. In the case of sudden movements, the subject's face position changes in every frame, and the face detection algorithm dominates the execution times, so that the frame rate achieved is low. If we used smaller images, the frame rate would be higher; however, that would significantly constrain the subject's movements, if we are to track the eyes and nose accurately.

                      No Motion      Smooth Motion     Sudden Moves
                      R1     R2       R3     R4         R5     R6
    Match             999   1000      998    999        958    799
    FD_TRACK          992    998      941    924        623    431

    Average execution times in milliseconds
    FD_TRACK           14     12       29     21         54     96
    FD_NEWPOS          94    125       89     90         90     86
    FD_CONT            24     16       37     15         60     53
    Analysis           15     12       37     29        111    175
    Grab               12     15        5      8         11     10
    Display             6      5        7      5          6      5
    Total              34     33       49     42        129    191

    Frame rates in Hz
    Analysis           69     81       27     35          9      6
    Overall            30     30       20     24          8      5

Table 4.3: Execution times for six runs (two subjects). The program was run for 1000 frames of video, on an SGI Indy 2 workstation with an input image size of 320 x 240.
Similar execution times were achieved on a 200 MHz Pentium PC with a Matrox video board, running Windows NT and using the Vision SDK as the interface to the camera. In this case, the major bottleneck was the Vision SDK interface, while the processing itself took the same time as on the SGI workstation. Thus, the maximum frame rate achievable was 10 Hz.

The tracking program was tested on numerous occasions, including several open-house sessions. Light-, brown- and dark-skin subjects were all tracked with good results. In the case of dark-skin subjects, on some occasions extra lighting was used to increase the ambient lighting of the room. When measured formally using recorded movies, the subject's face and face features were successfully tracked in the video stream for 99% of the frames. We did, however, have some difficulty with the system performance in another room at another location during a demo session.

4.3.1 How far from the camera can the user be?

In the majority of the experiments, the users were sitting at about a 2 ft distance from the camera, which was located on top of the workstation monitor. In the early stages of the development, we used a low-quality generic SGI eye-camera for our experiments. In the later stages of the development, we used a high-quality pan-tilt-zoom camera. The tracking program worked well using both cameras.

The users were not required to be at a fixed sitting distance from the camera during experiments. They could move closer or further, and the facial features were tracked without problems. Our facial feature finding heuristics do not require any user-specific information. As long as the facial features are spaced well enough for the described geometry to work, the features can be located. In an experiment using a pan-tilt-zoom camera, we measured that the smallest face area in which facial features could be determined was 50 x 70 pixels (using a 320 x 240 pixel image). In such a case, the distance between the eyes translated into 20 pixels, and the eyes-nose distance was 10 pixels. Similar results were achieved using a generic SGI eye-camera. If the zoom feature of the camera is used, the maximum face distance from the camera depends on the zoom factor. In our experiment, at the maximum zoom factor, the user was 7 yards (6.5 meters) away from the camera and the facial features were tracked by our algorithm. All the above experiments were conducted in a controlled environment: a room with fluorescent ceiling lighting and no windows. No additional light source was used.

4.4 Analysis of Movement Data

What can the time-space statistics tell us about the subject's movement? Figure 4.3 shows the X and Y coordinates in time for two sample movie files. Figure 4.3(a) shows a sequence in which the subject was first still, then moved the head up and down three times, then moved the head left-right six times. The last graph shows the X and Y image coordinates, and we can clearly see the motion indicated by the plots of the coordinates in time. In Figure 4.3(b), we see a sequence in which the subject was moving the head left and right, and down, which indicates spiral moves. The X vs. Y plot shows the spiral moves for each tracked feature point. The plots in Figure 4.3 resemble the plots from Yarbus [97], and we can analyze them in a similar way. Additionally, the data could be used to monitor the user's movements, and certain movements could trigger certain actions, as will be discussed in Chapter 7.
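As a purely hypothetical illustration of how such time-space data could drive simple head gestures (the actual selection mechanisms are the subject of Chapter 7), the following sketch counts direction reversals in a single coordinate series; repeated reversals of sufficient amplitude in the X series correspond to the left-right head shakes visible in Figure 4.3(a).

    def count_reversals(coords, min_amplitude=5.0):
        # Count direction reversals in a 1-D coordinate series (e.g., an eye X
        # coordinate over time); min_amplitude filters out pixel-level jitter.
        reversals, direction, last_extreme = 0, 0, coords[0]
        for value in coords[1:]:
            delta = value - last_extreme
            if direction >= 0 and delta <= -min_amplitude:
                reversals += (direction == 1)
                direction, last_extreme = -1, value
            elif direction <= 0 and delta >= min_amplitude:
                reversals += (direction == -1)
                direction, last_extreme = 1, value
            elif (direction > 0 and value > last_extreme) or (direction < 0 and value < last_extreme):
                last_extreme = value
        return reversals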
(a) Movie file mv3; (b) Movie file mv6.
Figure 4.3: Sample time-space statistics for two movie files: X and Y coordinates of the feature points in time, and X vs. Y image coordinates

4.5 Summary

This Chapter presented an integration of the feature finding algorithm into a real-time tracking application. The system state diagram was defined, and the use of Kalman filtering was described. The Kalman filter estimate is used in conjunction with the feature finding algorithm to eliminate erroneous matches. In addition, the filter estimate is used as the output of the system when the features could not be found by the algorithm, which resulted in smoother tracking. The use of a motion detection algorithm was presented. Motion detection was used to overcome problems with the face-background segmentation and to facilitate dark face location. Accuracy results and execution time data were presented. Finally, plots of the tracked feature coordinates in time were presented and analyzed, and an analogy to the eye-movement pattern analysis in Cognitive Science was made.

Chapter 5

Gaze Direction Detection

This Chapter presents a method for detecting the user's gaze direction based on the coordinates of the eyes and nose in the image. A neural network used to map the user's facial feature locations to screen coordinates is described, as well as the method used for its training. The mapping of the motion of features in the image to the motion of the screen cursor is described. The joint use of the above two mappings is discussed. Methods to smooth out the gaze direction are presented. Finally, results of the gaze tracking are presented in a form suitable for the analysis of eye-movement patterns, e.g., in visual cognition studies, in attention measurement, or in cursor control by gaze.

5.1 Introduction and Terminology

Once the coordinates of three points on the face are known, we can determine the subject's gaze direction. Several methods for gaze direction determination are discussed in Section 2.4; they range from exact mathematical solutions to an approximation using Artificial Neural Networks. In this work, the main goal is to avoid the calibration of the system, and to avoid explicit knowledge of the user's dimensions or camera characteristics. This enables high transparency and portability of the system.

Currently, the program tracks only the user's head and the eyes and nose position within the face, unlike systems that track the user's eye movements. The head pose is determined by the position of the eye and nose points in 2D, and the user must turn the whole head for the gaze point to change in our system.
In eye-only-tracking systems, the user's head must be stationary and only the eyes move. Thus, in our work we are not able to track saccadic eye movements. The experimental setting is such that we do not have enough information about the eye area to determine the saccadic movements: they would translate into just a 1-2 pixel change of the eye pupil position in the image plane. Figure 4.3(a) illustrates this point: in the first 100 frames of the movie (about 6 seconds) the user's head was still. Due to the nature of human eye movements, the user's eyes must have moved in that time interval. However, the resulting changes in the eye coordinates are only one or two pixels. In the following 200 frames (about 12 seconds), the subject moved the eyes. These moves resulted in about a ±5 pixel change in the eye pupil coordinates. To determine saccadic eye movement we would need a high resolution camera that zooms in on one eye, and the lighting conditions would have to be carefully controlled. Alternatively, an infra-red light source could be used to track the precise eye movements, as discussed in Section 2.4. In both cases, the user's head must be stationary (or move just slightly), since such tracking systems are sensitive to head motion.

In the remainder of this document, we will use the term "gaze direction" in the following sense:

Gaze direction: The point in the display plane determined through a transformation of the 2D image plane coordinates of the eyes and tip of the nose.

The transformation we use does not necessarily result in the point that coincides with the point that the user foveates. However, in the case when the user is in front of the display and explores it, the gaze point determined by our algorithm is a good approximation of the point which the user foveates.

5.2 Gaze Direction Determination Using ANN

One intuitive solution to the gaze determination problem is to determine the left-right motion by measuring the distance between the eyes and nose in the horizontal direction. This is easy to achieve and provides accurate results. However, applying similar logic to up-down motion does not produce good results, since the eyes-nose relations are similar for a face looking up and for a face looking down. Another option is to fit the features' image position data for various gaze directions to a curve or to classify them in some way. However, using pure image coordinates would require a vast amount of training data covering all gaze directions from all image positions.

Figure 5.1: Relations between the eyes and nose used as the input to the neural network to determine the gaze direction. Dotted lines show the relations we use as the ANN input; dashed lines show the relations that had high weight in the resulting ANN.

Finally, after an analysis of the eyes and nose position within the face frame, it was concluded that some relations matter more than others, and that the gaze direction could be determined from the relative positions of the eyes and nose within a face. The solution chosen for the function approximation was an Artificial Neural Network. The inputs to the network are the eyes and nose coordinates and their relations, and the output is the normalized screen coordinates of where the subject is looking. Figure 5.1 depicts the eyes-nose relations used as the input to the network. All inputs to the ANN are normalized to values from 0.0 to 1.0.
The relations used (depicted in Figure 5.1) are:

- X and Y coordinates of the eyes and nose (total of 6 inputs),
- distance between the two eyes, and between each eye and the nose, in the X and Y coordinates (total of 6 inputs),
- distance of the eyes and nose to the X and Y face boundaries (total of 12 inputs).

Figure 5.2: Sample input data for the ANN trained to determine gaze direction. Dashed lines show the target grid points, rectangles represent face boundaries, and triangles represent the corresponding eyes and nose locations.

The ANN has a total of 24 inputs, one hidden layer, and 2 outputs (normalized X and Y screen coordinates). Different numbers of hidden units were tested, and the best results were achieved with 6 hidden units. Training samples were obtained by having users look at buttons in a 4 x 4 grid on the workstation monitor while their pictures were captured. For each captured image, the coordinates of the face boundaries and the found features were recorded along with the screen coordinates of where the subject was looking. Figure 5.2 shows the sample input triangles and face boundaries for the 4 x 4 grid. The neural network was then trained using the QuickProp program [27].

The results using only the neural network-based mapping were not satisfactory. The main reason was that no movement history was used, and thus, for small changes in the input, the mapped screen coordinate changes could be quite significant. Figure 5.3(a) shows the screen coordinates in time for a sample movie file. The subject was instructed to look from the upper left to the lower right corner in spiral moves. For each frame, the ANN was run to output the screen coordinates. As can be seen, the movements of the imaginary cursor are not smooth at all. The reason for this problem is that the ANN's output can change significantly for small changes in the input data.
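For concreteness, the following is a minimal sketch of how the 24 normalized inputs listed above could be assembled and passed to the trained 24-6-2 network. The struct and function names are hypothetical, and the network itself, which in the actual system was trained off-line with the QuickProp program, appears only as a stub.

#include <array>
#include <cmath>

struct Point { double x, y; };              // image coordinates, already normalized to [0, 1]
struct FaceFrame {
    Point leftEye, rightEye, nose;          // tracked feature points
    double left, right, top, bottom;        // face bounding box in the same normalized units
};

// 6 coordinates + 6 pairwise X/Y separations + 12 distances to the face boundary = 24 inputs.
std::array<double, 24> annInputs(const FaceFrame& f) {
    std::array<double, 24> in{};
    int k = 0;
    const Point pts[3] = { f.leftEye, f.rightEye, f.nose };
    for (const Point& p : pts) { in[k++] = p.x; in[k++] = p.y; }           // 6 coordinates
    const Point pairs[3][2] = { { f.leftEye, f.rightEye },
                                { f.leftEye, f.nose },
                                { f.rightEye, f.nose } };
    for (const auto& pr : pairs) {                                         // 6 separations
        in[k++] = std::fabs(pr[0].x - pr[1].x);
        in[k++] = std::fabs(pr[0].y - pr[1].y);
    }
    for (const Point& p : pts) {                                           // 12 boundary distances
        in[k++] = p.x - f.left;  in[k++] = f.right  - p.x;
        in[k++] = p.y - f.top;   in[k++] = f.bottom - p.y;
    }
    return in;
}

Point annGazeEstimate(const std::array<double, 24>& in) {
    // Stub: the real system applies the 24-6-2 weights learned with QuickProp here
    // and returns normalized screen coordinates in [0, 1].
    (void)in;
    return { 0.5, 0.5 };
}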
5.3 Gaze Direction Determination Using Movement Vector Scaling

To improve the performance of the gaze tracking, a combined approach that takes advantage of smooth subject motion is used. After an initial estimate of the screen coordinates using the neural network, we scale the subject's movements in the image into display cursor movements. We can achieve smooth movements of the cursor as long as the subject's head is moving smoothly. The logic is similar to that used for an ordinary mouse, where a display/control gain factor is used. In the long run, this approach enables us to adapt to individual users' needs, where each user can decide how much he or she wants to move the head in order to move the cursor. Also, an adaptive gain factor could be used, so that for faster motion a greater gain value is applied and for slower motion a smaller gain value is applied. This could enable fast coarse positioning and slower, smaller motion for finer positioning. These issues will be further discussed in Chapter 6.

Figure 5.3: Comparison of gaze tracking results using (a) the neural network based mapping only and (b) the scaling of movements in the picture to display movements.

In the experiments described below, the display/control gain factor was determined from the screen resolution and the head movement range in the X and Y coordinates. For a screen resolution of 946 x 910 pixels and a maximum expected head movement of 50 pixels in X and 30 pixels in Y, the gain factor was 18.88 in X and 30.33 in Y. The gain factor was constant during the experiment.

To calculate the gaze point on the screen, the following rules are used:

- if the current frame is the first frame with the face detected, the ANN is used to estimate the gaze point;
- if the gaze was estimated in the previous frame, the new gaze point is calculated as the previous gaze point plus the velocity vector calculated in the Kalman filter equations;
- if the gaze point coordinates are out of range (e.g., when the display/control gain is not well adjusted to the subject's movements), the out-of-range coordinate is set to the maximum or minimum coordinate;
- if the tracked features are temporarily lost, the gaze point is kept at its previous location.

The above rules ensure that the gaze point always stays within the display range. A problem might arise if the gain factor is not appropriate; that problem could be solved by individual calibration or by on-line adjustment.

Results of the gaze tracking for a 4 x 4 grid are depicted in Figure 5.3. The subject was instructed to look in spiral moves from the upper left corner to the lower left corner and to traverse all grid points. As can be seen in Figure 4.3(b), the X and Y coordinates of the tracked points are smooth in time. The graphs in Figure 5.3(a,b) show the X and Y display coordinates normalized to values between 0.0 and 1.0. The raw coordinates obtained by the mapping were normalized to the grid coordinates; only the grid coordinates were shown to the subject. Using the neural network-based mapping gives very unstable screen coordinates, as shown in Figure 5.3(a). On the other hand, if the movement scaling is used, smooth movements of the screen coordinates are achieved, as shown in Figure 5.3(b).

5.4 Smoothing of Gaze Point

The approach described in the previous Section works well when the tracking is highly accurate and the user moves smoothly. However, in many practical cases these conditions are not met. As a result, the detected gaze point is not very stable and looks more like the output in Figure 5.3(a). To overcome this problem, we use two levels of smoothing of the gaze point produced by movement scaling. First, we estimate the gaze point using the Kalman filter; the input data to the filter are the gaze coordinates produced by the movement scaling. Then, a weighted averaging of the estimated gaze point is done, using 90% of the previous value and 10% of the new value. In this way, we ensure that all changes in the gaze point are smooth.

Comparisons of the three different ways to determine the gaze point are shown in Figure 5.4. Comparing panels (a) and (b) of Figure 5.4 shows the movement scaling versus the Kalman filter estimate; clearly, the filtered output is much smoother. Comparing panels (b) and (c) shows the Kalman filter estimate versus the weighted averaging. In this case the differences are more subtle, and they are visible only when the user is still and staring at one point; in that case, small vibrations of the head are compensated by the weighted averaging.
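A minimal sketch of this per-frame update and two-level smoothing is given below; it is not the system's actual implementation. A fixed-gain constant-velocity (alpha-beta) filter stands in for the Kalman filter, and its gains are assumptions; only the clamping to the display range and the 90%/10% weighted averaging follow the rules and values quoted above.

#include <algorithm>

struct Gaze { double x = 0.5, y = 0.5; };

class GazeSmoother {
public:
    // gainX/gainY: display/control gain (image pixels -> screen pixels),
    // e.g. 18.88 and 30.33 for the 946 x 910 pixel display used in the experiments.
    GazeSmoother(double gainX, double gainY) : gx_(gainX), gy_(gainY) {}

    // On the first frame with the face detected, seed all stages with the ANN estimate.
    void reset(const Gaze& annEstimate) { raw_ = est_ = out_ = annEstimate; vx_ = vy_ = 0.0; }

    // featDx/featDy: frame-to-frame displacement of the tracked features (image pixels).
    // featuresFound: false when tracking is temporarily lost; the gaze point is then held.
    Gaze update(double featDx, double featDy, bool featuresFound) {
        if (featuresFound) {
            raw_.x = clamp01(raw_.x + gx_ * featDx / screenW_);
            raw_.y = clamp01(raw_.y + gy_ * featDy / screenH_);
        }
        // Level 1: constant-velocity filtering of the scaled gaze point.
        double px = est_.x + vx_, py = est_.y + vy_;      // predict
        double rx = raw_.x - px, ry = raw_.y - py;        // residual
        est_.x = px + alpha_ * rx;  vx_ += beta_ * rx;    // correct
        est_.y = py + alpha_ * ry;  vy_ += beta_ * ry;
        // Level 2: weighted averaging, 90% previous value and 10% new value.
        out_.x = 0.9 * out_.x + 0.1 * clamp01(est_.x);
        out_.y = 0.9 * out_.y + 0.1 * clamp01(est_.y);
        return out_;
    }

private:
    static double clamp01(double v) { return std::min(1.0, std::max(0.0, v)); }
    double gx_, gy_;
    double screenW_ = 946.0, screenH_ = 910.0;  // display resolution in the experiments
    double alpha_ = 0.5, beta_ = 0.1;           // assumed filter gains
    double vx_ = 0.0, vy_ = 0.0;
    Gaze raw_, est_, out_;
};

With this scaling, the maximum expected 50-pixel horizontal head movement and the gain of 18.88 map onto essentially the full 946-pixel display width.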
The results and applications of this method will be discussed in more detail in Chapter 7.

Figure 5.4: Comparison of the gaze point determined by (a) movement scaling, (b) Kalman filter estimation, and (c) weighted averaging.

5.5 Results

The algorithms described in the previous Sections can be applied in many ways. In this Section, it is described how the user's gaze can be monitored and used for applications such as visual perception studies, and how the user can control the cursor by moving his/her head.

5.5.1 Fixations Determination

As discussed in Section 1.2.2, determining the precise gaze point is of great interest in visual perception studies. However, the measurement equipment is intrusive and users do not observe the displays in a natural setting. In our work, we applied our gaze direction measurement method to measuring fixation points while users observed an image.

Figure 5.5: Fixations measurement experiment setting and viewing angles.

The setting of the experiment consisted of a workstation monitor (19" diagonal); users sat comfortably in a chair at about an arm's length from the monitor (about 2 ft away) (see Figure 5.5). The image was displayed at 1200 x 800 pixel resolution and covered the whole screen. The span in the horizontal direction was about 40 degrees of viewing angle, and in the vertical direction it was about 26 degrees. These angles are much more than 15 degrees, which is the limit that can be covered by eye movement alone, so the users had to move their head a bit in order to view the whole displayed image. Thus, by measuring the gaze as proposed above, we could reconstruct the user's viewing patterns.

The output of the gaze direction detection algorithm is a sequence of screen coordinates with timestamps. The measurements are not output at constant intervals; the time interval depends on the image analysis time in each frame and ranges from 20 msec to 800 msec (when a log file is written). In addition, since the user is not able to keep the head perfectly still, the gaze point is not perfectly still when the user's head appears to be still, but might oscillate slightly. To be able to calculate the fixation points, we apply the algorithm in Figure 5.6, which has another level of smoothing in it. Simply, the results for each frame are accumulated for at least 100 msec. The averaged gaze location is computed, and the Euclidean distance is calculated from the gaze location of the previous time interval. The computed distance determines whether the subject's head is moving or still. A fixation is defined as the period of time during which the subject does not move significantly.
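The following is one plausible reading of the accumulate-and-compare scheme of Figure 5.6, using the constants listed there (MIN_FIX_TIME = 100 msec, FIX_THRESH = 0.01); the exact bookkeeping of the flowchart is simplified, and the re-scaling step described next is omitted.

#include <cmath>
#include <cstdio>

struct FixationDetector {
    static constexpr double MIN_FIX_TIME = 0.100;  // seconds (100 msec)
    static constexpr double FIX_THRESH   = 0.01;   // normalized screen units

    // Call once per processed frame with the normalized gaze point Gin(x, y)
    // and that frame's processing time Tin; prints a fixation when it ends.
    void addSample(double gx, double gy, double tin) {
        sumX += gx; sumY += gy; sumT += tin; ++n;
        if (sumT < MIN_FIX_TIME) return;               // keep accumulating
        double avgX = sumX / n, avgY = sumY / n;
        double dist = std::hypot(avgX - prevX, avgY - prevY);
        if (dist <= FIX_THRESH) {
            fixDur += sumT;                            // still: the fixation grows
        } else {
            if (fixDur > 0.0)                          // movement started: fixation ended
                std::printf("fixation at (%.2f, %.2f), %.0f msec\n",
                            prevX, prevY, fixDur * 1000.0);
            fixDur = 0.0;
        }
        prevX = avgX; prevY = avgY;                    // current interval becomes "previous"
        sumX = sumY = sumT = 0.0; n = 0;
    }

    double sumX = 0, sumY = 0, sumT = 0; int n = 0;
    double prevX = 0.5, prevY = 0.5, fixDur = 0;
};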
In addition, the minimum and maximum values of the screen coordinates are found and all the values are scaled to the range 0.0 to 1.0. This is needed because no calibration is performed and the user might move the head over just a fraction of the full range, so it is important to re-scale the gaze point.

The results of the gaze path and fixations for a sample image are shown in Figure 5.7. The subject was shown the image for 30 seconds and was told to examine the image and remember as many details as possible. A total of 23 fixations were recorded, with durations ranging from 123 to 1690 milliseconds. These results are not equivalent to the results of the Purkinje eye-tracker, where on the same image the number of saccades was about 40 and subjects viewed the image for 15 seconds. In our setting, we are not recording the actual saccades, but rather the head movements during the exploration of an image. Within one fixation, there is probably a number of saccadic eye movements that might be recorded with a camera that zooms in on the subject's eye.

Figure 5.6: Fixation determination algorithm (flowchart). Input: the per-frame gaze coordinate Gin(x, y), normalized to the 0-1 range, and the processing time Tin; constants: MIN_FIX_TIME = 100 msec, FIX_THRESH = 0.01.

Figure 5.7: Gaze path and fixations for a sample image (photo from "50 Favorite Rooms by Frank Lloyd Wright" by Diane Madden). The white line indicates the gaze path. Gray dots are fixation locations; the dot size indicates fixation duration. The black line connects consecutive fixation locations.

The complete results of the fixation determination experiment are presented in Appendix A. The experiment was conducted in conjunction with the evaluation of the head-eye input device described in Chapter 7. A total of 18 subjects viewed three different images for 30 seconds each, and the gaze path and fixation locations were determined automatically using the algorithm in Figure 5.6.

5.5.2 Attention Measurement

Another application of automatic gaze determination is measuring the user's attention (for example, to a television or a computer). Currently used methods rely on the manual coding of video tapes that record the subject (e.g., while watching TV). These methods are tedious for the coders, and even with a highly automated system, the human coder gets tired quickly and is prone to erroneous coding. The gaze determination system we developed could be used in such an application and would be completely automatic. The only human intervention is to set up the recording at the beginning of the experiment, and the measurement can then be done for an indefinite time.

Figure 5.8: Attention measurement experiment setting and viewing angles.

We conducted an experiment to verify the usability of our system for such automatic measurement. This was a pilot project intended to give us insight into how well the automatic measurement would work. The experiment was conducted with Jay Newell, a PhD student in the Telecommunication Department at Michigan State University [56].
The manual coding was done in the MIND lab at the Telecommunication Department at MSU by Jerry Roll, an intern in the lab, and Jay Newell.

The subjects were asked to watch a video tape, about 13 minutes long, of a newscast and advertisements. Throughout the video a WWW address would be shown, either in the newscast part or as part of an advertisement. The subjects were instructed to enter the WWW address into the Netscape program whenever they noticed one on the TV. The Netscape program was already running on the computer. The first four minutes of the video tape contained mostly the newscast, and the nine remaining minutes contained both advertisements and the newscast. The purpose of the task was to have the users look back and forth between the computer and the TV monitor. A total of 10 subjects participated in the experiment, two of them being the experimenters themselves. A black background was used to avoid any erroneous detection of faces in the pink carpeting that was in the room. Two subjects, who had glasses, wore three blue squares (above the eyebrows and on the tip of the nose), and the blue squares were tracked. That was necessary due to the current problems with tracking when a person wears glasses. We used the experiment set-up depicted in Figure 5.8. Subjects were seated about 24" away from a computer monitor and about 36" away from a TV monitor. The viewing angle between the TV and the computer was 60°.

The algorithm used to analyze the data is similar to the one used for fixation determination (Figure 5.6) and is given in Figure 5.9.

Figure 5.9: Attention measurement calculation algorithm (flowchart). Input: the per-frame gaze coordinate Gin(x, y), normalized to the 0-1 range, and the processing time Tin; constants: MIN_FIX_TIME, FIX_THRESH, CHANGE_THR.

What is basically done is to check whether the subject moved left or right with respect to the last extremum point. The extremum point is calculated dynamically, in the horizontal direction only, and depends on the
current viewing location: for TV attention, it is the minimal point in that viewing interval, and for computer attention, it is the maximal point in that viewing interval.

Figure 5.10: Gaze coordinate (or left-eye X image coordinate) and viewing positions (TV vs. computer) over time for three subjects (1000, 1001, and 1004). In each viewing-status plot, the automatically determined coding is shown above the manual coding (low = TV, high = computer).

Figure 5.10 depicts the X gaze coordinate on the screen (or the left-eye X image coordinate), and the viewing position determined automatically (upper line) and through manual coding (lower line), for the duration of the experiment. We can see that there is some agreement between the two codings and that on some occasions the two codings do not agree. The first two plots are for a subject who had three blue cosmetic dots that were tracked; that subject wears glasses, and since the facial features could not be tracked very accurately with glasses on, the blue dots were tracked instead. The later four plots are for subjects whose facial features were tracked.

The constant value in the algorithm is the threshold used to determine whether the subject moved enough to signal that the viewing location has changed. The threshold was determined empirically; for some subjects it worked well, while for others it did not. In addition, we observed that for a number of subjects with almost no agreement in the viewing location, the input signal was the source of the problem. We therefore tested two input signals: the gaze point on the screen, and the raw left-eye image coordinate. There can be a difference between the two signals, since the eye coordinate is used to determine the gaze coordinate, and in the transformation process some information might be lost.
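A minimal sketch of the viewing-change test at the core of Figure 5.9 is given below; it takes the horizontal component of whichever input signal is chosen (screen gaze point or raw left-eye coordinate). The decision step is a simplified reading of the flowchart, and the assumption that "larger X means the computer" depends on the camera geometry of the set-up.

#include <algorithm>

enum class View { TV, Computer };

struct AttentionCoder {
    explicit AttentionCoder(double changeThreshold) : thr(changeThreshold) {}

    // x: horizontal coordinate of the current fixation, normalized to 0..1.
    View update(double x) {
        // The extremum is kept in the horizontal direction only and depends on
        // the current viewing location: the leftmost point while watching TV,
        // the rightmost point while watching the computer.
        if (view == View::TV) extremX = std::min(extremX, x);
        else                  extremX = std::max(extremX, x);

        if (x - extremX >= thr)       view = View::Computer;  // moved right by the threshold
        else if (x - extremX <= -thr) view = View::TV;        // moved left by the threshold
        // otherwise the previous viewing location is kept
        return view;
    }

    double thr;              // CHANGE_THR, chosen as described in the text
    double extremX = 0.5;
    View view = View::TV;
};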
To determine the magnitude of the changes of the signal (the X coordinate over time only), histograms of the absolute values of the change magnitudes were plotted for both input signals. All the data points were scaled to the 0.0-1.0 interval, and the bin size was 0.02, with a total of 50 bins. The histograms should have a Gaussian shape with a peak at zero; if the change magnitudes vary, the Gaussian curve will be wider. Figure 5.11 depicts two sets of histograms for the gaze and eye coordinates. The common patterns in the histograms were that both histograms were narrow (Figure 5.11(a)), one histogram was narrow and the other was wide, or both were wide (Figure 5.11(b)).

Figure 5.11: Histograms of the gaze and eye coordinate change magnitudes for two subjects: (a) narrow histograms, (b) wide histograms.

A narrow histogram generally gives a better estimate of the viewing change threshold. Thus, we select the input signal type based on the histograms: the one that has 70% or more of the data points in the first five bins is selected over the one that has less than 70% in the first five bins. If both histograms are narrow, the one that reaches 95% of the data in a smaller-numbered bin is selected. If both histograms are wide, the narrower one is selected. Once the input type is selected, the viewing change threshold is selected based on the percentage of data points in the first five bins (N5): the bin up to which N5 + (100 - N5)/2 percent of the data lies is selected as the threshold.

Table 5.1: Percentage of agreement in viewing positions and Kappa statistics for all subjects.

                 ------ Automatic selection ------        ------ Manual best-threshold sel. ------
Subj.   Input  thr.  % agree  Kappa(0s)  Kappa(+/-1s)     Input  thr.  % agree  Kappa(0s)  Kappa(+/-1s)
1000    EYE    0.14   57.58     0.18        0.31          EYE    0.08   68.26     0.37        0.58
1001    SCR    0.28   57.80     0.15        0.26          SCR    0.30   60.28     0.20        0.30
1002    EYE    0.14   59.18     0.21        0.51          EYE    0.14   59.18     0.21        0.51
1003    EYE    0.14   62.08     0.26        0.44          EYE    0.32   76.96     0.52        0.55
1004    EYE    0.14   66.18     0.30        0.58          EYE    0.28   95.57     0.91        0.94
1005    SCR    0.24   66.36     0.11        0.14          SCR    0.28   69.67     0.12        0.16
1006    SCR    0.20   63.19     0.26        0.31          SCR    0.34   63.32     0.27        0.30
1007    EYE    0.14   64.80     0.30        0.37          EYE    0.22   74.56     0.49        0.52
1008    SCR    0.32   68.06     0.28        0.40          SCR    0.12   70.57     0.32        0.55
1009    EYE    0.14   47.31    -0.08        0.15          EYE    0.08   53.55     0.03        0.33
Avg.                  61.25     0.20        0.35                        69.19     0.34        0.47

We assumed that the manual coding is correct. We compared the percentage of agreement of the two codings, at every millisecond, and the corresponding Kappa statistic [76, 17], both with no tolerance and with a ±1 second tolerance. The Kappa statistic is commonly used to compare different coding schemes. Table 5.1 (left five columns) shows the input type and the threshold selected for each subject, the percentage of agreement, and both Kappa statistics. For most subjects, the automatic coding was correct in at least 60% of the measurement interval. Kappa statistics with no tolerance were on average 0.2, and with the one-second tolerance they were on average 0.35. For one subject, 1009, the automatic coding did not produce a satisfactory result. The right columns in Table 5.1 give the best manually selected results for each subject.
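A sketch of the histogram-based choice of the viewing-change threshold described above is shown below, using the 50 bins of width 0.02 and the N5 rule; the helper names are hypothetical.

#include <algorithm>
#include <array>
#include <cmath>
#include <vector>

constexpr int    kBins    = 50;
constexpr double kBinSize = 0.02;

// Histogram (in percent) of the frame-to-frame change magnitudes of one input signal.
std::array<double, kBins> changeHistogram(const std::vector<double>& x) {
    std::array<double, kBins> h{};
    if (x.size() < 2) return h;
    for (std::size_t i = 1; i < x.size(); ++i) {
        int b = std::min(kBins - 1,
                         static_cast<int>(std::fabs(x[i] - x[i - 1]) / kBinSize));
        h[b] += 1.0;
    }
    for (double& v : h) v *= 100.0 / static_cast<double>(x.size() - 1);
    return h;
}

// N5: percentage of the change magnitudes falling in the first five bins.
double firstFiveBins(const std::array<double, kBins>& h) {
    return h[0] + h[1] + h[2] + h[3] + h[4];
}

// Threshold: upper edge of the bin up to which N5 + (100 - N5)/2 percent of the data lies.
double viewingChangeThreshold(const std::array<double, kBins>& h) {
    double n5 = firstFiveBins(h);
    double target = n5 + (100.0 - n5) / 2.0;
    double cum = 0.0;
    for (int b = 0; b < kBins; ++b) {
        cum += h[b];
        if (cum >= target) return (b + 1) * kBinSize;
    }
    return kBins * kBinSize;
}

A histogram with N5 of 70% or more would be treated as "narrow" when choosing between the two candidate input signals.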
For these manually selected results, the thresholds were chosen for each subject independently by the author. For all but one subject, both the percentages of agreement and the Kappa scores are higher than in the case of the automatic selection. In the case of subject 1004, the agreement with the human coder was almost perfect (the last two plots in Figure 5.10).

Table 5.2: Number of observations and their mean and standard deviation (in seconds) for manual coding, fully automatic coding, and automatic coding with manually selected thresholds.

              -- Human coder --        -- Fully automatic --     -- Manual threshold sel. --
              TV          Comp.         TV          Comp.          TV          Comp.
Subj.   #  mean std   #  mean std   #  mean std   #  mean std   #  mean std   #  mean std
1000   42    8   2   43   10   2   22   14   5   22    9   2   40    9   2   40    7   1
1001   20   22   2   21   17   4   25   17   5   25   14   4   25   17   5   25   14   4
1002   42   10   1   42    8   2   76    4   1   76    6   1   76    4   1   76    6   1
1003   45    7   3   46   10   2   45    7   2   45    4   1    6   39  25    6   20  14
1004   15   22   3   16   26   5   62    4   1   63    7   2    6   52  40    7   16  10
1005    6  117   -    7    6   2    9   53  30    9   26  13    8   58  33    9   27  13
1006   43    9   1   44    8   2   11   23   9   11   15   7    8   31  23    8   12   5
1007   41    9   2   42    8   2   16   18   7   17   12   5    5   50  39    5    5   2
1008   28   17   2   29    9   3   22   21   8   22    8   5   34   15   5   35    5   1
1009   45   10   2   46    8   2   44    9   2   44    4   1   62    7   1   62    3   1

Table 5.2 shows the number of observations of changes in the viewing locations and their mean durations for the human coder, the fully automatic selection, and the automatic selection with manual threshold setting. Both the number of observations and the mean duration of the attention intervals in the case of automatic coding differ significantly from the manual coding results in many cases. This means that the automatic coding either generates too many or too few viewing location changes and that it misses some quick and short glances to either the TV or the computer monitor.

The above results show that the method could be adapted and improved. Firstly, the tracking of the facial features must be very accurate. In our case, for some subjects the tracking failed to detect the correct features and detected erroneous features instead. As a consequence, the automatic detection would detect more changes in the viewing location. In the case of subjects 1000 and 1005, the tracking worked reasonably well, since blue dots were tracked instead of the facial features. For subject 1000, the achieved percentage of agreement and Kappa scores were high, and the number of observations and their mean durations were comparable. Subject 1005 was mainly looking at the TV monitor and made just a few glances at the computer; thus, in his case, it was not easy to determine the appropriate thresholds through the histograms. The only error of the automatic coding was that it detected that the subject looked at the computer toward the end of the recording and never switched back toward the TV, which resulted in an error.

Secondly, the position of the camera should be carefully selected. One solution would be to position the camera so that the subjects have to turn their head completely to view the TV, in which case the face would not be visible at all. An easy attention determination would then be to simply check whether the face is visible or not. The problem with that set-up is that, in some cases, people make just a glance at the TV and the face is still visible. In addition, such a situation does not resemble the real-world set-up, where people often have the computer and TV arranged as in our experiment, and quick glances small in magnitude are frequent.

5.5.3 Cursor Movement

In the previous Section, a passive observation system was described.
The user has no feedback about where the system thinks he or she is looking. If the user is given that feedback and sees the actual gaze point on the screen, he or she can respond and interact with the system. The user can willingly move the head to produce cursor motion. As the system gives feedback on the cursor position, the user can learn to move in the right way so that the cursor can be moved along the desired path.

No calibration is needed at the beginning of use; rather, the system calibrates itself while it is being used. The Kalman filter estimation picks up the motion trend as the user moves. If the tracking is temporarily lost, or if the user tilts too much, the re-calibration procedure is rather simple: the user should just look in the left-right, top-down (or vice versa) directions, and the filter parameters will get re-adjusted and the user will be able to control the cursor again. In practice, novice users have some difficulty using the system in the first few minutes, until they figure out what needs to be done. Once they familiarize themselves with the system, they are able to move the cursor rather well. They are also able to learn when to perform the re-calibration procedure and resume cursor control. That means that calibration is not the key issue in the system. The key issue is the concept of the system: the user learns how to move the head in order to control the cursor.

Figure 5.12 displays sample results of the cursor control. Figure 5.12(a) shows a spiral path along an 8 x 8 grid: the subject was instructed to look from the upper left to the upper right corner in a spiral path and to attempt to traverse all the grid points. The solid line shows the cursor path and the dashed line shows the selected buttons (equivalent to the solid line normalized to the grid points). The path was rather straight, and there were a few out-of-path moves that were corrected. Figure 5.12(b) shows a cursor path during a net-surfing session. The snapshot of the screen shows the Netscape window, and the black line shows the cursor path. The subject was selecting the "Back", "Forward", and "Stop" buttons and the search engines just right of the "Snapshot" icon. Results of an evaluation of the above method will be presented in Chapter 7.

Figure 5.12: Sample results of cursor control using gaze tracking: (a) interactive data on an 8 x 8 grid, with the subject looking in spiral moves (the solid line shows the cursor path, the dashed line the selected grid buttons); (b) net surfing using the head-eye control (the black line shows the cursor path). [Image is presented in color.]

5.6 Summary

In this Chapter, a method for gaze direction detection was presented. The method is based on the use of an Artificial Neural Network to determine the initial gaze direction and on a display vs. image coordinates gain factor. In this way, movements of the tracked feature coordinates are scaled into display cursor movements. A method for smoothing the gaze path based on Kalman filter estimation and weighted averaging was presented. The described algorithms were applied to fixation measurement, and results were presented.
Results were presented of a pilot study that compared the automatic attention measurement system based on gaze tracking with manual coding. It was described how the gaze direction can be used to control the cursor motion, and preliminary results for an 8 x 8 grid and a free-form button arrangement were presented. It was also shown that the cursor can be moved along a predefined path using the head-eye input interface, e.g., in spiral moves along all grid points, or to a specific point on the screen.

Chapter 6

HCI Based on the Head-Eye Input

When we interact with a computer, we need to watch the display. In most state-of-the-art computers, pointing on the screen with a mouse is a must for most tasks. While we are moving the mouse with our hand, we need to observe the results on the screen with our eyes. We have to learn how to move the mouse, and then we need to learn to coordinate the mouse movements with our eye movements. This hand-eye coordination, however, is a barrier for some people, and some people just cannot learn how to move the mouse well. If we were able to capture the gaze and enable the user to effortlessly point with the head and eyes, we would be able to overcome the hand-eye coordination problem and provide a more natural way to control a computer.

Another important issue is the level of intrusiveness of the gaze tracking equipment. Many systems are available that are very intrusive and do not allow natural motion, as discussed in Chapter 2. Other systems are not intrusive but require specialized and/or expensive hardware, or require some make-up that will be tracked. The system described in Chapter 5 offers a relatively inexpensive and easy-to-use alternative to other eye-tracking systems.

This Chapter describes how a human-computer interface for the head-eye input has been developed. Issues such as selection mechanisms, appropriate button sizes, and the level of feedback are discussed, and suggestions are made on what is applicable. The discussion is based on observations made while various users used the proposed head-eye input interface for typical computer interaction tasks.

6.1 Selection Mechanism

With hardware pointing devices, selection is typically made by depressing some button or key that is attached to the device. In a handless interface, such a selection mechanism would not be appropriate, since it would involve the use of hands. Many eye-tracking systems opted to use a "dwell button" selection mechanism: a button gets selected if the user fixates it for some predefined time interval. If the duration of the interval is small, problems like the "Midas Touch" arise, as discussed in Section 2.6.7. If the duration of the interval is long, the user easily gets tired while using the system.

One natural solution is to use the face as a selection mechanism: either a facial expression or a certain head movement could be used as a signal to select a button. Using predefined head movements is similar to the "on-screen selection button" concept discussed in [86] and Section 2.6.5. In [86], Ware and Mikaelian showed that an on-screen selection button had the worst performance. To make a selection, the user would need to fixate the desired target button for a predefined period and then fixate the selection button for another period of time, and only then would the desired target button be selected. In the eye-tracking system they tested, the head was still during the experiment.
6.1.1 Selection by Head Motion

If we were to apply such a logic to our proposed head-eye input, the user would need to double the amount of head motion for each selection, which might be uncomfortable. Some other head motion (e.g., moving the head up-down for yes, or left-right for no) can be an alternative selection mechanism. This kind of selection would be rather long and very confusing, and could lead to the Midas Touch problem. What if the user wants to just look somewhere up or down, and not make a selection? How would the system distinguish that from a real selection? One option is to accept a large number of false alarms, which would lead to the user's dissatisfaction with the system. Another option is to have a custom-designed application that would incorporate such moves in the user interface.

We designed an experiment in which the users would fixate one screen button, then get some feedback from the system on what the system thinks they fixated. They would then reply with a yes or no signal, which was nodding or shaking of the head. The user had no feedback on the program-calculated cursor position. Figure 6.1 illustrates the state diagram for the yes/no signals: NO, SO, WE, EA stand for moving North, South, West, and East, respectively. The motion is checked at every processed frame.

Figure 6.1: Selection by head motion: state diagram for "yes" and "no" motion detection (NO - moving North, SO - moving South, WE - moving West, EA - moving East).

As an underlying GUI for the experiment, we used a card game: the user would fixate one of 15 cards (arranged in 3 rows and 5 columns), and when the program detects that the user fixated a card, the program changes the card that it thinks the user fixated. If the change was correct, the user would reply with a "yes" motion, and if the change was incorrect, the user would reply with a "no" motion. If the user does not notice the change in a card, the program starts flickering the changed card after a timeout period. The order in which the user viewed the cards is depicted in Figure 6.2. The paths to the selected locations (dashed lines) and the yes or no response paths (solid lines) are depicted in Figure 6.3. In some cases the user had to move a lot until the system recognized the motion, and in some cases the response was not detected at all. Out of 15 selections, the user's response was correctly recognized 10 times, the "no" response was not recognized in 3 cases, and in 2 cases the user did not notice any change and did not attempt any response.

Figure 6.2: Predefined order of viewing the cards.

Figure 6.3: Paths to selected locations and YES/NO response paths (sample cases: YES motion detected, NO motion not detected, no motion detected, and no card change detected).
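The sketch below illustrates a nod/shake detector in the spirit of the state diagram of Figure 6.1: per-frame head motion is quantized into North/South/West/East moves, and a "yes" (nod) or "no" (shake) is reported after enough vertical or horizontal direction reversals. The exact states and thresholds of the original diagram are not recoverable here, so the reversal counts and the minimum-move threshold are assumptions.

#include <cmath>

enum class Move { None, North, South, West, East };
enum class Answer { None, Yes, No };

struct NodShakeDetector {
    // dx, dy: frame-to-frame displacement of a tracked face point, in image pixels
    // (y grows downward, so a negative dy is a move toward the top of the image).
    Answer update(double dx, double dy) {
        Move m = Move::None;
        if (std::fabs(dy) > std::fabs(dx) && std::fabs(dy) > minMove)
            m = (dy < 0) ? Move::North : Move::South;
        else if (std::fabs(dx) > minMove)
            m = (dx < 0) ? Move::West : Move::East;

        if (m == Move::None) return Answer::None;
        if ((m == Move::North && last == Move::South) ||
            (m == Move::South && last == Move::North)) ++vertReversals;
        if ((m == Move::West && last == Move::East) ||
            (m == Move::East && last == Move::West)) ++horizReversals;
        last = m;

        if (vertReversals >= 2) { reset(); return Answer::Yes; }   // nod
        if (horizReversals >= 2) { reset(); return Answer::No; }   // shake
        return Answer::None;
    }

    void reset() { vertReversals = horizReversals = 0; last = Move::None; }

    double minMove = 3.0;   // pixels; assumed
    int vertReversals = 0, horizReversals = 0;
    Move last = Move::None;
};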
However, this type of selection would not be suitable in a real-world application due to long selection times. 6.1.2 Selection by Facial Expression As discussed in the Section 6.1.1, head motion would not be suitable for selection. The remaining option is making a facial expression as a selection mechanism. This should be easy to do for the user; after fixating a desired button, the user would make the required facial expression and thus make a selection. The expression to be used must be easy to do and should not involve unnatural movements. In this work, an “open mouth” expression is used as the selection signal. The reason we use a mouth expression instead of, say, eye blinking, is that mouth move- ments are done voluntarily. Eye blinking is not always done consciously: humans blink frequently to moisten their eyes. Thus, that is not the best way to control the computer. To determine whether the mouth is opened or not, we look for a big, dark ellip- soidal blob just below the nose. The best thresholded image from Figure 3.11 is used to find the mouth. Figure 6.4 shows examples of blobs for several face expressions: neutral, smiling (with mouth closed), open smiling (with mouth opened), and open mouth. For these four basic expressions, it can be clearly seen that the open mouth €Xpression is distinct from the other three because of the dark blob. 121 Neutral Smiling Open SmileOpen Mouth Original images 4 \“l .J ‘9' I; I" . - " i i‘ ' i "‘ ' Thresholded images (Pictures are part of the author’s private database.) Figure 6.4: Sample mouth expression Neutral Smiling Open SmileOpen Mouth t Original images Thresholded images (Pictures are part of the author’s private database.) Figure 6.5: Mouth expressions for a dark-skin person 122 still. mouth opened still. dwell > limit. mouth opened . , moving. mouth opened ‘ ACI'ION_PRESS mouth closed mouth closed Figure 6.6: Mouth states transition diagram The selection is done only when the subject is still for a predefined time. “Being still” is defined by the difference between the previous and current position of the gaze point being not more than 1% of the total screen width and height. Figure 6.6 depicts the transition diagram for mouth states. When a still state is discovered, it is checked for the state of the mouth. If the mouth is open, the transition to MOUTHJCTIONIRESS is made. If the subject starts moving with the open mouth, we define that state as MOUTHJCTIONDRAG. Finally, when the mouth is closed, we return to the initial state MOUTH.ACTION-NONE. An alternative way to check for the open mouth state would be to learn the intensities of the mouth blob for each user. For example, in the case of dark-skin persons, the mouth blob was lighter than the skin area when they open the mouth in our experiment. Thus, in that particular case, we need to look for an above-threshold area. Figure 6.5 illustrates four facial expressions in the case of a dark-skin person. 6.1.3 Other Ways to Make a Selection Sections 6.1.1 and 6.1.2 discussed two ways of making a selection that would involve the interaction using the head only. However, for some users that might not be 123 possible or practical. Alternative selection mechanisms are hardware button, voice commands, or tongue-keyboard [5]. The users would then point the cursor with their head, then depress some hardware button (e.g., a key on the keyboard), or say a 3 command (e.g., “yes’, “no”, “go”,...). 6.2 GUI Design Issues Figure 6.7 shows the GUI of the button selection program. 
In the upper left corner is the subject's image as recorded by the camera. The tracked feature points are highlighted with red crosses. Right below it we show a thresholded image of the mouth region that is used to recognize the mouth state. On the right is a grid where the program indicates the gaze direction and the selected button: a dark point indicates the screen coordinates where the head-eye input interface is pointing, and when a selection is made, its color is changed to blue.

Figure 6.7: Snapshot of the face tracking demo GUI. The red dot shows where on the screen the user is looking. [Image is presented in color.]

If users are to use this kind of interface, what is the right way to display the information, and what level of feedback should be given to the user? When someone is controlling an application (e.g., Netscape), they do not need to know whether the program finds their face or not. However, in the learning phase or during troubleshooting, they need to have the feedback.

6.2.1 Level of Feedback

In the setting of Figure 6.7, the user has full feedback on whether the tracked features are found and what the mouth state is, as well as where the gaze point is. If there is some error in tracking, e.g., the user tilts left or right and the tracked features are not found any more, the user can notice that on the display and initiate the re-calibration procedure (Section 5.5.3) or move into a more appropriate position. Also, the user can learn how to open the mouth so that the open mouth state is detected.

In our experiments we also designed a GUI that had no feedback, except that the selected grid point was highlighted. The GUI is similar to the one displayed in Figure 6.7, except that the camera images and other items on the left are not shown and the grid points are expanded over the whole screen. The users were able to use both GUIs without any notable difference. While the task (e.g., selecting buttons as guided by the program) was performed, the users did not pay attention to the feedback window at all. The only times they needed the feedback window was when the tracked features were lost (e.g., the user either tilted too much or moved in the chair so that the face was not fully visible). Then the feedback was essential for fast recovery. The presence of full feedback is also useful for novice users: while they are not used to the interface, having the full feedback helps them learn how to perform best (e.g., how to open the mouth, and how much to move).

6.2.2 Button Sizes

What is the appropriate button size that can be selected with the head-eye input interface? In terms of the updating accuracy of the cursor coordinates, we can update down to one pixel; the users just need to learn to move very slowly to achieve that kind of update. We experimented with several different button arrangements: an 8 x 8 grid, two 100 x 100 pixel icons, and Netscape browser and mail windows. In all experiments, the setting was the same as shown in Figure 5.5. The display diameter was 19", or 15.2" x 11.4", with a resolution of 1266 x 1010 pixels. In the case of the 8 x 8 grid, the grid button size was 2.075" x 1.425" (horizontal vs. vertical), or 5° x 3.25° of viewing angle. The size of a 100 x 100 pixel icon was 1.2" x 1.16", or 2.75° x 2.75° of viewing angle. In the latter case, we could arrange 12 x 10 = 120 icons on the display, which gives a lot of room for various application needs.
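The viewing angles quoted above follow from the physical button size and the viewing distance. The sketch below shows the conversion, assuming the roughly 24" (2 ft) viewing distance used in the experiments; with that assumption the computed angles come out close to, but not exactly, the quoted values.

#include <cmath>
#include <cstdio>

const double kPi = 3.14159265358979;

// Visual angle (degrees) subtended by an object of the given size at the given distance.
double inchesToDegrees(double sizeInches, double viewingDistInches) {
    return 2.0 * std::atan(sizeInches / (2.0 * viewingDistInches)) * 180.0 / kPi;
}

int main() {
    const double dist = 24.0;  // assumed viewing distance in inches
    std::printf("8 x 8 grid button (2.075 x 1.425 in): %.1f x %.1f deg\n",
                inchesToDegrees(2.075, dist), inchesToDegrees(1.425, dist));
    std::printf("100 x 100 pixel icon (1.2 x 1.16 in): %.1f x %.1f deg\n",
                inchesToDegrees(1.2, dist), inchesToDegrees(1.16, dist));
    return 0;
}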
These large-sized selection areas would be suitable for custom-designed applications, e.g., a text editor or text display where only a few buttons are needed and the remaining area just displays the text. However, most application programs have a large number of variable-sized buttons. Netscape is a typical example of such an application. Figure 6.8 shows a typical Netscape browser and mail window; the buttons and text entry fields that were used in the experiments are highlighted. A typical news WWW page is also included, and the sizes of its menu items and story titles are highlighted. Figure 6.9 gives the Netscape button sizes relative to the display size. Table E.1 in Appendix E gives the sizes of the buttons in pixels, inches, and degrees of viewing angle.

Figure 6.8: Netscape browser and mail windows with the selected buttons highlighted. [Image is presented in color.]

Figure 6.9: Netscape button sizes relative to the display size (display 35 x 27 degrees; browser window 27.5 x 24.3 degrees; back/send buttons 1.4 x 1.3 degrees; close-buttons panel 0.3 x 1.3 degrees; URL entry 16.3 x 0.7 degrees; URL arrow 0.4 x 0.7 degrees; scroll bar 0.4 x 2.7 degrees; GO button 0.8 x 0.8 degrees; ZIP code entry 2.2 x 0.8 degrees; story title 2.4 x 0.2 degrees; menu buttons 2.5 x 0.5 degrees).

As can be seen, the button sizes vary greatly. The navigation buttons in the Netscape window itself could be configured to be larger if needed. However, the buttons on a specific WWW site cannot be re-configured. Most state-of-the-art WWW sites have a menu of options, and each menu button is rather small. For a sample WWW site we used, the buttons are only 2.5° x 0.5° of viewing angle, arranged in an array with no spacing in between. Thus, if users are to successfully use the head-eye input interface, the accuracy of the system must be within a few degrees of viewing angle.

If we are to use this interface on a different display size, the pixel and inch values would differ based on the display size. In the case of a smaller display (e.g., a monitor of 14" diameter or less), the user would generally sit closer to the display. Thus, for small values of the viewing angle, the pixel count would decrease significantly compared to a medium size display (e.g., 19"). As a result, the selection areas would need to be larger in order to achieve reasonable accuracy. On the other hand, in the case of a large display (e.g., an Immersadesk of 4 x 5 ft, or 77" diameter), small viewing angles would correspond to a larger pixel count than on a medium size display. This would make it possible to achieve a higher selectable resolution on a large display.

6.3 Advantages and Disadvantages of the Head-Eye Input Interface

Each new input interface must have some advantages over the existing ones to be widely accepted and used. In the case of the mouse, it offered an easy pointing mechanism and was widely accepted.
Devices like the trackball are widely used in notebook-style computers because of their size. Touch-screens are a widely used and convenient device for many public information systems. On the other hand, eye-based input has been used mainly by handicapped users, often as their only way to control the computer. Due to constraints on head movement and no easy way to make selections, this interface has not been widely accepted. Head-based input has been used in some military applications and by handicapped users; usually, a device is mounted on the head and tracked. Similar to the eye-based input, this interface is not widely used. The head- and eye-based input interfaces, however, offer a more natural interaction with the computer.

Table 6.1 gives some features and requirements for the head-eye input interface, eye-tracking based input, the mouse, the trackball, and the touch-screen.

Table 6.1: Features and requirements for the head-eye input, eye-tracking based input, mouse, trackball, and touch-screen.

                                      Head-eye  Eye-tracking  Mouse  Trackball  Touch-screen
Needs hand-eye coordination              No          No        Yes      Yes          No
Needs and occupies hands                 No          No        Yes      Yes          Yes
Tolerates head movements                 Yes         No        Yes      Yes          Yes
Display resolution                     Medium      Medium     Fine     Fine        Coarse
For small displays (ATM machines)        Yes         Yes       Yes      Yes          Yes
For medium displays (workstations)       Yes         Yes       Yes      Yes          Yes
For large displays (Immersadesk)         Yes         No        No       No           No
Selection times                        3 sec       1 sec     1 sec    1 sec        1 sec

In all hand-based interfaces, the users must check with their eyes the results of the input device movements; thus, hand-eye coordination must be learned. For some users (e.g., elderly people and children) this is not easy to achieve. All the hand-based interfaces naturally occupy the user's hands. In tasks such as blind typing, if the user wishes to select another window, he/she must first move one hand off the keyboard, then position it on the mouse, trackball, or screen, make the selection, and finally put it back on the keyboard and resume typing. Even for experienced users, this task can cause a number of errors and slow-downs; errors such as missing the mouse position or wrongly re-positioning the hand on the keyboard are common.

In terms of head motion, no hand-based device constrains it. The head-eye input interface does not require the head to be still; on the contrary, the user must move the head in order to control the cursor. In the case of eye-based input, in most systems the user's head must be stationary, or only small head motion is allowed.

In terms of display resolution and size, head-eye and eye-tracking input can be used for fine resolution selections, but they require some skill from the user and are prone to errors. This is due to the nature of our head and eye movements: it is very hard for the user to keep the head perfectly still, and it is impossible to totally control eye movements. A touch-screen cannot be used for fine resolution selections. When used on small displays (e.g., ATM machines), a head-eye input interface would be suitable, since the number of options on such screens is not high and the buttons are typically large. On medium size displays (e.g., workstation monitors), all interfaces are suitable. However, in the case of a large display such as the Immersadesk, "conventional" input devices like the mouse, trackball, or touch-screen are not suitable.
Due to the screen size, the user would need to move the head significantly in order to locate the cursor, or would need to move the whole body and hands to reach the buttons on the display. Eye-based input, similarly, would not be suitable in such an environment: the user would need to stand far from the display in order to have the whole display within the 15° viewing range. In such an application, head-eye based input is a natural solution: the users move their head naturally to view various points on the display.

Finally, in terms of selection times, when measured on a conventional workstation monitor, the selection times for eye input, mouse, trackball, and touch-screen are about 1 second per selection. In the case of the head-eye input interface, the selection time is about 3 seconds per selection. This is significantly higher than for the other input devices; even the expert user (e.g., the author of the interface) was not able to make selections faster. We must note, however, that the 3 seconds include the time needed for the user to move to the target location, the dwell time before the mouth state is checked, and the time needed for the user to open the mouth. If the users were to make selections by, for example, depressing a hardware button, the selection time would decrease; in some preliminary experiments, the selection times were about 2 seconds.

6.4 Summary

In this Chapter, it was described how the gaze tracking system was integrated into a human-computer interface. Several selection mechanisms were discussed: response by nodding and shaking the head, and selection based on opening the mouth. GUI design issues such as the level of feedback and button sizes were discussed, and preliminary results were presented.

Chapter 7

Evaluation of the Head-Eye Input Interface

Eye-tracking systems have been used in many ways. Mainly, their use has been limited to handicapped users and to serving as a research tool in visual perception studies. Recently, the development of handless interfaces has been gaining momentum, and new head and eye tracking systems are being developed for a wide range of users.

In Spring 1998 the author conducted market research for an independent study course. The research was done with Kang Hun Lee, an MBA student. In the course of the study, two focus group interviews were conducted, and the participants suggested the following applications: "mouse/trackball/touchscreen replacement, aid/train/monitor handicapped persons, control head moves on an avatar in VR applications, as an advertisement aid, for choosing TV channels, as a teleconferencing aid, in focus of attention measurement, in a virtual museum, for monitoring high-risk working environments, as an identification system aid, for opening the door when one's hands are full, for typing using gaze direction, for screen savers, to count the number of people in a crowd, in hospitals to aid/monitor patients, for an automatic door, for security in banks to aid clerks during a robbery, as an aid for children to use computers, in car theft prevention, in ophthalmology to fit glasses, to zoom the screen, and for selecting from a menu in a fast-food restaurant". As our participants suggested, the applications are numerous, ranging from support systems for the disabled, through a replacement for a pointing device, to an aid in cognition and market research.
In order to assess the usability, performance and possibilities of the head-eye input interface, we conducted a comprehensive evaluation of the interface. In this final chapter the results are presented. In the evaluation we attempted to incorporate most of the tasks needed for typical applications. With these results we hope to shed more light on the requirements of a head-eye input interface, as well as on the pace of learning to use a novel interface.

7.1 The Goals of the Study

In order to verify the assumption that subjects improve performance with the head-eye input interface and to assess the range of tasks that can be performed, we designed an experiment with the following objectives:

- Introduce the novice user to the interface and let him/her learn the basics of its usage.
- Perform a set of common computer tasks and assess the performance levels. The tasks should be arranged in increasing level of complexity. The tasks that are performed first should serve more as learning tasks. The final task should incorporate the material learned in the previous tasks.
- Repeat the sessions after some time interval and compare the performance. The number of repeats is arbitrary and depends on the circumstances of the experiments. The more the users repeat, the more skilled they should be. In the repeated session, the subjects would perform exactly the same tasks under the same conditions. That way, the only variable from session to session is the subjects, and we can assess a subject's learning pace.

In our case we had to constrain the session time to one hour and we were able to have only one repeated session. These constraints were due to subjects' availability. Thus, the range of tasks had to be narrowed to fit everything in the prescribed time interval.

7.2 Subject Pool

Eighteen subjects gave informed consent for participating and were paid $10 for each session. Subjects of various complexions participated and all of them repeated the experiment. Table 7.1 details the variations in complexions of the faces. As can be seen, many variations in the human population were represented in the subject pool.

Race: Caucasian - 9, Asian - 6, Indian/African - 3
Hair color/style: black - 10, brown - 4, blonde - 2, bald - 2
Gender: male - 14, female - 4
Glasses: no - 12, yes - 6

Table 7.1: Subject pool: details of the variations in facial complexions

Most of the subjects were students or professors at Michigan State University, and two of the subjects were 10-year-old children. A number of subjects normally wore eyeglasses and they had to take them off during the first session. During the second session, 4 out of the 6 subjects with glasses and the 2 children had blue rectangular spots that were tracked instead of the facial features. The set-up of the experiment was the same for all the subjects: we used a black background to avoid any erroneous detection of faces in the pink carpeting we had in the room. For one dark-skinned subject we used an additional light source that enabled successful tracking.

Along with the results of our subjects, the results for the author herself are presented. The author did not participate in the study since she can be considered an experienced and expert user. The author's results (as subject 20) are presented just for reference and are not included in any of the calculated statistics.

7.3 Evaluation Procedure

In each session, subjects performed five tasks. In the later text, we will refer to them as tasks T1, T2, T3, T4 and T5.
The tasks were the following:

T1: Subjects had to move the cursor along a curve for 60 seconds. The video image with highlighted tracked feature point locations was displayed in the upper left corner of the display. The curve spanned the whole display area. A total of three curves were used:

- curve 0: x = cos(t)·cos(t), y = 0.5 + 0.2·sin(4t), t ∈ [−π ... π] (shape of the figure ∞)
- curve 1: x = 0.5 + 0.2·sin(4t), y = cos(t)·cos(t), t ∈ [−π ... π] (shape of the figure 8)
- curve 2: x = 0.5 + 0.5·t, y = 0.5 + 0.5·cos(4·t·π), t ∈ [0 ... 1] (wave form spanning from the left to the right edge of the display)

The curves were presented to the subjects in the following order: curve 0, curve 1, curve 2. The measure of the performance was the squared error of each cursor location with respect to the closest curve point, averaged over the number of cursor update points.

T2: Guided button selection on an 8 x 8 grid. In each trial, the target positions were randomly selected and there were 10 targets. The random number generator had a unique seed value for each trial; thus, all the subjects repeated the same sequence of target buttons. A target button would be highlighted in blue by the program and the subject would have to align the red dot that represented the selected grid button with the blue target grid button. When the points were aligned, the subject would open the mouth to make the final selection. If a button was not selected after three attempts, the next target button would be introduced. The GUI consisted of the grid buttons arranged to cover the whole display area, and the only feedback on the tracking success was the selected grid point. Three trials were done. After the head-eye input control and selection, the subjects performed the same tasks using the conventional mouse so that the results obtained with the two input interfaces could be compared.

The measure of the performance was the number of buttons correctly selected, and in which attempt, as well as the time needed for each selection. The time was measured with respect to the distance between buttons, by Fitts' law (see Section 2.6.4). The conventional mouse data was plotted as one linear regression line, the data for the first trial as a second line, and the data for the second and third trials as a third line. The reason for this latter breakup is to compare the performance in the initial (training) trial and in the two later (testing) trials. Since we assume that all the subjects are experienced mouse users, we analyze all three mouse trials together. In addition, the cursor path for each button selection was plotted.

T3: Guided dragging of 100 x 100 pixel icons. The initial and target positions were randomly selected, and there were 10 draggings in one trial. The random number generator had a unique seed value for each trial; thus, all the subjects repeated the same sequence of target location pairs. The initial position was labeled with a basketball image, and the target position was labeled with a basket image. The subject had to correctly place each basketball icon, and only then would the new icon position be displayed. The selection was done by opening the mouth, and the mouth was kept open while the icon was dragged to the destination. When the mouth was closed, the icon was released. There were three trials, as in task T2. Upon completing the task with head-eye input, the subject would repeat it using the conventional mouse.
The measure of the performance was the same as for task T2, with the addition of measuring the number of steps needed to move the icon from the initial to the target position, where a step is one select-move-release action the subject performed.

T4: Automatic determination of the gaze path and fixation points. The users examined three pictures for 30 seconds each. The results of this task were presented in Section 5.5.1 and are given in full in Appendix A.

T5: Surfing the internet and writing an email message using head-eye input for moving the cursor and selection, and the keyboard for typing. All the subjects performed the same task, which included typical browsing and writing tasks. The Netscape program was used. The layout of the windows with selected buttons is depicted in Figure 6.8. Initially, the Netscape window was centered on the display and the mail program was minimized, with its icon located just left of the upper left corner of the Netscape window. The subjects did the following tasks:

1. Click on the URL entry field, delete the current contents using the backspace/delete keys, type in "www.msnbc.com", and hit the return key. The news page would load.
2. Click on the headlines menu button (the last button among the menu options). The intermediate page with an advertisement would load and the subject would need to click on the sentence "Click here to go to headlines", or simply wait for a few seconds, and the headlines page would load. If the subject clicked on the advertisement image, s/he would need to click on the "Back" button to return to the headlines page.
3. Scroll to the very bottom of the page by clicking on the area just above the scroll arrow. The number of clicks was about 5, depending on the amount of information on the headlines page.
4. Click on the first weather story (located at the bottom of the headlines page). The page with the menu options on the left and the weather story on the right would load.
5. Click on the weather menu button. There was an advertisement page here as well, as for the headlines button.
6. Click on the ZIP code entry field, enter the local ZIP code using the keyboard, and click on the "Go" button located just right of the ZIP code entry area. The weather report would load.
7. Click on the mail icon. The mail program would open up in the upper left corner of the display.
8. Click on the "New msg" button.
9. Type in an email message to the author about the weather report. The typing tasks included clicking on the "To:" field to enter the author's email address, clicking on the "Subject:" field to enter the subject of the message, and clicking on the text entry area and typing the body of the message.
10. Click on the "Send" button.

The experiment procedure was the following:

- The experiments were conducted in the PRIP lab in the Computer Science and Engineering Department at MSU. The experiment set-up consisted of an SGI Octane workstation with a camera mounted on top of the monitor. The subject would sit in a chair without any constraints, as if s/he were working at the computer in the usual circumstances. The procedure was completely non-invasive. The experimenter was the author of the program.
- During the experiments the video data was not recorded, but processed by the program and discarded. The only data saved were the coordinates of the tracked points on the face, the screen pointer coordinates, and statistics relevant to the experiment (e.g., time to select a button, whether the correct button was selected, etc.).
The session of task T5 (Net-surfing) was taped (the display and the subject's side profile) so that the task could later be evaluated.
- The training session consisted of a general demo of the system by the experimenter; then the subject tried the system (s/he would see the picture from the video camera, with the eyes, eyebrows and nose marked by the program, and s/he would be able to move the cursor on a grid and practice the selection mechanism). The training session lasted 5-10 minutes. No data was collected during the training session.
- The testing sessions would test tasks T1, T2, T3, T4, and T5, in that order. Before the data collection began, the experimenter would briefly describe the steps in the current task. Then, the task would be performed and the data would be collected. The projected times needed for each task were: T1: 5 minutes, T2: 5-10 minutes, T3: 5 minutes, T4: 2 minutes, T5: 15 minutes. Upon conducting all the tasks, it turned out that T2 was completed in about 5 minutes, and T3 in about 10 minutes. All other projected times were correct in practice.
- Finally, the subject would fill out a short follow-up questionnaire.

The second session was repeated about one week after the first session and all the tasks were the same. The only differences were that T4 was not conducted for the second time and there was no questionnaire.

7.4 Results

In this section, the results of the study are presented. Each task is presented in a separate section for clarity.

7.4.1 Task T1: Moving the Cursor Along a Path on the Display

In Task T1, the subjects had to move the cursor along a curve for 60 seconds. The video image with highlighted tracked feature point locations was displayed in the upper left corner of the display. The curve spanned the whole display area. A total of three curves were used: c0 - ∞, c1 - 8, and c2 - wave form. The performance for each subject was measured as the average squared error with respect to the target curve. All the coordinates were normalized to the 0...1 range. The data for the first and second session for each subject were compared.

Squared error for the individual curves ranged from 0.03 to 0.15 in the first session and the sum of squared errors for all three curves ranged from 0.14 to 0.31. In the second session, the error for the individual curves ranged from 0.02 to 0.13, and the sum of squared errors for all three curves ranged from 0.11 to 0.28. Figure 7.1 gives the average squared error in the two sessions as well as the errors achieved by the author. The overall performance increased: both the minimal and maximal values of the errors decreased in the second session. In terms of percentage of improvement, the average error for each curve improved, ranging from 8% to 15%, and for the overall error the improvement was 12%. As for the improvement for the individual subjects, for curve 0, 61% improved their performance, for curve 1, 55% improved, and for curve 2, 72% improved. In terms of the overall improvement (sum of the squared errors for all three curves), 72% improved the performance.
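The average squared error used here can be computed directly from the logged cursor positions. The following sketch (ours, not the original analysis code; the function names and the densely sampled curve approximation are assumptions) measures a recorded cursor trajectory against a parametric target curve with coordinates normalized to the 0...1 range:

    import numpy as np

    def avg_squared_error(cursor_xy, curve_fn, t_range, n_samples=2000):
        """Average squared distance from each cursor sample to the closest
        point on a parametric target curve (coordinates normalized to 0..1).
        cursor_xy : (N, 2) array of recorded cursor positions
        curve_fn  : t -> (x, y) arrays, the target curve
        t_range   : (t_min, t_max) parameter interval
        """
        t = np.linspace(t_range[0], t_range[1], n_samples)
        curve = np.stack(curve_fn(t), axis=1)                   # (n_samples, 2)
        d2 = ((cursor_xy[:, None, :] - curve[None, :, :]) ** 2).sum(axis=2)
        return d2.min(axis=1).mean()                            # mean over cursor updates

    # Example: curve 0 (the "infinity" shape), as reconstructed from the text.
    curve0 = lambda t: (np.cos(t) * np.cos(t), 0.5 + 0.2 * np.sin(4 * t))

    cursor = np.random.rand(600, 2)       # placeholder for ~60 s of cursor updates
    print(avg_squared_error(cursor, curve0, (-np.pi, np.pi)))

The dense sampling of the curve stands in for an exact closest-point computation; with 2000 samples the approximation error is negligible at the resolution of these measurements.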
Figure 7.1: Task T1: average squared error (minimum, average, maximum, and the author's error for curves 1-3, sessions 1 and 2)

To illustrate the shape of the cursor paths, we plotted the target curve and the cursor path that the user produced. Figure 7.2 gives the cursor paths for the best and worst overall subject performances. All three curves for the same subject are given. The subject who achieved the best performance was able to move the cursor along the target path rather well; the errors were minor and the general shape of the movement matched the shape of the target curve. In the case of the worst performing subject, it is very hard to determine the movement pattern, and clearly, the subject struggled to move the cursor even close to the target curve. In the case of the second curve, the cursor was completely off the curve most of the time. Complete comparative results for each subject are presented in Appendix B, and the improvements shown in the squared error comparisons are clearly visible to the naked eye.

Figure 7.2: Task T1: cursor paths for the best and worst squared error results in the first session (left: subject 12, session 2, best overall performance; right: subject 05, session 1, worst overall performance)

Figure 7.3: Task T2: button selection accuracy and selection times ((a) selection accuracy statistics, (b) selection times in seconds)

7.4.2 Task T2: Guided Button Selection

In task T2, the subjects had to select buttons on an 8 x 8 grid, as guided by the program. A random target button would be highlighted and the subject would have to select it. After three unsuccessful attempts a new target would be introduced. There were three trials, with ten randomly selected targets. The performance for each subject was measured based on the number of correctly selected buttons and the total time spent for button selection.
Also, the performance using the head-eye input was compared with the mouse-based selection. The results were analyzed for all three trials of the head-eye or mouse input, or for the first head-eye trial (practice trial), the second and third head-eye trials (test trials), and all three mouse trials.

Figure 7.3(a) compares the button selection accuracy in the first and second session. Subjects were able to make more accurate selections in the first attempt in the second session: on average 63% of selections in the second vs. 41% in the first session. In addition, the number of buttons selected in the second and third attempt decreased from 22% and 14% in the first session to 14% and 7% in the second session. The number of incorrectly selected buttons decreased from 23% in the first session to 16% in the second session. This means that the users were generally more accurate in the second session. The overall number of correctly selected buttons increased from 77% in the first session to 84% in the second session. These results are comparable with those of the Ware and Mikaelian [86] eye-mouse study. For every correct selection in the second and third attempt, the subject made one and two erroneous selections, respectively. In the first session, for each correct selection subjects made 1.5 erroneous selections. In the second session, for each correct selection subjects made 0.7 erroneous selections.

Figure 7.4: Task T2: selection times vs. button distance by Fitts' law (selection time in milliseconds vs. log2(D/S + 1/2), where D is the distance between selection points and S is the button size; rows: head-eye input trial 0, head-eye input trials 1 and 2, mouse trials 0-2; columns: session 1, session 2)

Figure 7.3(b) compares the selection times in the first and second session. The selection time increases with the number of attempts, which is expected, since the user needs to make more than one selection. The selection time in the case of an incorrect selection after three attempts is close to the time for a correct selection in the first attempt. This means that the users were making erroneous selections unwillingly, e.g., either they were reacting slowly to the stimulus of the new target or the open-mouth detection algorithm produced false alarms.
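The regression lines in Figure 7.4 are ordinary least-squares fits of selection time against the index of difficulty shown on the plot axes, log2(D/S + 1/2). A minimal sketch of that fit (ours, not the dissertation's plotting code; the data values below are hypothetical):

    import numpy as np

    def fitts_fit(distances, button_size, times_ms):
        """Fit selection time (ms) against the index of difficulty used in the plots,
        ID = log2(D / S + 1/2), with ordinary least squares.
        Returns (a, b) for the regression line  time = a + b * ID.
        """
        ids = np.log2(np.asarray(distances, dtype=float) / button_size + 0.5)
        b, a = np.polyfit(ids, np.asarray(times_ms, dtype=float), deg=1)  # slope, intercept
        return a, b

    # Hypothetical data: three selections with their target distances (same units
    # as the button size) and measured selection times in milliseconds.
    a, b = fitts_fit(distances=[120, 480, 900], button_size=96,
                     times_ms=[2100, 3600, 5200])
    print(f"time = {a:.0f} + {b:.0f} * ID  (ms)")

A poor fit of this line, as reported for the pooled head-eye data, shows up as a large residual spread rather than as a biased slope.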
The time needed to make the first correct selection decreased in the second session, from 7 to 5 seconds. In addition, the decreased selection times resulted in an overall decrease of 31% in the time needed to complete the experiment: from 238 seconds in the first session to 164 seconds in the second session.

As for selection with the mouse, out of 1050 selections, only 9 (0.8%) were wrong, and all were correctly selected in the second attempt. The selection times using the mouse were around 1 second for all subjects.

Figure 7.4 plots selection times vs. button distance by Fitts' law. The left column shows the data for the first session and the right column shows the data for the second session. The first row gives the data for the first head-eye input trial, the second row for the second and third head-eye input trials, and the third row gives the data for the mouse trials. The linear regression line is also plotted for each data set. As can be seen, the data for the head-eye input do not fit well into the Fitts' law framework when analyzed over all the subjects. The data for the individual subjects who performed the task well fit into the Fitts' law framework better.

Figure 7.5 depicts the selection time vs. the distance, by Fitts' law, and the cursor paths from button to button for a subject whose performance on the task increased significantly in the second session. As can be seen, in the first session the subject's data is very hard to fit in the Fitts' law framework and the time needed to select each button was very high. The cursor paths from button to button also show that trend: there is a lot of motion just to move the cursor to the desired location and a lot of motion around the target location. In the second session, however, we see that the selection times for each button were below 5 seconds and in some cases close to the mouse selection times. The cursor paths were much more direct and the subject clearly didn't make a lot of erroneous motion. Complete results of Task T2 are presented in Appendix C, where the data for each subject are compared across the two sessions.

Figure 7.5: Task T2: selection times vs. button distance by Fitts' law, and cursor paths for trial 2, for a subject whose performance increased in the second session
When we add the result that the number of errors decreased in the second session, we can define the overall subject performance in a session using the following formula:

T2_session_performance = (1 + number_of_misses) × total_elapsed_time

The lower the score, the better the performance, since we want to minimize both the number of misses and the elapsed time. When compared using the above formula, we get more balanced results and can compare the scores of different subjects. In terms of the improvement for the individual subjects in the second session, 83% of the subjects decreased the elapsed time for the task and 67% of the subjects decreased the number of misses. Overall task performance improvement was observed in 83% of subjects.

Sources of Erroneous Selections

Why did the subjects make erroneous selections? Did they improve in time? It is important to discuss these issues so that we can identify possible problems and correct them. Figure 7.6 shows which buttons the subjects selected for each target button in the first and second session, respectively. For ease of reading, only the non-zero entries are shown. The number in each entry represents the number of times that button was selected while selecting the target button displayed in bold. We can observe three things:

1. We can see that subjects often selected buttons adjacent to the target button. That is understandable; the reason for this error is that they either didn't position the cursor properly or moved slightly while opening the mouth.

Figure 7.6: Task T2: Distribution of the erroneous selections for each target button

2. The number of erroneous selections at the coordinates of the previous target button is high in most cases. The reason for this error is that subjects kept their mouth open (or kept opening and closing it) when the new target was initiated, and they simply didn't move to the new target and stayed still, fixating the previous target. In some cases the subjects would move very slowly from one target to another while keeping their mouth open most of the time (in spite of the experimenter's suggestion to close the mouth and open it only when they want to make a selection). In other cases the algorithm for detecting the open-mouth state was simply finding that the mouth was open when it was closed, which would cause an erroneous selection. To overcome this problem, we could repeat the same experiment using a different selection mechanism, e.g., a hardware button, that would be less prone to errors.

3. In the second session, the number of erroneous selections at the coordinates of the previous target button decreased. It slightly increased in the last trial of the second session, towards the end of the experiment, which can be connected with the fatigue of the subjects or their decreased attention.
The general trend in the second session was that the erroneous selections were not widespread across the grid, which means that the subjects got more skilled at using the interface.

Were There Some "Hard Positions"?

We noticed that the number of errors decreased in the second session. We also have to ask whether the errors were uniformly distributed across the display positions.

              trial 0                         trial 1                         trial 2
Button    2 3 4 5 6 7 8 9 10 | Σ         2 3 4 5 6 7 8 9 10 | Σ         2 3 4 5 6 7 8 9 10 | Σ   || Σ
Sess. 1   y y y n y n y n n  | 5         y y y y y y n n n  | 6         n y y n y y n y y  | 6   || 17
Sess. 2   n n n n n n y n n  | 1         y n n n y n n n n  | 2         n n n y n y y y n  | 4   || 7

Table 7.2: Number of erroneous selections at the previous target location: "y" entries mean that there were more than three selections at the previous target location, "n" entries mean that there were three or fewer selections at the previous target.

Table 7.3: Percentage of correct selections and the number of erroneous selections per each correct target selection: data from the 8 x 8 grid is collapsed into a 4 x 4 grid. (Shown for Session 1 and Session 2. Percentage of correct selections: bright shades represent higher selection accuracy and darker shades represent lower selection accuracy. Number of erroneous selections per each correct selection: dark shades represent more erroneous selections per one correct selection for a target button, and bright shades represent fewer. Entries with a dash were not the target.)

Table 7.3 shows the percentage of subjects who selected each target button correctly and the number of erroneous selections made per one target button selection. The data from the 8 x 8 grid is collapsed into a 4 x 4 grid for clarity. In the first session, the percentage of correct selections for each target button ranged from 56% to 89% with a 73% average, while in the second session it ranged from 67% to 93% with an 82% average. We also calculated the number of erroneous selections subjects had to make for each correct target selection. For different display locations, in the first session the number ranged from 0.6 to 1.4 with a 1.1 average, and in the second session it ranged from 0.5 to 1.3 with a 0.8 average. The errors were distributed rather uniformly across all the grid locations. There seems to be a slight increase in the number of erroneous selections in the corner points in the second session. However, not all corner points had this high error rate, and this does not hold for the first session.

7.4.3 Task T3: Guided Dragging of Icons

In task T3, the subjects had to drag 100 x 100 pixel icons, as guided by the program. The initial and target positions were randomly selected and there were 10 draggings in one trial. The initial position was labeled with a basketball image and the target position was labeled with a basket image. The subject had to correctly place each basketball icon on the basket icon, and only then would the new icon positions be displayed. The selection was done by opening the mouth, and the mouth was kept open while the basketball icon was dragged to the basket destination. When the mouth was closed, the icon was released. There were three trials, as in task T2.
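The open-mouth dragging described above is essentially a small state machine driven by the detected mouth state. A minimal sketch of that select-drag-release logic (ours, not the dissertation's implementation; the class name, the per-frame update interface, and the grab-radius test are assumptions):

    class MouthDragController:
        """Sketch of the T3 drag interaction: open mouth = grab/hold, closed = release."""

        def __init__(self, icon_pos, grab_radius=0.05):
            self.icon_pos = icon_pos      # normalized (x, y) of the basketball icon
            self.grab_radius = grab_radius
            self.dragging = False

        def update(self, cursor, mouth_open):
            """Call once per processed video frame with the current cursor position."""
            if mouth_open and not self.dragging:
                dx = cursor[0] - self.icon_pos[0]
                dy = cursor[1] - self.icon_pos[1]
                if dx * dx + dy * dy > self.grab_radius ** 2:
                    return self.icon_pos          # mouth opened away from the icon: ignore
                self.dragging = True              # mouth opened over the icon: grab it
            elif mouth_open and self.dragging:
                self.icon_pos = cursor            # mouth held open: the icon follows the cursor
            elif not mouth_open and self.dragging:
                self.dragging = False             # mouth closed: release the icon here
            return self.icon_pos

Each release that leaves the icon off the basket forces another grab, which is exactly what the step counts in the next section measure.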
Figure 7.7: Task T3: number of correct selections by attempt and their timing, and number of dragging steps ((a) number of correct selections by attempt, (b) number of dragging steps, (c) selection times in seconds by attempt; sessions 1 and 2 and the author's data)

The performance for each subject was measured based on the number of dragging steps needed to drag an icon to the target location and the total time spent for dragging. As in Task T2, for reference, the performance using the head-eye input was compared with mouse-based dragging. The results are analyzed for all three trials of the head-eye input or mouse, or for the first head-eye trial, the second and third head-eye trials, and all three mouse trials.

In this case, the subjects had to complete each dragging task; the new target would not be initiated unless the previous target was placed correctly. Thus, we measured the number of steps needed to perform a dragging, where a step is one select-move-release action the subject performed for each target. Ideally, this should be one; that is, the subject should select an icon, move it to the target, and release it. However, in practice, the subjects would place the icon incorrectly, and then they would have to select it again and move it to the target. This last step was repeated many times for some subjects. In our reporting of results, we also give the number of erroneous clicks needed to make a correct selection. In some cases, subjects needed more than 10 steps and/or made erroneous selections more than 10 times. Thus, we analyzed the data when the number of steps/errors was more than 10 as if it were 10 items.

Figure 7.7(a) gives the statistics of the number of correct selections in the i-th attempt (i = 1...10). In both the first and second session most of the correct selections were made in the first or second attempt. In the second session the number of times the correct selection was made at or after the 10th attempt decreased 58% (from 24 to 10). The author made mostly correct selections in the first attempt. Figure 7.7(b) gives the selection times for each correct selection in the i-th attempt. The selection times for both the first and second correct selections were about 4 seconds. In the first session, the time needed at higher attempt numbers generally increased, while in the second session the time needed at higher attempt numbers generally decreased. In both sessions, the selection time was around 12 seconds at or after the 10th attempt. The author achieved times similar to the other subjects in the first two attempts. However, for higher attempts, the author needed more than 3 attempts only twice (3% of selections). Figure 7.7(c) gives the average number of times the dragging had to be performed in i steps (i = 1...10). In the second session, most subjects were able to complete most draggings in one or two steps, unlike the first session, where the number of steps varied greatly. The author, for example, never needed more than four steps.

As for the dragging with the mouse, out of 2100 selections, only 16 (0.8%) were wrong, but all of those were correctly selected in the second attempt. The selection times using the mouse were around 1 second for all subjects.

Figure 7.8 plots selection times vs. distance by Fitts' law.
The left column shows the data for the first session, and the right column shows the data for the second session. The first row gives the data for the first head-eye input trial, the second row for the second and third head-eye input trials, and the third row gives the data for the mouse trials. The linear regression line is also plotted for each data set. As can be seen, the data for the head-eye input do not fit well into the Fitts' law framework when analyzed over all the subjects. The data for the individual subjects who performed the task well fit into the Fitts' law framework better.

Figure 7.8: Task T3: selection times vs. distance by Fitts' law (selection time in milliseconds vs. log2(D/S + 1/2); rows: head-eye input trial 0, head-eye input trials 1 and 2, mouse trials 0-2; columns: session 1, session 2)

Figure 7.9 depicts the selection time vs. the distance, by Fitts' law, and the cursor paths from button to button for a subject whose performance on the task increased significantly in the second session. As can be seen, in the first session the subject's data is very hard to fit in the Fitts' law framework and the time needed to select each button was very high. Such a huge number of selections is the result of frequent losses of the mouth tracking (or the user not keeping the mouth open consistently), which resulted in many short paths of the target icon. In the second session, the number of such short paths decreased significantly. The cursor paths from target to target also show that trend: there is a lot of motion just to move the cursor to the desired location and a lot of motion around the target location. In the second session, however, we see that the selection times for each target were much more consistent with the Fitts' law curve, ranging from 5-8 seconds. The cursor paths are much cleaner and the subject clearly didn't make a lot of erroneous motion. Complete results of Task T3 are presented in Appendix D, where the data for each subject are compared across the two sessions.
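Since a step is one select-move-release action, the per-dragging step counts can be tallied directly from the interaction log, including the cap at 10 used in the analysis. A small sketch (ours, with a hypothetical event format):

    def dragging_steps(events, cap=10):
        """Count select-move-release steps for one icon dragging.
        events: list of (event_name, x, y) tuples; every "release" closes one step.
        """
        steps = sum(1 for name, _, _ in events if name == "release")
        return min(steps, cap)

    # Hypothetical log: the icon was dropped short of the basket once,
    # then picked up again and placed correctly -> 2 steps.
    log = [("select", 0.20, 0.30), ("release", 0.52, 0.48),
           ("select", 0.52, 0.48), ("release", 0.81, 0.74)]
    print(dragging_steps(log))   # prints 2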
In order to calculate the overall performance, we have to incorporate the number of steps needed for each dragging and the total time. The total number of steps can be calculated using the formula:

total_steps = Σ_{i=1...10} (i · number_of_steps_i)

The last row of Table D.1 shows the total number of steps averaged over subjects. The minimal value was lower in the second session (31 vs. 35), and the average value was also smaller in the second session (67 vs. 84). The value of 67 steps is the number of selections of the target icon, and since there was a total of 30 icon positions, each dragging was performed in 2.2 steps.

Figure 7.9: Task T3: selection times vs. distance by Fitts' law and cursor paths for trial 2, for a subject whose performance increased in the second session

In the first session, the total of 84 steps corresponded to 2.8 steps per dragging. The overall performance can then be calculated using the formula:

T3_session_performance = total_steps × total_elapsed_time

The lower the score, the better the performance, since we want to minimize both the number of steps and the elapsed time. When using the above formula, we get more balanced results and can compare the scores of different subjects. In terms of the improvement for the individual subjects in the second session, 55% of the subjects decreased the elapsed time for the task and 78% of the subjects decreased the number of steps needed for one dragging. The overall task performance improvement was noted in 67% of subjects.

Spatial Distribution of Errors

Figure 7.10 gives the spatial distribution of the erroneous selections for each target locations pair for all subjects in the first and second session. The filled gray rectangle is the dragging destination, and an empty rectangle is the dragging source position. Two adjacent rows plot the data for the same trial in the two sessions. The general trend is that the number of errors decreases as the session progresses, and that the number of errors in the second session is smaller than in the first session. In addition, in the second session, the errors are generally more clustered around the line that connects the source and destination locations.

Figure 7.11 gives the spatial distribution of the correct selections for each target location pair for all subjects in the first and second session. The plotting format is the same as in Figure 7.10.
Ideally, the points should be clustered around the starting and ending dragging positions. However, since the dragging was often done in more than one step, the selected points were spread across the display. The selections are distributed on or close to the line connecting the two locations, which means that the subjects were attempting to drag the icons along the shortest path to the destination.

Figure 7.12 shows the spatial distribution of the erroneous selections for each target locations pair. The darker the line that connects the target locations, the more errors were made in that dragging. Clearly, in the second session, the number of errors decreased. In the first session, there was a lot of variation in the number of errors for different target pairs. However, in the second session, the amount of variation decreased. This leads us to conclude that no particular location on the display caused problems for the users. We were especially interested in the corner points, since the experimenter's and subjects' impression during the experiment was that many subjects had a hard time selecting corner positions.

7.4.4 Task T5: Application Control—Net-surfing and E-mail Writing

In Task T5, the subjects had to integrate all the skills learned in tasks T1-T3 to perform a real-world task: surfing the internet and writing an e-mail. The details of the task were described in Section 7.3. The performance of each subject was evaluated by watching the video tapes that recorded the session, and the experimenter recorded the number of erroneous clicks, the number of times the experimenter had to turn off the camera (in order to assume control of the pointer) and intervene with the mouse, the number of times the experimenter had to type in text for the user, the total time the subject spent in completing the task, and which parts of the task were skipped.

Figure 7.10: Task T3: Distribution of the erroneous selections for each target locations pair
Figure 7.11: Task T3: Distribution of the correct selections for each target locations pair
Figure 7.12: Spatial distribution of the erroneous selections for each target locations pair: darker lines represent more errors for that target locations pair (session 1, session 2)

The nature of the task was such that not all the erroneous clicks resulted in some unwanted action. The subject might click on the Netscape window, but that area would not be associated with a button, and there would be no change in the displayed WWW page.
Thus, as erroneous clicks we counted only those clicks that resulted in some unwanted action, such as loading a wrong WWW page or opening an unwanted window (e.g., security information or similar). In such cases, the subject would need to recover from the error by clicking on Netscape's "Back" button or by clicking on the "Cancel" button of the popped-up window. All those actions would add an extra burden to the task and would require the subject to spend more time in the completion of the task.

sub-task      weight     sub-task     weight
1:url          0.20      6:zip         0.14
2:headlines    0.04      7:mail        0.02
3:scroll       0.10      8:newmsg      0.02
4:title        0.02      9:type        0.40
5:weather      0.04      10:send       0.02

Table 7.4: Task T5: Assigned weights for each sub-task

                                   Average over all subjects        Author
                                   Session 1     Session 2     Head-eyes   Mouse
Number of erroneous selections        8.6           5.2            0         0
Number of camera turn-offs            0.4           0.2            0         0
Experimenter types                    0.3           0.2            0         0
Elapsed time (mm:ss)                  9:12          8:35          2:40      1:15
Percent of the task completed         85%           96%           100%      100%

Table 7.5: Task T5: average number of errors, elapsed time and percentage of the task completed

For each subject the percentage of completed tasks was calculated based on the assigned weight of each sub-task. The sub-task weights are shown in Table 7.4, and were assigned based on the number of selections and text entries involved in each sub-task. Some subjects' initial task did not include all the sub-tasks due to time constraints or the subject's age. For one child we changed the tasks a bit and adapted them to his age. The change preserved most of the sub-task actions but used different WWW pages.

Table 7.5 shows the average error and timing statistics over all subjects in the two sessions. In the first session the number of errors ranged from none to 19, and several subjects were not able to complete the first session. The experimenter had to turn off the camera a total of 8 times and to enter text for the subjects 5 times. From the first to the second session the number of erroneous clicks decreased 40% (from a total of 154 to 93), as did the number of times the experimenter had to intervene (4 interventions with the mouse and 3 with typing). Also, the total time needed to complete the task decreased 7% (from 552 to 515 seconds). Only 5% of subjects (one out of 18) were not able to complete the task, compared to 44% (8 subjects) in the first session.

For comparison, the author completed the task in 160 seconds using the head-eye input interface, and 75 seconds using the conventional mouse. Based on the results of the previous tasks using the mouse, we can assume that the author's performance with the mouse is comparable with the other subjects' performance. Then we can conclude that the average performance of our subjects in the second session was 6.8 times slower than had the task been performed with the conventional mouse.

In order to calculate the overall performance of a subject, we calculated the total error using the following formula:

total_errors = #_erroneous_clicks + 2 · (#_turnoffs + #_experimenter_typing)

The elapsed time is also adjusted for the percentage of the task completed, so the overall time was calculated as:

overall_time = elapsed_time / percentage_completed

Finally, the overall performance was calculated as:

T5_session_performance = total_errors · overall_time

The lower the score, the better; if the score in the second session was lower than in the first session, we assumed that the subject's performance improved.
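The per-task session scores defined in this chapter (for T2, T3, and T5) can be restated together. The sketch below (ours, not the original analysis code; the argument names and example values are hypothetical) follows the three formulas as given, with lower scores meaning better performance:

    # Per-session performance scores as defined in the text; lower is better.

    def t2_score(num_misses, elapsed_s):
        return (1 + num_misses) * elapsed_s

    def t3_score(steps_histogram, elapsed_s):
        # steps_histogram[i] = number of draggings completed in i steps (i = 1..10)
        total_steps = sum(i * n for i, n in steps_histogram.items())
        return total_steps * elapsed_s

    def t5_score(err_clicks, turnoffs, experimenter_typing, elapsed_s, fraction_completed):
        total_errors = err_clicks + 2 * (turnoffs + experimenter_typing)
        return total_errors * (elapsed_s / fraction_completed)

    # Hypothetical example values:
    print(t2_score(num_misses=3, elapsed_s=200.0))
    print(t3_score({1: 20, 2: 6, 4: 4}, elapsed_s=480.0))
    print(t5_score(err_clicks=5, turnoffs=0, experimenter_typing=0,
                   elapsed_s=515.0, fraction_completed=0.96))

Because every score multiplies an error count by an elapsed time, a subject must improve on at least one of the two factors, without worsening the other, for the session score to drop.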
In terms of the improvement for the individual subjects in the second session, the elapsed time for the task decreased for 61% of the subjects, while the number of erroneous selections decreased for 67% of the subjects. Overall task performance improvement was noted in 61% of subjects.

Which Sub-Tasks Were Hard?

Were there any differences among the sub-tasks in hardness? Based on the skipped-task statistics, the ZIP code entry (sub-task 6) was skipped the most times (10), then clicking on the weather menu (3), and all the other tasks were skipped twice or once. It is not fair to say that sub-task 6 was the hardest, since the experimenter often decided to skip it and go to the mail writing part due to the time constraints, or when the session progressed with a lot of errors.

Table 7.6 shows the distribution of errors across the sub-tasks for both sessions. It is clear that the task of clicking on a menu button (e.g., headlines or weather) was hard in both sessions. The first click on a menu button in the task (headlines) had more errors than the second click in the sequence (weather). The erroneous clicks were always on the adjacent menu buttons. Another problematic sub-task was entering the URL. The errors in this case were mainly made in selecting buttons that were located just above the URL entry field. Entering the ZIP code was also hard, and the errors were mainly in selecting buttons adjacent to the ZIP entry field. Finally, two other sub-tasks that had high error rates were clicking on the "New Msg" and "Send" buttons. These buttons are of the same size as the "Back" button, but were located in the upper left corner of the display. The errors were also clicks on the adjacent buttons, some as small as 0.27° and 0.53° of viewing angle. These errors resulted in the closure of the panel with the main buttons or the closure of the "To:" field in the mail window.

              Number of errors                       Number of errors
sub-task      Session 1   Session 2    sub-task      Session 1   Session 2
1:url             24          12       6:zip             23          12
2:headlines       42          22       7:mail             0           3
3:scroll          12           4       8:newmsg          12          12
4:title            8           1       9:type             2           1
5:weather         26          13       10:send            5          11

Table 7.6: Distribution of errors across the sub-tasks

The scrolling task did not result in a high number of errors, but was hard for a number of subjects. That task took a lot of time and it was problematic due to the small target area size of only 0.36° of viewing angle in the horizontal direction. In scrolling, there were a lot of erroneous clicks just around the scroll bar; however, they did not result in an action that required the subject to recover from the error.

The typing task also caused some problems for the subjects who were not typing blindly. They had to look at the keyboard when typing, and the pointer would move out of the mail window; then the keyboard input would not be accepted by the mail window. This problem could be solved by changing the user interface to recognize the beginning of typing and to freeze the pointer position. In this experiment, however, that was not done. An alternative solution to this problem would be to introduce a "Stop-the-head-eye-input" command that would be issued through some facial expression (e.g., smiling) or by depressing a special key.

7.4.5 Summary of Questionnaires

After the first session all the subjects filled out a questionnaire from which we wanted to capture their impressions of the interface. The following questions were asked:

1. Do you find the idea of a head-eye input device interesting?
YES/NO, explain: All subjects answered YES. The predominant explanation was that it would be "useful for handicapped users" (55% of subjects), and 39% of subjects said that it would be "nice to integrate the interface as the general input", either replacing or working with the conventional mouse.

2. Did you like the head-eye interface?

YES/NO, explain: In this case, 83% of subjects said that they liked the interface, and 17% said that they did not like it. The subjects' explanation was that they didn't like it "at its present stage", since they had difficulties in the interaction.

3. Did you have any problems using the interface (e.g., not able to control the pointer or click, neck pain or discomfort, or other)?

YES/NO, explain: Not all subjects specified problems: 89% said that they had problems, and 11% said that they had no problem. Out of the ones that specified them, 44% had problems with cursor control, 37% had problems with clicking, and 25% experienced some neck pain or discomfort. Some subjects reported more than one problem. One subject commented that "it felt a bit strange to open the mouth at the beginning", but in time the subject got used to that fact and learned how to open the mouth for the system to recognize it. One subject commented that he had problems with the cursor control since he "was not used to moving his head while working with the computer and would rather move his eyes only".

4. Do you have any other comments or suggestions for the head-eye interface?

Subjects suggested alternative ways of selection, e.g., through blinking, or integrating with voice recognition. Two subjects said that it would be nice to have alternative selection mechanisms, since the current one made them "keep their mouth closed" and they wanted to be able to talk while using the computer. Most subjects suggested that the general control of the cursor should be improved, as well as the interface itself, e.g., to recognize the user's intention to type and not to move the cursor while the user is typing.

The results of the above exit survey showed that the general impression was positive. Problems such as adjusting to the interface were expected by the experimenter, and some of them were not present in the second session. For example, after the second session, a number of subjects asked the experimenter "Did something change in the meantime?", since they found the interface easier the second time. Since nothing had been changed in the interface, we can conclude that the subjects' perception of the interface changed: they got used to it and didn't feel that strange while performing the tasks.

The above suggestions indicate that the subjects experienced the "Midas Touch" problem: their own body was the interface and in some cases they were simply not able to adjust their behavior and movements to incorporate the interface needs. Over the two sessions, most were able to adjust; however, some could not.

7.5 Discussion of T1, T2, T3, T5 Results

The overall performance in the individual tasks showed a trend of subject improvement in time.
In task 1, 72% improved; in task 2, 83% improved; in task 3, 67% improved; and in task 5, 61% improved. Overall improvement was measured in terms of the number of tasks in which the subject improved: when a subject improved in 3 or 4 tasks, by definition the overall performance improved. A total of 61% of subjects improved (28% in all the tasks, and 33% in 3 out of 4 tasks). The remaining 33% of the subjects improved in two tasks only, and 6% (1 subject) improved in only 1 task.

The total time the subjects spent in performing the tasks with the head-eye input interface was on average 27.8 minutes in the first session and 24.6 minutes in the second session. The time needed to make the first correct selection was 4.6 and 4.8 seconds for tasks T2 and T3, respectively. The selection times were similar in Task T5. Subjects needed about 5, 10, and 6 times more time to perform tasks T2, T3, and T5, respectively, using the head-eye input interface compared to the mouse. It should be noted that a few subjects were able to perform Task T5 only about 2-3 times slower than with the conventional mouse.

In the second session, for subjects 01, 02, 07, 10, 12, and 13, three blue dots were tracked instead of the facial features. The tracking in the first session was not successful (for the children), or the subjects could not see well without glasses. In all six cases, the performance improved in the second session.

It should be noted that our child subjects had some difficulty in using the head-eye input interface. The problems were due to their impatient and rapid head movements, and they often moved a lot in the chair during the session. Thus, they would often move out of the focus of the camera and would have to re-position. The net-surfing task was boring for them, and instead of asking them to perform the regular tasks the experimenter instructed them to visit WWW sites appropriate for their age and to focus more on the writing of e-mail messages to their parents.

The most important conclusion is that it was possible to control a real-world application without any changes or adjustments to the interface. The Netscape window we used, and the news WWW pages, had some rather small selection areas. If we are to make a custom-built application suited to the interface, we can adjust it to the special needs of the head-eye input interface. We believe that the same conclusions could be made for an eye-based interface, since our head and eyes are not always perfectly still, and due to our natural movements the interface is prone to glitches.

To the future designers of applications based on the head-eye input interface, we would suggest the following:

- It is possible to select from 10 x 12 icons on a 19" display. The selection time for each icon is about 3-5 seconds, or less for an experienced user. This timing is higher than the 1 second per selection reported in many eye-mouse or head-mouse based studies, but we believe that the selection time could be reduced as the users practice more.

- The reasonable button size is about 1.4° of viewing angle (e.g., the Netscape "Back" and similar buttons). Our subjects had very little problem selecting these buttons and they had to use them often.
• It is possible to select from 10 x 12 icons on a 19" display. The selection time for each icon is about 3-5 seconds or less for an experienced user. This timing is higher than the 1 second per selection reported in many eye-mouse or head-mouse based studies, but we believe that the selection time could be reduced as the users practice more.

• A reasonable button size is about 1.4° of viewing angle (e.g., the Netscape "Back" and similar buttons); a worked angle-to-size example is given after this list. Our subjects had very little problem selecting these buttons, and they had to use them often. For example, after many erroneous selections an unwanted WWW page would load and the subjects had to click on the "Back" button, or a pop-up window with an error message would open and the subjects had to click on an "OK" or similar button. In both cases, the buttons were of similar size. The number of erroneous selections while selecting the "Back" button was very low compared to the selections of other buttons.

• It is possible to select smaller buttons (0.5°); however, the error rates increase as the size decreases (e.g., adjacent menu buttons on a WWW page). When small targets are isolated, it is possible to select them with ease and a small number of errors (e.g., the title of a story on a WWW page).

• There should be blank space between adjacent target buttons that are crucial for the control of the application. That would minimize erroneous selections.

• The users must have a way to signal the interface to temporarily stop cursor control, and an easy way to re-start it. This option would be used when the user wishes to type some text and the cursor should stay on the text entry area regardless of where the user is actually looking.

• Selection with a face expression is possible, but it can cause problems if the face expression determination algorithm is not working perfectly. When the subject's condition allows, an alternative selection mechanism, such as a keyboard or other hardware button, or a voice signal, should be used.

• If the interface is to be used by children, it should accommodate their movement pattern while working with the computer: children may not sit straight in the chair, and might often lean towards the display or to the sides of the chair. Thus, the camera should not be focused on their face only, but on a wide area around them, and the interface should not be sensitive to their sudden moves.
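To relate the viewing-angle figures quoted above (0.5° and 1.4°) to physical sizes on the screen, one can use the relation that a target of size s viewed from distance d subtends an angle of 2 arctan(s / 2d). The numbers below are only an illustration: the 50 cm viewing distance is an assumption, since the exact display geometry of the experiments is not restated here.

    import math

    def visual_angle_deg(size_cm, distance_cm):
        """Angle (in degrees) subtended by a target of physical size size_cm
        viewed from distance_cm."""
        return math.degrees(2.0 * math.atan(size_cm / (2.0 * distance_cm)))

    def size_for_angle_cm(angle_deg, distance_cm):
        """Physical size that subtends angle_deg at distance_cm."""
        return 2.0 * distance_cm * math.tan(math.radians(angle_deg) / 2.0)

    if __name__ == "__main__":
        d = 50.0                                   # assumed viewing distance in cm
        for angle in (0.5, 1.4):
            print("%.1f deg at %.0f cm -> %.2f cm on screen"
                  % (angle, d, size_for_angle_cm(angle, d)))
        # At 50 cm, 1.4 deg corresponds to roughly 1.2 cm and 0.5 deg to about 0.4 cm.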
7.5.1 Learning curve

As with all new interfaces, the use of the head-eye input interface must be learned. The idea of using one's head as an interface is not very familiar to the average user, and the user must constantly think about where and how to move the head. This is similar to hand-eye coordination in the case of the conventional mouse; we can call it "head-mind coordination". If the tracking and the update of the screen coordinates work well, users should be able to learn this coordination quickly. Moreover, they should be able to adjust to the interface so that they no longer think about the coordination.

In our interaction with novice users we often compared the new interface with the mouse. This metaphor helped the users become familiar with the general idea of the interface. A positive aspect of the comparison was that we did not have to explain extensively what the potential use of the interface is and how to use it, since most people are nowadays familiar with the mouse. Also, we could easily compare the performance of the new interface with that of the mouse on some common tasks. However, the first drawback of the comparison was that the users implicitly expected the same level of performance with the new interface that they were achieving with the mouse. The important fact most users were forgetting was that they had had to learn how to use the mouse, and that there was an adjustment period before they were skilled with it. Thus, some users had a number of complaints about the interface after the first usage, and most complaints addressed the issue that they did not see the need for the new interface when they could do much better with the mouse. After the second usage of the new interface, when the users got more skilled, there were not as many complaints and comparisons with the mouse.

The second drawback of the mouse metaphor was that the users would not think about other potential applications of the interface, which extend well beyond the mouse capabilities. In practice, however, we did not notice that the mouse metaphor blocked other ideas, and many novice users and lab visitors would quickly point out other potential applications.

How well do users learn the interface in practice? We observed a number of users while they used the interface, and there is a common pattern:

1. Novice level: In the first few minutes of use all users perform rather poorly. They make rapid head movements and observe how the cursor moves. As time passes, they get more skilled and learn how much they need to move to produce the desired cursor movement.

2. After the initial "warm-up" period, the users either get used to the interface and their performance improves, or they still struggle with the interface and have a hard time performing the tasks:

(a) Expert level: If the users switch to this level, they can use the interface without any problems, they perform rather well, and we can compare their performance with the use of the conventional mouse.

(b) Intermediate level: The users who stay at this level do not get used to the interface, for reasons that range from problems with the underlying computer vision algorithms to problems related to the adjustment to the interface:

• For some, the computer-vision-based tracking did not work well, and the users were not able to get a good response from the interface. For example, users with glasses could not wear them at present, since the tracking would fail. Thus, when they needed to read detailed information on the display they would lean closer or put on the glasses, which resulted in erroneous updates of the cursor position. They would then need to re-position themselves in the chair and to re-calibrate, which resulted in several seconds (or tens of seconds) of time lag, leading to user dissatisfaction. For such users, if the tracking problem is solved, they can quickly adjust to the interface and switch to the "expert level". For example, to overcome the glasses problem, we tracked three blue squares glued just above the eyebrows and on the tip of the nose. The movements of the blue spots were equivalent to the eyes-nose movements, and the users had feedback as if the tracking worked well.

• Skilled level: The tracking works, but these users are not able to learn to move slowly and to re-calibrate when needed. They would have good cursor control at one point, but would often tilt too much, so that an eye was occluded or the mouth was not visible and the interface would not respond.
They would not realize what was happening, and when instructed to move slowly and re-calibrate, they would start to rapidly move their head in all directions and would think that they had re-calibrated, while in reality they just caused more confusion to the interface. Either they did not understand the instructions to move slowly or they did not want to follow them. In both cases, the result was that they could not control the cursor movement at all. In time, they would gradually start moving more slowly and become able to control the cursor. However, they never got even close to the performance of the expert users.

The observed learning pattern is similar to the learning pattern for any new computer input device. Some users will simply never adjust to the interface and some will pick it up right away. In our experiments, we attempted to test the learning curve for the head-eye input interface. Our subjects attended an experiment that consisted of two one-hour sessions, approximately one week apart, in which they performed various tasks using the head-eye input device. Figure 7.13 shows the breakup of the subjects into skill levels. A total of 18 subjects participated in the experiment; 13 reached the expert level and 5 reached the skilled level. The breakup into skill levels was done by the author based on observation of the experiments and the performance results.

Figure 7.13: Learning levels for 18 subjects

7.6 Summary

In this Chapter we presented the results of a study with 18 subjects who performed a wide range of tasks using our head-eye input interface. The tasks involved moving the cursor along a path, selecting buttons, dragging icons, surfing the internet, and writing an e-mail message. All subjects participated in two sessions approximately one week apart. The performance on each task was measured, and based on that we concluded that 11 out of 18 subjects (61%) improved their performance with repeated usage of the interface. The most important fact was that most subjects were able to control a real-world application such as Netscape without any change to the application, and that in the second session only one subject was not able to complete the task. We believe that if our subjects persisted in using the interface, they would become skilled users and their performance with the interface could be comparable with the performance achieved with other input devices (e.g., the mouse).

Chapter 8

Conclusion and Future Work

In this dissertation, an interface for handless human-computer interaction was presented. The interface is based on the use of computer vision algorithms to track the user's head and facial features such as the eyes, nose and eyebrows. The interface is completely non-intrusive, and requires only a camera attached to the computer. Results of feature tracking were presented and their accuracy was evaluated; it was shown that the features could be tracked with an average accuracy of 2 pixels in a 320 x 240 pixel image. A framework for gaze direction detection, based on an Artificial Neural Network and on scaling of feature image coordinate movements to display movements, was presented. The framework can be used in a number of applications ranging from measuring the user's focus of attention to controlling the computer using head movements. Preliminary results of attention measurement and cursor control experiments were presented.
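As a concrete, deliberately simplified picture of how such a framework can be organized, the sketch below first maps the tracked feature coordinates to an initial display position with a small feed-forward network, and afterwards moves the cursor by scaling the frame-to-frame feature displacement with an adjustable gain, in the spirit of the display/control gain used for mouse-type pointing devices. The network size, the gain value, the display resolution and all function names are illustrative assumptions and are not the dissertation's implementation.

    import numpy as np

    GAIN = 6.0   # assumed display/control gain: display pixels per pixel of feature motion

    def ann_initial_gaze(features, W1, b1, W2, b2):
        """Map a feature vector (e.g. eye and nose image coordinates) to an
        initial display position (x, y) with a one-hidden-layer network.
        The weights W1, b1, W2, b2 are assumed to have been trained beforehand."""
        hidden = np.tanh(W1 @ np.asarray(features, dtype=float) + b1)
        return W2 @ hidden + b2               # (x, y) in display coordinates

    def update_cursor(cursor, prev_features, features, gain=GAIN,
                      width=1280, height=1024):
        """Move the cursor by the scaled displacement of a tracked feature point
        (relative, mouse-like mode)."""
        f = np.asarray(features, dtype=float)
        p = np.asarray(prev_features, dtype=float)
        dx, dy = gain * (f[:2] - p[:2])
        x = float(np.clip(cursor[0] + dx, 0, width - 1))
        y = float(np.clip(cursor[1] + dy, 0, height - 1))
        return (x, y)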
The gaze detection algorithm was used, along with face expression detection (open vs. closed mouth), to develop a handless human-computer interaction interface.

The interface was evaluated by 18 novice users in two sessions approximately one week apart. In terms of the usability of the interface, the appropriate number, size and configuration of selectable buttons were investigated. Our experiments showed that users can select buttons of 0.5° only with a lot of errors, while a button size of 1.3° was selectable with reasonable error rates. The selection time was 3-5 seconds. A number of applications of the interface were evaluated: monitoring the user's gaze and movement pattern, moving the cursor along a predetermined path on the display, selecting buttons, dragging icons, and controlling a real-world application (Netscape). Our subjects learned how to use the interface successfully after moderate training. They spent in total about 1 hour in active use of the interface, and most of the subjects showed improvement in performance.

8.1 Research Contributions

• A facial feature location method was presented, based on knowledge of the geometry of the face. The method searches for the eyes, eyebrows and nose using geometrical constraints and knowledge of the face colors. The red component of the color image is used to search for the eyes. The method was successfully tested on a number of faces in both still images and video sequences. If the person wears glasses or has a beard or moustache, the method sometimes does not work well.

• An algorithm for detecting dark skin, and a method for adapting the image of a dark-skinned person so that the skin color detection algorithm can find the dark skin, were presented. The algorithm works automatically, in conjunction with the motion detection, and automatically finds the adjustment coefficients; a schematic sketch of this kind of adjustment is given after this list.

• Gaze direction detection that requires no calibration was presented. The user's initial gaze is detected using an Artificial Neural Network, and then the display/control gain factor logic used for pointing devices like the mouse is applied for the gaze calculation. This method offers an adjustable gain factor and flexibility for individual users; a user could set up the interface so that the head movements are rather small.

• Monitoring the user's gaze in time enables applications such as visual perception studies, attention measurement, or marketing research. This interface could offer non-intrusive monitoring in a more natural setting than what is done at present.

• An evaluation of the interface in terms of usability, achievable resolution, and user learning pace and performance was presented. The results of a study with 18 subjects, who performed a wide range of tasks, show how well novice subjects can get used to the interface, and which tasks can be performed with ease and which are hard. We showed that a target as small as 0.36° of viewing angle can be selected with the interface, and that a target size of 1.4° of viewing angle was rather comfortable for the users to select. The time needed to make a selection was 3-5 seconds.

• Successful control of a real-world application using the head-eye input interface, without any changes to the application itself, was achieved. The Netscape navigator and mail programs were used as the test programs, and our subjects were able to learn how to use the system and to surf the network using their head as a pointer. They were able to browse several WWW sites successfully and to write an e-mail message, after moderate training with the interface of 28 minutes on average in the first session and 16 minutes in the second session. The results showed that they would be able to become very skilled with the interface. The author of the interface, who can be considered an expert user, for example, completed the Netscape surfing task in 2:40 minutes using the head-eye input, and in 1:15 minutes using the conventional mouse. Five other subjects (28%) achieved performance approaching the author's.
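The adjustment mentioned in the second contribution can be pictured as a per-channel scaling of the image so that the (moving) face region falls into the range the skin-color model expects. The sketch below estimates scaling coefficients from the mean color of the motion mask; the reference values and all names are assumptions made purely for illustration and are not the coefficients or procedure used in this work.

    import numpy as np

    # Assumed reference mean (R, G, B) of the skin-color model's training data.
    REFERENCE_SKIN_MEAN = np.array([180.0, 120.0, 100.0])

    def adjustment_coefficients(image, motion_mask):
        """Estimate per-channel gains from the pixels flagged as moving (assumed
        to be mostly the face), so that their mean color matches the reference.
        image: H x W x 3 uint8 array; motion_mask: H x W boolean array."""
        moving = image[motion_mask].reshape(-1, 3).astype(np.float64)
        mean_rgb = moving.mean(axis=0)
        return REFERENCE_SKIN_MEAN / np.maximum(mean_rgb, 1.0)

    def adapt_image(image, coeffs):
        """Scale the color channels; the skin-color detector is then run unchanged."""
        return np.clip(image.astype(np.float32) * coeffs, 0, 255).astype(np.uint8)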
8.2 Future Work

The interface presented in this dissertation can be used in various applications; in this work, we investigated some basic tasks. In this section we list possible future directions for research on the head-eye input interface. We hope that the interface will one day become widely used and will improve human interaction with computers.

• Improvement of the feature detection algorithm in terms of robustness to the use of glasses and to the presence of a beard or moustache. Such an improvement would allow a wide range of users to use the interface without any constraints.

• Use of more than one video camera, all of which would attempt to find the tracked features; the best results would be used as the output of the system. That way the user would be able to move freely in the workspace, and the computer monitor would not need to be the only device being controlled. Alternatively, an omnidirectional camera could locate the faces and track them, and another pan-tilt-zoom camera could zoom in to get a better face picture. Applications such as teleconferencing would benefit greatly.

• Improvement of the UI to provide an easily selectable turn-off and turn-back-on option, to be used when the user is just typing or reading. The user could then use the interface along with other input interfaces in an easier fashion.

• Evaluation of the interface for physically challenged users, and customization of some procedures depending on special user needs. We believe that users whose only way to control the computer is through head motion would benefit greatly.

• Improvement of the algorithms in terms of robustness to the environment, such as the presence of skin-color tones in the background or changing lighting conditions. If a fully robust algorithm is devised, the interface could be used in any environment, e.g., airports, museums, train stations, etc.
APPENDICES

Appendix A

Results of the Gaze Monitoring Experiment

In this Appendix, the results of the gaze monitoring and fixation determination experiment are included. The experiment was conducted adjunct to the evaluation of the head-eye input device described in Chapter 7. A total of 18 subjects viewed 3 different images for 30 seconds each, and the gaze path and fixation locations were determined automatically using the algorithm in Figure 5.6. The first photograph used is from "50 Favorite Rooms by Frank Lloyd Wright" by Diane Maddex, and the remaining two photographs are from the WWW site http://www.photo.net/. Table A.1 includes the number of fixations and the average, minimum and maximum fixation duration for each of the 18 subjects. Figures A.1 through A.9 give the gaze path and fixation points for each of the three images. The gaze path and fixations are superimposed on the image that the subject viewed. The white line represents the gaze path. Gray dots show fixation locations, and the dot size indicates fixation duration. A black line connects consecutive fixation locations.

Subject    Image 0                      Image 1                      Image 2
ID         #fix  avg   min   max        #fix  avg   min   max        #fix  avg   min   max
2000        53   864   112   3800        57   734   115   2401        54   789   140   2991
2001        23   558   123   1690        27   447   115   1255        26   310   102    851
2002        23   361   117   1544        16   580   117   2669        17   341   107    939
2003        31   547   116   2394        27   300   121    623        31   316   115    839
2004        31   398   109   2097        44   433   108   1438        40   411   111   1306
2005        18   218   100    438        24   284   104   1083        31   252    30    761
2006        30   441   102   1386        11   239   109    565        31   291   105    882
2007        34   455    79   2022        31   378   117   1150        13   268   116    876
2008        39   490   102   1729        36   286   101    662        38   430   126   1426
2009        37   322   101    963        33   345   100   1181        27   238    97    595
2010        19   964   236   3118        28   783   220   4073        20   571   123   1535
2011        46   476   109   1251        45   544   110   1504        50   413    72   1727
2012        19   294   110    885        33   234    41    573        23   250   100    660
2013        18   364   117    816        35   311   104   1042        27   339   102    922
2015        41   510   102   1529        24   390   122   1015        23   356   125   1471
2016        40   509   100   1376        31   354   115   1167        35   469   117   1905
2017        24   257   110    835        13   226   104    593        41   312   101    943
2018        32   761   108   2711        43   461   105   1305        35   465   102   1670

(#fix = number of fixations; avg, min, max = average, minimum and maximum fixation duration in msec)

Table A.1: Number and durations of fixations measured by the gaze-determination algorithm
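The fixation counts and durations in Table A.1 come from the algorithm in Figure 5.6, which is not reproduced in this appendix. For orientation only, the following is a generic dispersion-threshold sketch of how fixations can be segmented from a sequence of time-stamped gaze points; the 100 ms minimum duration and the dispersion threshold are illustrative assumptions, not the parameters of the dissertation's algorithm.

    def detect_fixations(samples, max_dispersion=0.04, min_duration_ms=100):
        """Dispersion-threshold fixation detection.
        samples is a list of (t_ms, x, y) gaze points, with x and y normalized to 0-1.
        Returns a list of (start_ms, duration_ms, cx, cy) fixations."""
        fixations, i, n = [], 0, len(samples)
        while i < n:
            j = i
            # Grow the window while the points stay within the dispersion threshold.
            while j + 1 < n:
                xs = [s[1] for s in samples[i:j + 2]]
                ys = [s[2] for s in samples[i:j + 2]]
                if (max(xs) - min(xs)) + (max(ys) - min(ys)) > max_dispersion:
                    break
                j += 1
            duration = samples[j][0] - samples[i][0]
            if duration >= min_duration_ms and j > i:
                cx = sum(s[1] for s in samples[i:j + 1]) / (j - i + 1)
                cy = sum(s[2] for s in samples[i:j + 1]) / (j - i + 1)
                fixations.append((samples[i][0], duration, cx, cy))
                i = j + 1
            else:
                i += 1
        return fixations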
Figure A.1: Gaze path and fixation points, subj-2000 and subj-2001, images 0, 1, 2
Figure A.2: Gaze path and fixation points, subj-2002 and subj-2003, images 0, 1, 2
Figure A.3: Gaze path and fixation points, subj-2004 and subj-2005, images 0, 1, 2
Figure A.4: Gaze path and fixation points, subj-2006 and subj-2007, images 0, 1, 2
Figure A.5: Gaze path and fixation points, subj-2008 and subj-2009, images 0, 1, 2
Figure A.6: Gaze path and fixation points, subj-2010 and subj-2011, images 0, 1, 2
Figure A.7: Gaze path and fixation points, subj-2012 and subj-2013, images 0, 1, 2
Figure A.8: Gaze path and fixation points, subj-2015 and subj-2016, images 0, 1, 2
Figure A.9: Gaze path and fixation points, subj-2017 and subj-2018, images 0, 1, 2

Appendix B

Task T1 Complete Results

Table B.1 gives the squared error from the target curve for each subject and curve, for the two sessions done one week apart. The screen coordinates were normalized to the 0-1 range. Figures B.1 through B.19 give the comparison of the target curve and the cursor path shapes in the two sessions, for each subject.

Subj.   Session 1, averaged squared error               Session 2, averaged squared error
ID      c0       c1       c2       sum      #pnt        c0       c1       c2       sum      #pnt
00      0.0787   0.0447   0.0578   0.1812    3563       0.0325   0.0266   0.0569   0.1160    4139
01      0.0579   0.0349   0.0594   0.1522    3213       0.0360   0.0456   0.0588   0.1404    2778
02      0.1053   0.1248   0.0708   0.3009    2986       0.0886   0.0767   0.0776   0.2429    3257
03      0.0620   0.0577   0.0596   0.1793    4279       0.0781   0.0622   0.0541   0.1944    3420
04      0.0491   0.0321   0.0770   0.1582    3571       0.0705   0.0587   0.0639   0.1931    2884
05      0.1000   0.1514   0.0557   0.3071‡   3064       0.1081   0.0783   0.0603   0.2467    2197
06      0.0627   0.0493   0.0840   0.1960    3809       0.0414   0.0766   0.0575   0.1755    2644
07      0.0695   0.1337   0.0679   0.2729    3205       0.0679   0.1226   0.0603   0.2508    1546
08      0.0778   0.0691   0.0795   0.2264    3314       0.0616   0.0574   0.0637   0.1827    3786
09      0.0344   0.0467   0.0553   0.1364†   4356       0.0345   0.0311   0.0443   0.1099    3411
10      0.1501   0.0831   0.0683   0.3015    2674       0.0872   0.0689   0.0588   0.2149    2722
11      0.0718   0.0818   0.0690   0.2226    3204       0.0495   0.0422   0.0601   0.1518    3980
12      0.0902   0.1366   0.0553   0.2821    3098       0.0254   0.0263   0.0566   0.1083†   3200
13      0.0612   0.0477   0.0565   0.1654    3924       0.0670   0.1054   0.0724   0.2448    2444
15      0.0363   0.0377   0.0661   0.1401    3520       0.0420   0.0412   0.0474   0.1306    2346
16      0.0563   0.0439   0.0705   0.1707    4908       0.1313   0.0825   0.0719   0.2857‡   2615
17      0.0672   0.0712   0.0622   0.2006    4159       0.0485   0.0994   0.0574   0.2053    2755
18      0.0715   0.0926   0.0596   0.2237    3242       0.0377   0.0700   0.0586   0.1663    3073
20*                                                     0.0281   0.0213   0.0584   0.1078    3807
Sum     1.3020   1.3390   1.1745   3.8173   64089       1.1078   1.1717   1.0806   3.3601   53197
Avg.    0.0723   0.0744   0.0653   0.2121    3560       0.0615   0.0651   0.0600   0.1867    2955

Percentage of improvement from session 1 to session 2: c0 15%, c1 12%, c2 8%, sum 12%
* subject 20 is the author (second session only)
† minimal overall squared error value in each session
‡ maximal overall squared error value in each session

Table B.1: Task T1: average squared error for each subject
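The "averaged squared error" entries in Table B.1 summarize how far the recorded cursor path strayed from the target curve, with both expressed in normalized 0-1 screen coordinates. The exact error definition is given in Chapter 7 and is not restated here; the sketch below uses one plausible definition (squared distance from each cursor sample to the nearest point of a densely sampled target curve, averaged over the samples) purely as an illustration.

    import numpy as np

    def avg_squared_error(cursor_path, target_curve):
        """Average, over cursor samples, of the squared distance to the nearest
        target-curve point.  Both arguments are N x 2 and M x 2 arrays of (x, y)
        coordinates normalized to the 0-1 range."""
        cursor = np.asarray(cursor_path, dtype=float)
        target = np.asarray(target_curve, dtype=float)
        diff = cursor[:, None, :] - target[None, :, :]   # pairwise differences (N, M, 2)
        sq_dist = (diff ** 2).sum(axis=2)                # squared distances (N, M)
        return float(sq_dist.min(axis=1).mean())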
Figure B.1: Task T1, subject 00, cursor paths for all three curves
Figure B.2: Task T1, subject 01, cursor paths for all three curves
Figure B.3: Task T1, subject 02, cursor paths for all three curves
Figure B.4: Task T1, subject 03, cursor paths for all three curves
Figure B.5: Task T1, subject 04, cursor paths for all three curves
Figure B.6: Task T1, subject 05, cursor paths for all three curves
Figure B.7: Task T1, subject 06, cursor paths for all three curves
Figure B.8: Task T1, subject 07, cursor paths for all three curves
Figure B.9: Task T1, subject 08, cursor paths for all three curves
Figure B.10: Task T1, subject 09, cursor paths for all three curves
Figure B.11: Task T1, subject 10, cursor paths for all three curves
Figure B.12: Task T1, subject 11, cursor paths for all three curves
Figure B.13: Task T1, subject 12, cursor paths for all three curves
Figure B.14: Task T1, subject 13, cursor paths for all three curves
Figure B.15: Task T1, subject 15, cursor paths for all three curves
Figure B.16: Task T1, subject 16, cursor paths for all three curves
Figure B.17: Task T1, subject 17, cursor paths for all three curves
Figure B.18: Task T1, subject 18, cursor paths for all three curves
Figure B.19: Task T1, subject 20, cursor paths for all three curves

Appendix C

Task T2 Complete Results

Table C.1 gives the button selection accuracy. For each session, the first three columns correspond to the first, second and third correct selection. The fourth column gives the number of buttons still selected incorrectly on the third attempt. The remaining three columns give the average squared error from the linear regression line by Fitts' law. Table C.2 gives the selection time statistics. The first four columns for each session give the average total time needed to select a button in the first, second, and third attempt, and the time when the selection was still incorrect after the third attempt. The average time generally increases with the number of attempts, which is expected, since the user needs to make more than one selection. Also shown in the table are the total elapsed times for each trial and for all three trials. As the trials progressed, the elapsed times did not decrease. However, when compared across the sessions, both the individual trial times and the total time decreased significantly. Figures C.1 through C.19 give the selection times vs. button distance by Fitts' law and the cursor paths for each target button, for all 18 subjects and the author.
              Session 1                                     Session 2
Subj    Selection correctness    Avg.sq.err of lin.         Selection correctness    Avg.sq.err of lin.
ID      1    2    3    not       regression line            1    2    3    not       regression line
                                 E0     E1-2    M                                    E0     E1-2    M
00       4    3    2    6        1035    951     -          23    4    3    0         521    340    32
01      18    6    3    3         628    661    57          23    6    0    1         391    296    33
02       7    6    2   15        1108    574    36          23    1    1    5         825    216    49
03      20    7    2    1         408    443   105          27    2    1    0        1292   1474   318
04      13    5    5    7         812    558    39          19    2    2    7         315    335    37
05      23    1    4    2        1460    561   110           8    5    5   12         798    518    38
06       9    5    6   10         318    794   216          18    7    4    1        1527    452    47
07       8   12    5    5         364    477   468          18    6    4    2         363    364   108
08       8   12    5    5        1095    588    31          22    3    4    1         192    875    25
09      18    4    4    3        2738   1916   244          24    6    0    0         314    784    68
10       9    6    5   10         384    353    79          15   10    3    2        1036   1794    60
11      21    7    1    1         235    353    21          24    5    1    0         563    262    17
12       2    7    8   13         397    490   219          25    3    2    0         562    220    80
13       2    5    3   20         850   1089    52           4    5    1   20         170    711   112
15      20    6    2    2         666    303    27          23    1    1    5         148   1462    68
16       7    5    8   10         190    383    53          15    6    4    5         370    324    55
17       9   11    6    4         797    200    41           7    4    2   17          97   1206   302
18      19    7    3    1         400   1277    36          20    2    2    6        1001    217    31
20*                                                         28    0    1    1         128    435    34
Avg   12.1  6.4  4.1  6.6                                 18.8  4.3  2.2  4.7
%       41   22   14   23                                   63   14    7   16
        77% correct, 23% wrong                              84% correct, 16% wrong

* subject 20 is the author (second session only)

Table C.1: Task T2: button selection accuracy statistics

              Session 1                                                Session 2
        Avg. time for selection (sec)   Trial time (sec)   Tot.        Avg. time for selection (sec)   Trial time (sec)   Tot.
Subj ID 1     2     3     not           T0   T1   T2       (sec)       1     2     3     not           T0   T1   T2       (sec)
00      6.6  12.1  14.9  14.3            76   68   33       178        4.4   8.0   4.4    -             52   54   39       145
01      6.3   8.0   6.6   5.2            71   79   46       197        3.4   6.5    -    5.9            45   36   42       124
02      8.5  14.4  11.1  11.9           111  142   92       346        4.5  11.0   4.1   1.5            58   33   33       125
03      4.6   9.3  14.3   6.3            63   63   64       190        6.4   5.6   5.7    -             45   56   86       189
04      7.5  11.1  11.1   7.9            85   73  104       264        5.8   7.5   8.9   2.6            41   60   59       161
05      7.1  10.7  11.7   8.2           102   46   87       236        3.7   5.5  10.8  10.0            76  110   44       231
06      4.1   7.3  13.7   8.6            40   87  114       242        5.8  11.9  16.2   8.0           116   79   64       260
07      5.0   7.3   7.8   5.7            58   72   65       195        4.6   6.8   9.1   9.9            59   42   77       180
08      4.1   4.9   8.9  23.7           118   87   49       254        4.7  14.8  11.3   1.5            40   56   96       194
09     13.6  17.8  42.6   6.4           196  169  139       505        4.1   4.2    -     -             32   35   55       123
10      3.3   5.7   5.3   9.1            53   43   83       181        5.7   7.4  41.8  30.3            80  174   91       346
11      2.9   5.9   5.7   5.1            31   35   46       112        3.6   8.1   7.7    -             53   47   34       135
12      3.5   5.7  12.5   6.0            80   88   56       225        3.8   6.2   4.9    -             51   34   38       124
13     30.4  13.2  31.7  11.8           176  224   55       457        6.1   7.0  39.5   2.3            28   95   20       145
15      4.4   5.4   6.3  11.4            65   44   45       155        7.3   4.6   2.2   3.4            32   98   60       190
16      3.2   4.1   6.1   7.2            33   43   86       163        4.4   3.8   6.2   3.0            39   58   30       128
17      4.2   5.3   5.7   6.3            79   35   40       155        5.1   3.1   5.5   5.0            20   20  103       144
18      7.4   7.3  10.6   3.3            46   83   97       227        5.4   3.1   4.4   2.1            53   47   34       135
20*                                                                    3.2   0.0   3.2   2.1            22   37   34        94
Avg     7.0   8.6  12.6   8.8            82   82   72       238        4.9   6.9  10.1   4.7            51   59   58       164

* subject 20 is the author (second session only)

Table C.2: Task T2: button selection timing statistics
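The regression-line error columns in Table C.1 refer to a straight line fitted to selection time as a function of the Fitts-law index of difficulty; the figures that follow plot time against log2(D/S + 1/2), where D is the distance to the target button and S is its size. As an illustration only of how such a fit and its average squared error can be computed (the units and data handling of the actual experiment are not restated here), a minimal sketch:

    import numpy as np

    def fitts_fit(distances, sizes, times):
        """Fit time = a + b * log2(D/S + 1/2) by least squares and return
        (a, b, avg_sq_err).  distances, sizes and times are 1-D sequences."""
        D = np.asarray(distances, dtype=float)
        S = np.asarray(sizes, dtype=float)
        T = np.asarray(times, dtype=float)
        idx = np.log2(D / S + 0.5)                      # index of difficulty
        A = np.column_stack([np.ones_like(idx), idx])
        coeffs, *_ = np.linalg.lstsq(A, T, rcond=None)
        a, b = coeffs
        residuals = T - (a + b * idx)
        return a, b, float(np.mean(residuals ** 2))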
Figure C.1: Task T2, subject 00, Fitts' law time-distance plots and cursor paths for each button
Figure C.2: Task T2, subject 01, Fitts' law time-distance plots and cursor paths for each button
Figure C.3: Task T2, subject 02, Fitts' law time-distance plots and cursor paths for each button
Figure C.4: Task T2, subject 03, Fitts' law time-distance plots and cursor paths for each button
Figure C.5: Task T2, subject 04, Fitts' law time-distance plots and cursor paths for each button
Figure C.6: Task T2, subject 05, Fitts' law time-distance plots and cursor paths for each button
Figure C.7: Task T2, subject 06, Fitts' law time-distance plots and cursor paths for each button
Figure C.8: Task T2, subject 07, Fitts' law time-distance plots and cursor paths for each button
Figure C.9: Task T2, subject 08, Fitts' law time-distance plots and cursor paths for each button
Figure C.10: Task T2, subject 09, Fitts' law time-distance plots and cursor paths for each button
Figure C.11: Task T2, subject 10, Fitts' law time-distance plots and cursor paths for each button
Figure C.12: Task T2, subject 11, Fitts' law time-distance plots and cursor paths for each button
Figure C.13: Task T2, subject 12, Fitts' law time-distance plots and cursor paths for each button
Figure C.14: Task T2, subject 13, Fitts' law time-distance plots and cursor paths for each button
Figure C.15: Task T2, subject 15, Fitts' law time-distance plots and cursor paths for each button
Figure C.16: Task T2, subject 16, Fitts' law time-distance plots and cursor paths for each button
Figure C.17: Task T2, subject 17, Fitts' law time-distance plots and cursor paths for each button
Figure C.18: Task T2, subject 18, Fitts' law time-distance plots and cursor paths for each button

Figure C.19: Task T2, subject 20, Fitts' law time-distance plots and cursor paths for each button

Appendix D

Task T3 Complete Results

Table D.1 gives the statistics of the button selection accuracy and the average squared error from the linear regression line of the data plotted by Fitts' law. The first ten columns give the attempt number on which the correct selection was made. The general trend from the first to the second session was that the selections were made mainly in the first two attempts and that the number of selections requiring more than ten attempts was lower. Also, in the second session, most subjects were able to complete most draggings in one or two steps, unlike the first session, where the number of steps varied greatly.

Table D.2 shows the selection time statistics. The first ten columns for each session give the average total time needed to make a correct selection in the first, second, ..., tenth attempt. The average time generally increases with the number of attempts, which is expected, since the user needs to make more than one selection. Also shown in the table are the total elapsed times for each trial and for all three trials. As the trials progressed, the elapsed times did not decrease. However, when compared across the sessions, both the individual trial times and the total time decreased significantly. This means that the users were able to make faster selections in the second session.

Figures D.1 through D.19 give the selection times vs. button distance by Fitts' law and the cursor paths for each pair of dragging locations, for all 18 subjects and the author.
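The time-distance plots throughout Appendices C and D, and the regression-line error reported in Table D.1, use the same Fitts' law formulation: selection time is regressed against the index of difficulty log_2(D/S + 1/2), where D is the distance between selection points and S is the button size. The sketch below is a minimal illustration of how such a fit and its average squared error can be computed; the function names are mine, and the distances and times in the usage example are made-up placeholders rather than data from the tables.

```python
import numpy as np

def index_of_difficulty(distance, size):
    # Welford-style Fitts' law index of difficulty used on the x-axes
    # of the plots: log_2(D/S + 1/2).
    return np.log2(distance / size + 0.5)

def fit_fitts(distances, sizes, times):
    """Least-squares fit of time = a + b * ID, plus the average squared
    error of the regression line (the quantity reported in Table D.1)."""
    ids = index_of_difficulty(np.asarray(distances, float),
                              np.asarray(sizes, float))
    times = np.asarray(times, float)
    b, a = np.polyfit(ids, times, 1)          # slope, intercept
    residuals = times - (a + b * ids)
    avg_sq_err = float(np.mean(residuals ** 2))
    return a, b, avg_sq_err

# Placeholder example: three selections at different distances (pixels)
# to a 100x100-pixel target, with hypothetical selection times (seconds).
a, b, err = fit_fitts(distances=[150, 400, 800],
                      sizes=[100, 100, 100],
                      times=[2.1, 3.0, 4.2])
print(a, b, err)
```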
Across subject avg. number of steps: min 35, max 131, avg 84

[Table D.1 lists, for each subject and session: the number of correct selections made on the first through tenth attempt, the number of draggings completed in one through ten steps, and the average squared error of the regression line by Fitts' law (the REGLIN columns).]

Across subject avg. number of steps: min 31, max 139, avg 67

† - Average squared error of the regression line by Fitts' law

Table D.1: Task T3: button selection accuracy statistics
              Avg. time for selection (sec)                              Trial time (sec)    Total time
Subj Sess   1    2     3     4     5     6     7     8     9    10       T0    T1    T2        (sec)
00    1    5.4  4.9   0.0  15.1   0.0   0.0   0.0   0.0   0.0  23.7     110   100    91         303
      2    4.3  2.8   4.3   3.8   5.7   0.0   0.0   2.9   0.0   7.9      88   126    88         303
01    1    3.5  4.6   4.4   0.0   8.9   0.0   4.4   8.6   4.0   5.8     105   101   111         318
      2    4.2  4.0   4.8   4.5  15.5   6.9   8.6   4.4   0.0   6.6     102   120   120         344
02    1    1.7  1.8   7.5   7.4   4.2   0.0   7.2   4.3   0.0  11.9     321     -     -         321
      2    7.9  4.7  11.3  12.7   6.3  11.5   0.0   0.0   5.4  18.5     290   462     -         753
03    1    3.3  4.0   6.1  10.8   7.9  11.0  32.8   8.9  14.5   6.6     314   323     -         637
      2    3.8  4.6   4.8  15.7   9.7   5.1   6.7   0.0   5.6   7.9     295   510   133         939
04    1    3.5  1.7   5.6  14.4   4.2   0.0   7.3   3.5   4.5   9.4     183   480   181         845
      2    4.8  2.9   9.0  12.9   5.6   8.7   5.2   0.0   8.1   9.7     133   234   148         515
05    1    3.9  1.9  11.4   7.8   5.0  13.1   7.6  34.8   4.6  16.0     355   281     -         636
      2    5.3  4.3  13.9  10.1  13.8  23.7   8.7   4.4  11.3  15.9     439   417   156        1014
06    1    3.4  4.4   3.6   5.5   5.9   5.4   9.2  72.4  13.5  12.8     467   156   185         809
      2    3.6  4.9  12.9   5.5   3.7  22.9  13.7   0.0   3.9  12.9     125   171   262         558
07    1    3.9  2.7   4.6   3.1   4.4   5.6   6.0   3.5   0.0   8.8     331   218   218         769
      2    4.7  4.2   9.5  15.2  13.4   4.4  12.3  10.7  13.5  29.5     885   336   391        1613
08    1    3.6  3.0  10.2  10.5   4.5   5.0   8.5   3.7   5.1   5.3     250   162   119         533
      2    3.9  3.7   5.3   5.0   5.4   4.7   0.0   0.0   7.3   9.5     142   107   106         356
09    1    3.3  3.9   5.1   2.7   2.9  19.3   0.0   1.1  15.4  25.7     343   136   275         755
      2    4.1  4.3   4.9   8.7   7.9  12.8   3.2   3.9   8.6  10.0     172   230    98         501
10    1    2.6  2.7   0.0   4.6   3.8   2.8   8.4   4.0   4.9   9.3     297   178   332         807
      2    3.8  3.1   8.0   0.0   7.2   0.0   3.7   0.0   3.3   8.9     113    91   103         308
11    1    3.6  1.1   6.2  10.8   0.0   6.9   4.0   5.6   8.9   8.0     190   299    87         577
      2    3.6  3.4   2.2   3.4   1.7   0.0   4.2   0.0   0.0  19.1      79   127    86         293
12    1    5.1  2.3   5.4   4.1   0.0   7.5   6.1   3.7   8.8  10.0     421   252   190         864
      2    5.7  2.5   0.0   5.0   0.0   0.0   0.0   0.0   0.0   6.4     105   140    98         344
13    1    1.1  4.4  16.6  16.6  16.4   5.7  18.5   1.9   5.5  27.0    1470     -     -        1470
      2    4.4  6.2  11.4   4.0   8.9  12.7  11.6  10.6  16.7   8.9     184   182   178         545
15    1    4.3  3.2   0.0   0.0   0.0   5.9   0.0   0.0   0.0   7.0     108   117    84         309
      2    4.7  3.1  19.8   0.0   0.0   0.0   0.0   0.0   0.0   6.0     108   127   104         340
16    1    3.0  2.6   6.4   5.1   4.8   5.4   6.4   4.4   5.2   8.4     247   249   277         774
      2    3.5  3.1  11.9   8.4   0.0   4.9   5.1   5.0   5.9  13.7     512   206   253         971
17    1    4.9  3.8  10.1   5.2   8.0  43.2  16.0  13.7   8.1  12.7     537   458   201        1198
      2    6.6  8.1   7.2   0.0   2.0   6.1  12.7   0.0   6.4  18.1     252   225   167         645
18    1    4.6  3.7   9.5   7.5   5.9   5.1  12.1   7.1   7.6   4.8     160   287   224         673
      2    4.3  3.2   7.4  12.5   6.0   4.6   4.0  27.6   8.5   7.5     151   324   197         673
20    -    3.6  4.5   5.6   0.0   0.0   6.0   0.0   0.0   6.2   0.0     102    98    95         295
Avg   1    3.6  3.1   6.3   7.3   4.8   7.9   8.6  10.1   6.1  11.8     345   237   184         700
      2    4.6  4.1   8.3   7.1   6.3   7.1   5.5   3.9   5.8  12.1     232   230   158         612

Table D.2: Task T3: button selection timing statistics

[Each of Figures D.1 through D.19 shows, for one subject, the selection time in seconds plotted against the Fitts' law index of difficulty log_2(D/S + 1/2) for 100x100-pixel icons, for the Head-Eye-Mouth condition (trial 0 and trials 1-2) and the Mouse condition (trials 0-2), together with the pointer paths from button to button in normalized X and Y screen coordinates for each trial.]

Figure D.1: Task T3, subject 00, Fitts' law time-distance plots and cursor paths for each target
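Table D.2 above reports, for each subject and session, the average total selection time grouped by the attempt on which the correct selection was finally made, plus the per-trial and total elapsed times. The sketch below is a minimal, hypothetical illustration of that grouping; the record format (attempt number, elapsed seconds) and the function name are my assumptions, and the sample log is invented so that its output happens to match the subject 00, session 1 row.

```python
from collections import defaultdict

def per_attempt_averages(selections, max_attempts=10):
    """Average total selection time grouped by the attempt on which the
    correct selection was made (the first ten columns of Table D.2).
    `selections` is a list of (attempt_number, elapsed_seconds) pairs;
    attempts with no data are reported as 0.0, as in the table."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for attempt, seconds in selections:
        sums[attempt] += seconds
        counts[attempt] += 1
    return [round(sums[k] / counts[k], 1) if counts[k] else 0.0
            for k in range(1, max_attempts + 1)]

# Hypothetical log for one subject and session: (attempt, seconds).
log = [(1, 5.2), (1, 5.6), (2, 4.9), (4, 15.1), (10, 23.7)]
print(per_attempt_averages(log))
# [5.4, 4.9, 0.0, 15.1, 0.0, 0.0, 0.0, 0.0, 0.0, 23.7]
```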
Figure D.2: Task T3, subject 01, Fitts' law time-distance plots and cursor paths for each target

Figure D.3: Task T3, subject 02, Fitts' law time-distance plots and cursor paths for each target

Figure D.4: Task T3, subject 03, Fitts' law time-distance plots and cursor paths for each target

Figure D.5: Task T3, subject 04, Fitts' law time-distance plots and cursor paths for each target
Figure D.6: Task T3, subject 05, Fitts' law time-distance plots and cursor paths for each target

Figure D.7: Task T3, subject 06, Fitts' law time-distance plots and cursor paths for each target

Figure D.8: Task T3, subject 07, Fitts' law time-distance plots and cursor paths for each target

Figure D.9: Task T3, subject 08, Fitts' law time-distance plots and cursor paths for each target
Figure D.10: Task T3, subject 09, Fitts' law time-distance plots and cursor paths for each target

Figure D.11: Task T3, subject 10, Fitts' law time-distance plots and cursor paths for each target

Figure D.12: Task T3, subject 11, Fitts' law time-distance plots and cursor paths for each target

Figure D.13: Task T3, subject 12, Fitts' law time-distance plots and cursor paths for each target
Figure D.14: Task T3, subject 13, Fitts' law time-distance plots and cursor paths for each target

Figure D.15: Task T3, subject 15, Fitts' law time-distance plots and cursor paths for each target

Figure D.16: Task T3, subject 16, Fitts' law time-distance plots and cursor paths for each target
Figure D.17: Task T3, subject 17, Fitts' law time-distance plots and cursor paths for each target

Figure D.18: Task T3, subject 18, Fitts' law time-distance plots and cursor paths for each target
Figure D.19: Task T3, subject 20, Fitts' law time-distance plots and cursor paths for each target

Appendix E

Task T5 Complete Results

Table E.1 gives the sizes of the buttons used in the experiment. The sizes are given in pixel units, inches and visual angle. Table E.2 gives these data:

• number of erroneous selections (column 3),
• the number of times the experimenter had to turn off the camera (column 4),
• the number of times the experimenter had to type (column 5),
• time needed to complete the task (column 6),
• percentage of the task completed (column 7), and
• the sub-tasks that were skipped (column 8).

                         x by y measures in:                            minimal
Button identification    pixels        inches          degrees angle    size (deg)
Netscape browser         994 x 910     11.93 x 10.27   27.48 x 24.33    24.33
URL entry                590 x 25       7.08 x 0.28    16.31 x 0.67      0.67
URL arrow                 15 x 25       0.18 x 0.28     0.41 x 0.67      0.41
Back/send buttons         50 x 50       0.60 x 0.56     1.38 x 1.34      1.34
Close buttons panel       10 x 50       0.12 x 0.56     0.27 x 1.34      0.27
Scroll bar                13 x 100+     0.16 x 1.13     0.36 x 2.67      0.36
Mail button               30 x 25       0.36 x 0.28     0.83 x 0.67      0.67
MSNBC menu button         90 x 20       1.08 x 0.22     2.49 x 0.53      0.53
Story title              200+ x 20      2.40 x 0.22     5.53 x 0.53      0.53
ZIP code entry            80 x 30       0.96 x 0.34     2.21 x 0.80      0.80
GO button                 30 x 30       0.36 x 0.34     0.83 x 0.80      0.80
Mail window              740 x 740      8.88 x 8.35    20.46 x 19.78    19.78
"To:" field              560 x 20       6.72 x 0.22    15.48 x 0.53      0.53
"Subject:" field         480 x 30       5.76 x 0.34    13.27 x 0.80      0.80
text entry area          700 x 460      8.40 x 5.19    19.35 x 12.30    12.30
Select panel              20 x 20       0.24 x 0.22     0.54 x 0.53      0.53

Table E.1: Sizes of typical buttons in the Netscape browser and mail windows, and a typical news WWW page

               num.   num.       exper.   time     %        skipped
Subj  Sess     err    turn off   types    mm:ss    finish   sub-tasks
00    1        1      0          0        7:00     80       not done 2,3,4,5†
      2        0      0          0        3:34     100
01    1        2      0          0        2:50     84       not done 2,3,4†
      2        4      0          0        3:24     100
02    1        10     0          1        13:52    100
      2        2      0          0        7:25     90       not done 2,4,5†
03    1        9      0          1        12:39    100
      2        4      0          0        8:47     100
04    1        10     0          0        7:23     100
      2        14     1          0        9:02     100
05    1        11     2          0        7:36     82       skip 5,6
      2        5      0          1        12:11    70       skip 3,4,5,6
06    1        18     2          0        12:27    82       skip 2,6
      2        10     0          0        9:50     100
07    1        10     1          1        15:27    88       skip 3,4
      2        3      0          0        11:10    100
08    1        2      0          0        9:34     100
      2        6      0          0        5:24     100
09    1        7      0          0        7:47     100
      2        1      0          0        5:20     100
10    1        19     0          1        12:16    86       skip 6
      2        4      0          0        9:49     100
11    1        1      0          0        3:28     100
      2        4      0          0        4:41     100
12    1        18     0          1        13:00    40       skip 6,7,8,10
      2        5      1          0        7:58     100
13    1        12     1          0        10:16    20       skip 1,6,7,8,9,10
      2        8      1          1        17:45    86       skip 6‡
15    1        0      0          0        3:58     100
      2        2      0          0        8:46     86       skip 6‡
16    1        11     1          0        11:03    82       skip 5,6
      2        3      1          0        6:54     100
17    1        12     1          0        10:33    86       skip 6
      2        18     0          1        13:08    100
18    1        1      0          0        4:26     100
      2        0      0          0        9:28     100
20    -        0      0          0        2:40     100
      with the mouse                      1:15     100
Sum   1        154    8          5
      2        93     4          3
Avg   1        8.6    0.4        0.3      9:12     85
      2        5.2    0.2        0.2      8:35     96

Percentage improvement: # of errors 40.6%, elapsed time 6.6%

† - these subjects' task did not include some sub-tasks due to time constraints, but they completed 100% of their current task.
‡ - these subjects skipped the task due to network problems, but would otherwise have attempted it.

Table E.2: Task T5: number of errors and timing statistics
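Table E.1 above expresses each target size in pixels, inches and degrees of visual angle. Converting between these only requires the display resolution and the viewing distance, neither of which is stated in this appendix; the sketch below uses values inferred from the table itself (about 83 pixels per inch and a viewing distance of roughly 24 inches), so both constants and the function names should be read as assumptions, not figures from the text.

```python
import math

PIXELS_PER_INCH = 994 / 11.93    # ~83.3, implied by the Netscape browser row
VIEWING_DISTANCE_IN = 24.0       # inches; approximate value implied by Table E.1

def pixels_to_inches(pixels):
    return pixels / PIXELS_PER_INCH

def inches_to_visual_angle(inches, distance=VIEWING_DISTANCE_IN):
    # Full subtended angle in degrees: 2 * atan(size / (2 * distance)).
    return math.degrees(2.0 * math.atan(inches / (2.0 * distance)))

# Example: the 50-pixel width of the Back/send buttons from Table E.1.
w_in = pixels_to_inches(50)            # ~0.60 in, as listed in the table
angle = inches_to_visual_angle(w_in)   # ~1.4 degrees; the table lists 1.38
print(round(w_in, 2), round(angle, 2))
```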