This is to certify that the dissertation entitled

BOOSTING AND ONLINE LEARNING FOR CLASSIFICATION AND RANKING

presented by

HAMED VALIZADEGAN

has been accepted towards fulfillment of the requirements for the Ph.D. degree in Computer Science.

Major Professor's Signature

09/27/2010

Date

MSU is an Affirmative Action/Equal Opportunity Employer

BOOSTING AND ONLINE LEARNING FOR CLASSIFICATION AND RANKING

By

Hamed Valizadegan

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

Computer Science

2010

ABSTRACT

BOOSTING AND ONLINE LEARNING FOR CLASSIFICATION AND RANKING

By

Hamed Valizadegan

This dissertation utilizes boosting and online learning techniques to address several real-world problems in ranking and classification. Boosting is an optimization tool that works in the function space (as opposed to the parameter space) and aims to find a model in batch mode.
Typically, boosting iteratively constructs weak hypotheses with respect to different distributions over a fixed set of training instances and adds them to a final hypothesis. Online learning is the problem of learning a model when the instances are provided over trials. In each trial, a new sample is presented to the learner; the learner predicts its class label and then receives some feedback (partial or complete). The learner updates its model by utilizing the feedback, and then a new trial starts.

We consider several learning problems, including the use of side information in ranking and classification, learning to rank by optimizing a well-known information retrieval measure called NDCG, and online classification with partial feedback. Using side information to improve the performance of learning techniques has been a research focus of the machine learning community for the last decade. In this dissertation, we utilize the abundance of unlabeled instances to improve the performance of multi-class classification, and exploit the existence of a base ranker to improve the performance of learning to rank, both using the boosting technique.

Direct optimization of information retrieval evaluation measures such as NDCG and MAP has received increasing attention in recent years. It is a difficult task because these measures evaluate the retrieval performance based on the ranking list of documents induced by the ranking function, and therefore they are non-continuous and non-differentiable. To overcome this difficulty, we propose to optimize the expected value of NDCG and utilize the boosting technique as the optimization tool.

Online classification with partial feedback was recently introduced and has applications in contextual advertisement and recommender systems. We propose a general framework for this problem based on the exploration vs. exploitation trade-off and introduce effective approaches to automatically tune the exploration vs.
exploitation trade-off parameter.

© Copyright by
HAMED VALIZADEGAN
2010

To my loving parents, Simin Rahimi and Reza Valizadegan, for their unlimited and unconditional encouragement, support, and love.

ACKNOWLEDGMENTS

During my Ph.D., I have received support from a number of people without whom the completion of this thesis would not have been possible. First of all, I would like to express my deepest gratitude to my thesis advisor, Dr. Rong Jin, for his unique supervision and guidance. He motivated me to work on a diverse set of problems in machine learning and provided me with excellent mathematical and optimization support. Under his supervision, I have learned different aspects of conducting high-quality research and become capable of publishing papers in prestigious research venues such as NIPS and WWW.

For a number of years, I have also worked closely with Dr. Pang-Ning Tan, with whom I published a few papers in data mining. I would like to present my sincere appreciation for his valuable support during those years. I will never forget his kindness and help. I would also like to thank my committee members, Dr. Anil K. Jain, Dr. Joyce Chai, and Dr. Selin Aviyente, for their valuable feedback and discussions during my comprehensive and thesis exams. I also want to thank the Department of Computer Science and Engineering at Michigan State University, which provided me with financial support in the form of teaching assistantships for a number of semesters. I would like to particularly thank Dr. Abdol-Hossein Esfahanian, Dr. Eric Torng, and Linda Moore for their amazing attitude in helping graduate students in the department.

The contextual advertisement group of Yahoo! kindly provided me with an exceptional work atmosphere during Summer and Fall 2008. I would like to thank everyone in their group, particularly Dr. Jianchang Mao, the head of contextual and display advertisement science, and Ruofei Zhang, my direct mentor. It has been a great pleasure to collaborate with Dr.
Hang Li, the research manager of the Information Retrieval and Mining Group at Microsoft Research Asia, and Dr. Shijun Wang from the National Institutes of Health, with whom I co-authored research papers in ranking and online learning, respectively. Finally, I should thank the members of the LINKS and PREP labs for all the great support they have provided me during my Ph.D. Particularly, I would like to thank Wei Tong, Fenhjie Li, Yang Zhou, Pavan Mallapragada, and Matthew Gerber.

TABLE OF CONTENTS

LIST OF TABLES xi
LIST OF FIGURES xii

1 Introduction 1
  1.1 Classification 2
  1.2 Learning to Rank 3
    1.2.1 Training set 5
    1.2.2 Evaluation 6
    1.2.3 Learning 6
  1.3 Batch Learning 7
    1.3.1 Boosting 8
  1.4 Online Learning 11
  1.5 Contribution of This Dissertation 13
  1.6 Benchmark Data Sets 15
    1.6.1 Classification Data Sets 15
    1.6.2 Ranking Data Sets 16

2 Semi-Supervised Multi-Class Boosting 18
  2.1 Introduction 19
  2.2 Related Work 22
  2.3 Multi-Class Semi-supervised Learning 23
    2.3.1 Problem Definition 23
    2.3.2 Assemble Algorithm 23
    2.3.3 Design of Objective Function 25
    2.3.4 Multi-Class Boosting Algorithm 27
  2.4 Experiments 31
    2.4.1 Experimental Setup 32
    2.4.2 Evaluation of Classification Performance 33
    2.4.3 Sensitivity to the Combination Parameter C 36
    2.4.4 Sensitivity to Base Classifier 36

3 Optimizing NDCG Measure by Boosting 40
  3.1 Introduction 41
  3.2 Related Work 43
  3.3 Optimizing NDCG Measure 44
    3.3.1 Notation 44
    3.3.2 AdaRank Algorithm 45
    3.3.3 A Probabilistic Framework 46
    3.3.4 Objective Function 48
    3.3.5 Algorithm 50
  3.4 Experiments 54
    3.4.1 Experimental setup 55
    3.4.2 Results 56

4 Ranking Refinement by Boosting 58
  4.1 Introduction 58
  4.2 Related Work 61
  4.3 Ranking Refinement 62
    4.3.1 Problem Definition 62
    4.3.2 Encoding Ranking Information 63
    4.3.3 Objective Function 64
    4.3.4 Boosting Algorithm for Ranking Refinement 69
  4.4 Experiments 74
    4.4.1 Experimental Setup 74
    4.4.2 Results for Relevance Feedback 77
    4.4.3 Effect of Base Ranker 78
    4.4.4 Effect of Size of Feedback Data 79
    4.4.5 Results for Recommender System 79
    4.4.6 Time Efficiency of Ranking Refinement 80

5 Online Classification with Bandit Feedback 85
  5.1 Introduction 86
  5.2 Related Work 87
  5.3 A Potential-based Framework for Classification with Partial Feedback 88
    5.3.1 Problem Definition 88
    5.3.2 Banditron 90
    5.3.3 Potential-based Online Classification for Partial Feedback 90
    5.3.4 Exponential Gradient for Online Classification with Partial Feedback 95
  5.4 Experiments 97
    5.4.1 Experimental results 100

6 Robust Online Classification With Bandit Feedback 102
  6.1 Introduction 102
  6.2 Related Work 105
  6.3 Balancing between Exploration and Exploitation 106
    6.3.1 Preliminary 106
    6.3.2 Finding Optimal γ using [ŷ_t ≠ y_t] ≤ τ_t and [ỹ_t = y_t] ≤ ρ_t 108
    6.3.3 Finding Optimal γ using [ŷ_t ≠ y_t] ≤ 1 and [ỹ_t = y_t] ≤ ρ_t 110
    6.3.4 Finding Optimal γ using [ŷ_t ≠ y_t] ≤ τ_t and [ỹ_t = y_t] ≤ 1 111
  6.4 Experiments 112
    6.4.1 Experimental Settings 112
    6.4.2 Experimental results 113

7 Conclusion and Future Work 116
  7.1 Summary and Conclusions 116
    7.1.1 Boosting 116
    7.1.2 Online Learning 118
  7.2 Future Work 119
    7.2.1 Boosting 119
    7.2.2 Online learning 120

APPENDICES 122
A APPENDIX 123
  A.1 Proof of Lemma 1, Chapter 2 123
  A.2 Proof of Lemma 2, Chapter 2 124
  A.3 Proof of Theorem 4, Chapter 2 125
  A.4 Proof of Proposition 2, Chapter 3 126
  A.5 Proof of Lemma 4, Chapter 3 126
  A.6 Proof of Theorem 5, Chapter 3 127
  A.7 Proof of Theorem 6, Chapter 3 127
  A.8 Proof of Theorem 7, Chapter 3 128
  A.9 Proof of Theorem 8, Chapter 4 130
  A.10 Proof of Lemma 5, Chapter 4 131
  A.11 Proof of Theorem 9, Chapter 4 131
  A.12 Proof of Theorem 10, Chapter 4 132
  A.13 Proof of Proposition 5, Chapter 5 133
  A.14 Proof of Theorem 11, Chapter 5 133
  A.15 Proof of Lemma 7, Chapter 5 135
  A.16 Proof of Theorem 14, Chapter 6 135
  A.17 Proof of Proposition 6, Chapter 6 136
  A.18 Proof of Proposition 7, Chapter 6 136
  A.19 Proof of Proposition 8, Chapter 6 137

BIBLIOGRAPHY 138

LIST OF TABLES

1.1 Description of the classification data sets used in this dissertation 16
1.2 Description of data sets in Letor 3.0 17

LIST OF FIGURES

2.1 Performance comparison 35
2.2 Sensitivity to parameter C 37
2.3 Sensitivity to the base ranker 39
3.1 The experimental results in terms of NDCG for Letor 3.0 data sets 57
4.1 Reduction of the objective function L_p using the OHSUMED Data Set 71
4.2 NDCG of relevance feedback for different algorithms 81
4.3 NDCG of MRR with different base rankers for relevance feedback 82
4.4 NDCG of MR with different numbers of feedback 83
4.5 The ranking result for recommender system 84
4.6 Running time of MR for different numbers of movies 84
5.1 Performance comparisons of different methods 98
5.2 Performance comparisons of different methods with varied γ 99
6.1 The error rates of Banditron with different choice of γ 104
6.2 The error rates of different methods over trials
114

Chapter 1

Introduction

Learning is the task of constructing a prediction model using training data. A learning task is defined by an objective function that evaluates the performance of each model in the domain. A variety of objective functions are defined for different learning tasks. These learning tasks differ in I) the type of prediction, II) the type of feedback/labeling for the training data, and III) the way training data are presented to them. Based on the type of prediction, learning algorithms can be classified into three major groups: classification, regression, and learning to rank. A regression model aims to map an instance to a numerical value, a classification model (classifier) categorizes instances into predefined classes, and a ranking model (ranker) orders a series of items based on a given request.

Training instances can be presented to the learner in two different ways: batch mode and online mode. In batch mode, a set of training instances is provided to the learner and the learner trains a model off-line. The learned model is evaluated based on the predictions made for unseen test instances. We usually assume the training instances are i.i.d. samples from an unknown distribution, and the objective is to learn a statistical model that is able to make accurate predictions for unseen instances sampled from the same distribution as the training data. In online mode, the tasks of learning and making predictions are performed at the same time; i.e., the learner applies the current model to each received instance, then receives the feedback for that instance and updates the model based on the instance and the feedback. In online mode, we do not have to make the i.i.d. assumption regarding the received instances, and the data generator may produce instances arbitrarily [1].
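The online protocol just described can be sketched in a few lines of code. This is a minimal illustration of the full-feedback case, using a simple perceptron-style learner on two-dimensional toy data; the learner, data stream, and function names are illustrative placeholders, not a method from this dissertation:

```python
# Minimal sketch of the online protocol (full-feedback case): in each trial
# the learner predicts, receives the true label, and updates its model.
def online_protocol(stream, predict, update):
    """Run trials over a stream of (x, y) pairs; return the mistake count."""
    mistakes = 0
    for x, y in stream:          # trial t: a new instance arrives
        y_hat = predict(x)       # learner commits to a prediction
        if y_hat != y:           # feedback: the true label is revealed
            mistakes += 1
        update(x, y, y_hat)      # model is revised before the next trial
    return mistakes

# A trivial learner: perceptron on 2-D points with labels in {-1, +1}.
w = [0.0, 0.0]

def predict(x):
    s = w[0] * x[0] + w[1] * x[1]
    return 1 if s >= 0 else -1

def update(x, y, y_hat):
    if y_hat != y:               # perceptron: update only on mistakes
        w[0] += y * x[0]
        w[1] += y * x[1]

stream = [((1.0, 0.0), 1), ((0.0, 1.0), -1), ((1.0, 0.2), 1), ((0.1, 1.0), -1)]
m = online_protocol(stream, predict, update)
```

Note that learning and prediction are interleaved: the mistake count is accumulated on the fly, with no separate training and test phases as in batch mode.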
The feedback for the training instances can be either partial or full in online mode, and the label for training instances can be either present or absent in batch mode. Each of these combinations results in a different learning task. When we discuss batch learning in more detail in Section 1.3, we cover a brief description of semi-supervised learning, in which part of the training instances are unlabeled; we discuss online learning with partial feedback in Section 1.4, where the feedback only indicates whether the predicted class is correct. In the following sections, we focus on classification, learning to rank, and batch and online learning to draw the direction of the material in the future chapters of this thesis.

1.1 Classification

Classification is the task of categorizing instances into predefined classes and has found a countless number of applications. In the fully supervised mode, the learning algorithm receives a set of labeled instances, each represented by a vector of features and a label that shows its class assignment. The objective of the learning algorithm is to learn a classifier that is able to make accurate predictions for unseen examples generated by the same distribution as the training instances. The ability of a learner to produce models that perform well for unseen instances is called generalization ability [2] in the machine learning literature. Many effective algorithms have been proposed for the task of supervised classification, such as Support Vector Machines (SVMs) [3], logistic regression [2], and boosting [4].

Classification is one of the oldest machine learning tasks. Nonetheless, it still finds applications that demand developing new techniques. One of the major challenges we address in this dissertation is to learn a classification model from partial feedback. As an example, consider the problem of contextual advertisement, which chooses advertisements to display on a web page for a specific user [5].
Contextual advertisement algorithms are usually based on the assumption that users provide feedback by clicking on relevant advertisements [5]. However, if none of the displayed advertisements are relevant to the user's information needs, they will not be clicked, and consequently the algorithm does not know which advertisements are relevant for the user. We refer to this scenario as partial feedback, as opposed to the case of full feedback where the correct output (i.e., the relevant advertisement) is provided for each instance. This task demands new online learning algorithms that are able to learn over the trials in the partial feedback setting. In particular, the online algorithms need to employ the exploration vs. exploitation trade-off techniques that were primarily developed for the multi-armed bandit problem [6].

The performance of a classification algorithm is usually evaluated by the classification accuracy. For the evaluation of multi-class or multi-label learning, the classification accuracy may not be sufficient, particularly when the number of classes is large or the classes are unbalanced. In those cases, the most commonly used measures for classification are precision, recall, or a combination of the two, such as the F1 measure and the ROC curve.

1.2 Learning to Rank

Ranking is the task of ordering a list of offerings for a given request. It receives a set of offerings and a request as input and outputs the list of offerings sorted according to their relevancy to the request. The performance of a ranking algorithm is evaluated based on how well it sorts the offerings according to their relevancy to the request. Learning to rank is the task of learning a ranking function that can order the offerings for unseen requests. It receives a set of requests, each with a sorted list of offerings, as the training set and produces a ranking function to sort offerings for new requests.
Learning to rank is a relatively new area of study in machine learning that has received much attention in recent years because of its important role in a variety of applications, including:

• Document Retrieval: In document retrieval, the request is a textual query (a set of keywords) and the offerings are documents. Users provide a set of keywords to the system, and the ranking system should retrieve the documents most relevant to those keywords.

• Recommender Systems: In recommender systems, the request is a user and the offerings are the items to be recommended. For example, in a movie recommendation system, a ranking system aims to recommend the most interesting movies to a particular user based on the history of users and movie information.

• Sentiment Analysis: In sentiment analysis, the request is a text and the offerings are the attitudes of the author regarding a particular subject.

• Computational Biology: In computational biology, a request is a protein and the offerings are a list of different 3D structures. The objective is to provide a sorted list of 3D structures for a given protein.

• Online Advertisement Placement: In online advertisement placement, the request is a user visiting a web page and the offerings are the advertisements. Online advertisement systems should rank the relevancy of different advertisements to that user and display the most relevant advertisement on the web page in order to maximize the number of clicks on the advertisements.

Throughout this thesis, we use document retrieval terminology (e.g., query for request, document for offering) when talking about ranking, although the material is applicable to other domains. Since learning to rank is a relatively new problem, we describe it in more detail here. A learning to rank system usually consists of three components that distinguish it from classification and regression.

1.2.1 Training set

The training set for learning to rank consists of a set of queries.
For each query, a list of documents and their relevancy to the query are provided. The common practice in learning to rank is to assume the existence of a set of base rankers that can be considered feature generators for query-document pairs. PageRank [7], the vector space model [8], and statistical language models [9] such as BM25 are some example base rankers. These base rankers are basically unsupervised models that measure the relevancy of each document to a query. The value produced by each base ranker is considered a feature for a query-document pair, and the learning to rank algorithm aims to combine these feature values to produce a ranking function.

The label information in learning to rank takes the form of relevancy judgments, which can be of three different types: relevancy scores, pairwise relevancy information (partial ordering), and a complete ordering. A relevancy score is a numerical value (e.g., 1, 2, ...) that shows the level of relevancy of a document to a given query [10]. Relevancy scores are the most widely used relevancy information. Pairwise relevancy information is the relative relevancy between two documents, indicating which of the two is more relevant. Pairwise relevancy can often be derived from implicit feedback from users. For example, in search engines, when a user clicks on one of the ranked documents, it is safe to infer that the clicked document is more relevant than the documents that are ranked before the clicked one. This type of click-through feedback provides the relative relevancy for pairs of documents [11]. A less commonly used form of relevancy information is a complete relevancy ordering of the documents for a given query [12], in which documents are ordered in descending relevancy. Notice that relevancy scores can be converted to a pairwise ordering and a complete ordering, but the opposite is not true.
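The click-through heuristic above can be implemented directly: every clicked document is taken to be preferred over each unclicked document ranked before it. A minimal sketch (the function name and document identifiers are illustrative):

```python
# Extract pairwise preferences from click-through data: a clicked document is
# preferred over every unclicked document that was ranked before it.
def clicks_to_pairs(ranking, clicked):
    """ranking: list of doc ids in ranked order; clicked: set of clicked ids.
    Returns a list of (preferred, less_preferred) pairs."""
    pairs = []
    for pos, doc in enumerate(ranking):
        if doc in clicked:
            for skipped in ranking[:pos]:
                if skipped not in clicked:
                    pairs.append((doc, skipped))
    return pairs

# The user skipped d1 and d2, then clicked d3: d3 is preferred over both.
pairs = clicks_to_pairs(["d1", "d2", "d3", "d4"], {"d3"})
```

Note that the unclicked document d4, ranked below the click, yields no preference: the user may simply never have examined it.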
1.2.2 Evaluation

The performance of a ranking system is evaluated based on how well it predicts the relevancy of documents to a query. Several evaluation measures have been introduced in the literature. Area under the ROC Curve (AUC), Mean Average Precision (MAP), and Normalized Discounted Cumulative Gain (NDCG) are among the most widely used measures. AUC is based on the Wilcoxon test, a nonparametric statistical test that measures the distributional difference between two sets of numbers. AUC works only for two levels of relevancy judgments and measures how well a ranking function places the relevant documents above the irrelevant documents. AUC treats documents similarly regardless of their position in the ordered list. However, the top retrieved documents are more important because users only look for the relevant documents at the top of the list (e.g., consider a search engine in which users only look at the first few pages of retrieved links). Based on this observation, MAP [13] and NDCG [14] are constructed to put more weight on the documents at the top of the list. Similar to AUC, MAP only works for binary relevancy judgments. On the other hand, NDCG is a general evaluation measure that can handle ranking problems with multiple levels of relevancy judgments.

1.2.3 Learning

Three types of learning to rank algorithms can be found in the literature: pointwise, pairwise, and listwise approaches. Pointwise approaches [15-17] can be applied when the relevancy scores of documents are available. In this case, the relevancy scores are considered absolute quantities, and a classification or regression technique is applied by treating the relevancy scores as class labels or numerical values. The pairwise approaches are the only group of techniques that can handle pairwise relevancy information. They apply a classification or regression technique to learn the ordering information of pairs of documents [18-23].
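To make the pairwise reduction concrete, a ranking function can be trained by penalizing each misordered pair with a hinge loss on the score difference. The sketch below uses a linear scoring function and subgradient updates; the feature vectors, step size, and function names are illustrative, not a specific published algorithm:

```python
# Pairwise reduction sketch: learn w so that score(w, x_i) > score(w, x_j)
# whenever document i is preferred over document j, using a hinge loss on
# score differences with subgradient updates.
def score(w, x):
    return sum(wk * xk for wk, xk in zip(w, x))

def train_pairwise(pairs, dim, lr=0.1, epochs=50):
    """pairs: list of (x_preferred, x_other) feature-vector pairs."""
    w = [0.0] * dim
    for _ in range(epochs):
        for xp, xn in pairs:
            margin = score(w, xp) - score(w, xn)
            if margin < 1.0:  # hinge: only misordered / low-margin pairs update
                w = [wk + lr * (p - n) for wk, p, n in zip(w, xp, xn)]
    return w

# Each document is a (bm25, pagerank) feature vector; preferred doc first.
pairs = [((0.9, 0.2), (0.3, 0.1)),
         ((0.8, 0.7), (0.4, 0.6)),
         ((0.6, 0.5), (0.2, 0.2))]
w = train_pairwise(pairs, dim=2)
ordered = all(score(w, xp) > score(w, xn) for xp, xn in pairs)
```

This is exactly the sense in which a pairwise method "applies a classification technique to pairs": each pair becomes one binary example on the difference of feature vectors.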
The third group of algorithms, the listwise approaches, are the most effective learning to rank techniques that have been studied in the last few years. They are motivated by the observation that most evaluation metrics of information retrieval measure the ranking quality for individual queries, not documents. These approaches consider the ranking list of documents for each query as a training instance [13, 24-29] and optimize a listwise loss function. We describe these techniques in more detail in Chapter 3.

1.3 Batch Learning

In batch learning, a set of training instances generated by an unknown distribution is provided. The goal is to train a model off-line that is capable of making accurate predictions for unseen instances. As mentioned before, depending on the type of training instances and their labels, different learning tasks can be defined. For example, in classification, each instance is a vector of features and the label is the class assignment. In the listwise approach to learning to rank, each instance consists of a query, the list of its documents, and the relevancy of the documents to the query.

Training instances can be either all labeled or partially labeled, which results in two different modes of learning: supervised and semi-supervised. All training instances are labeled in supervised learning, while in semi-supervised learning plenty of unlabeled instances are provided to help the process of learning. The usage of unlabeled instances is based on some assumptions about the data-generating process, such as the manifold and cluster assumptions [30-35]. We return to these assumptions in Chapter 2.

In most studies of batch learning, an objective function is designed to measure the performance of a given model (function) on the instances. Different learning algorithms can be designed by defining different objective functions for the same task.
For example, in the case of classification, the negative log-likelihood function is used in logistic regression, a hinge loss leads to support vector machines, etc. In the case of learning to rank, the pointwise approaches utilize a classification or regression model, i.e., they utilize a classification or regression loss function. Similarly, the pairwise approaches follow from designing a classification or regression model on pairs of documents, and a listwise learning to rank algorithm results from utilizing a loss function at the level of queries.

Given an objective function (loss function) L(F) that measures the performance of a given model F, learning translates to the process of finding the F that optimizes L(F). A common approach is to restrict the model to a member of a parametric family F(w) (e.g., a linear model). This constraint translates the objective function L(F) into an objective function of the parameters w, i.e., L(w), and consequently the optimal model is found by optimizing the objective function with respect to w. In this case, L(w) is called a function in the parameter space. A different approach is to directly optimize L over the function F. This approach optimizes the objective function in the function space and is called boosting. Boosting is the optimization technique we utilize in this thesis for the batch mode algorithms we cover.
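To make the role of the objective function concrete, here are two of the loss functions mentioned above, both evaluated on the margin m = y · F(x) for a label y in {-1, +1}. These are the standard textbook forms; the numeric margins below are illustrative:

```python
import math

# Two objective functions over the margin m = y * F(x):
# the hinge loss underlying support vector machines, and the logistic
# (negative log-likelihood) loss underlying logistic regression.
def hinge_loss(margin):
    return max(0.0, 1.0 - margin)

def logistic_loss(margin):
    return math.log(1.0 + math.exp(-margin))

# Both penalize small or negative margins; the hinge loss is exactly zero
# once the margin exceeds 1, while the logistic loss is positive everywhere.
losses = [(m, hinge_loss(m), logistic_loss(m)) for m in (-1.0, 0.0, 0.5, 2.0)]
```

The choice between such losses is precisely the choice of L(F): the same linear family F(w) paired with different objectives yields different learning algorithms.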
Instead of applying a direct optimization approach such as gradient descent, bound Optimization strategies [38] may be used; this is because f,- and a,- are dependent on each other and it is difficult to decide the values for fz- and O, simultaneously. The bound optimization strategy is often applied to decouple the dependency between f,- and ai. We use this technique in different parts of this thesis. First introduced by Schapire [4], boosting was initially designed to convert a weak learner that performs just slightly better than random guessing into an accurate classifier. Here, by random guessing, we mean a classifier with less than 50% classification error. However, as we will Show throughout this dissertation, the meaning of random guessing can change from one problem to another. In this view, given a set of labeled training exam- ples (xi, 3],), i = 1..n, a boosting algorithm provides the weak learner with a set of weighted training examples at each round. The weak learner constructs a model by optimizing its loss over the weighted training examples. In the new iteration, the boosting algorithm pro- duces a new set of weighted examples by increasing the weights for the examples that are misclassified in the previous round. The iterations are repeated till the algorithm converges. One well-known boosting algorithm is AdaBoost [39], developed based on an expo- nential loss function for classification. Algorithm 1 shows AdaBoost algorithm. At the beginning of this algorithm, the booster chooses a uniform weighting over the examples (Step 3). Given the weights produced by the booster, the weak learner constructs a bi- nary classifier that minimizes the loss Ct at Step 5. The booster then produces a new set of weights for the examples in Step 8 by increasing the weights for the examples misclassified in the previous round of learning (Steps 6 and 7). These steps are repeated for a number of times. 
We have the following bound for the misclassification error of the final hypothesis generated by the AdaBoost algorithm:

ε ≤ 2^T ∏_{t=1}^{T} √(ε_t (1 − ε_t))    (1.1)

where ε_t is the classification error of the hypothesis generated in round t. The above result shows that, under the weak classifier assumption, the classification error is guaranteed to be reduced as the iterations proceed. Using the Minimax theorem, Freund et al. [39] showed that there is a mixed strategy over the space of hypotheses H that produces zero classification error over the training set if (H, X) is γ-learnable. For γ > 0, a learning algorithm is γ-learnable if for any distribution Q over the training examples X, the algorithm can return h ∈ H with at most 1/2 − γ classification error.

Algorithm 1 AdaBoost Algorithm
1: Input:
   1. A weak learner.
   2. A set of training examples (x_1, y_1), ..., (x_m, y_m) where x_i ∈ X and y_i ∈ {−1, 1}.
2: Initialize F(x_i) = 0, i = 1, ..., m
3: Initialize D_1(i) = 1/m, i = 1, ..., m
4: repeat
5:   Find the classifier f_t : X → {−1, 1} that minimizes ε_t = Σ_{i=1}^{m} D_t(i) I(y_i ≠ f_t(x_i))
6:   Compute α_t = (1/2) ln((1 − ε_t)/ε_t)
7:   Compute F(x_i) = F(x_i) + α_t f_t(x_i), i = 1, ..., m
8:   Compute the new weighting D_{t+1}(i) = D_t(i) exp(−α_t y_i f_t(x_i)) / Z_t, where Z_t is the normalization factor
9: until reaching the maximum number of iterations

The progress of a boosting algorithm is measured by how much the classification error (or a given loss) decreases at each iteration (or over time), and is defined in the following form:

M(P_T, Q_0) ≤ ∏_{t=1}^{T} β(M(h_t, Q_t))    (1.2)

where β is an increasing function of the loss, M(P_T, Q_0) is the loss suffered when the majority vote P_T is used over H and Q_0 is the uniform distribution over X (i.e., M(P_T, Q_0) is the computed loss of the weighted majority vote over the original samples), and M(h_t, Q_t) is the computed loss at round t (i.e., the loss suffered when a single hypothesis h_t is applied over the weighted sample set Q_t).
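The steps of Algorithm 1 can be sketched compactly in code. This is a minimal illustration using one-dimensional decision stumps h(x) = s · (+1 if x > threshold else −1) as the weak learner; the stump search, the toy data, and the cap on α for a perfect stump are illustrative choices, not part of the original algorithm:

```python
import math

# A compact sketch of Algorithm 1 (AdaBoost) with decision stumps.
def adaboost(xs, ys, rounds=10):
    m = len(xs)
    D = [1.0 / m] * m                        # Step 3: uniform weights
    ensemble = []                            # list of (alpha_t, thr, s)
    for _ in range(rounds):
        # Step 5: exhaustively pick the stump with smallest weighted error.
        best = min(
            ((sum(D[i] for i in range(m)
                  if ys[i] != s * (1 if xs[i] > thr else -1)), thr, s)
             for thr in xs for s in (1, -1)),
            key=lambda cand: cand[0])
        err, thr, s = best
        if err >= 0.5:                       # no weak learner beats chance
            break
        # Step 6 (a large fixed alpha stands in for the err == 0 limit).
        alpha = 0.5 * math.log((1 - err) / err) if err > 0 else 10.0
        ensemble.append((alpha, thr, s))
        if err == 0:
            break
        # Step 8: reweight, increasing weight on misclassified examples.
        D = [D[i] * math.exp(-alpha * ys[i] * s * (1 if xs[i] > thr else -1))
             for i in range(m)]
        Z = sum(D)                           # Z_t: normalization factor
        D = [d / Z for d in D]

    def predict(x):                          # sign of F(x), Step 7 accumulated
        total = sum(a * sg * (1 if x > t else -1) for a, t, sg in ensemble)
        return 1 if total >= 0 else -1
    return predict

xs = [0.1, 0.2, 0.3, 0.6, 0.7, 0.9]
ys = [-1, -1, -1, 1, 1, 1]
clf = adaboost(xs, ys)
```

On harder, non-separable data the loop runs multiple rounds, and the reweighting in Step 8 forces successive stumps to concentrate on the examples earlier stumps got wrong, which is what drives the error bound (1.1) down.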
Besides classification and regression, boosting has been applied to a wide range of applications, including:

• Semi-Supervised Learning: Boosting can be utilized to adapt a supervised learner to the problem of semi-supervised learning. For example, [40] used a binary classifier as the weak learner and boosted it for the task of semi-supervised classification, and [41] exploited a binary supervised learner as the weak learner and boosted it for semi-supervised clustering.

• Learning to Rank: Boosting is used to learn a ranking function that orders documents by their relevancy to a query. RankBoost [19] and AdaRank [42] are example applications of boosting to ranking. RankBoost uses a pairwise binary classifier and boosts it for ranking, and AdaRank adapts AdaBoost to optimize information retrieval evaluation measures such as Normalized Discounted Cumulative Gain (NDCG) and Mean Average Precision (MAP).

1.4 Online Learning

Online learning is the task of learning when the examples are provided sequentially (over trials). In each trial, the learning algorithm receives a new example, classifies it, and then acquires some sort of feedback. Using this feedback, the online learning algorithm updates its model in order to better classify future examples. The feedback provided to the online algorithm can be either full or partial. In the full feedback setting, after classifying an instance, the algorithm receives its true class label. One well-known example of such an online learning algorithm is the Perceptron [43]. In the partial feedback or "bandit" setting, the true label is not revealed and the feedback is limited to whether or not the algorithm classified the instance correctly. Since the difference between full and partial feedback in the above discussion only makes sense for multi-class classification, online classification with partial feedback is called multi-class bandit learning [5].
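The full-feedback setting above is exemplified by the Perceptron's online loop; a minimal sketch (binary case for brevity; the function name and learning-rate parameter are ours):

```python
import numpy as np

def perceptron_online(stream, d, lr=1.0):
    """Online Perceptron with full feedback: classify each arriving
    example, then receive its true label y in {-1,+1} and update the
    weight vector on mistakes."""
    w = np.zeros(d)
    mistakes = 0
    for x, y in stream:                   # examples arrive over trials
        y_hat = 1 if w @ x >= 0 else -1   # classify the new example
        if y_hat != y:                    # full feedback: true label revealed
            w += lr * y * x               # update to better classify the future
            mistakes += 1
    return w, mistakes
```

In the bandit setting, the update could not be made this way, since only the correctness of y_hat, not y itself, would be revealed.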
The objective of the learner is to generate a sequence of hypotheses that guarantees a small cumulative loss in the long run when compared to the best hypothesis in the hypothesis space; i.e.

    (1/T) Σ_{t=1}^T M(P_t, Q_t) ≤ (1/T) min_P Σ_{t=1}^T M(P, Q_t) + δ(T)    (1.3)

where δ is a decreasing function of T and should approach zero as T approaches infinity. Bandit feedback has several real-world applications, such as online advertisement [5] and recommender systems [5], as described in the following:

• Online Advertisement: In online advertisement, we often assume that a sponsored ad is likely to be relevant to the user's query if it is clicked by the user, and irrelevant otherwise. When the sponsored ad does not receive a click, the online advertisement algorithm is unable to locate the advertisements that are relevant to the given query, leading to partial user feedback.

• Recommender Systems: A recommender system recommends some items (e.g. movies) to the user. The assumption is that if one of the recommended movies is selected by the user, that movie was a correct recommendation. However, if none of the recommended movies are chosen by the user, the recommender system is not able to discover the right set of movies for that user.

While the problem of online classification with full feedback is well studied, online classification with bandit feedback has received attention only recently [5]. Kakade et al. [5] introduced Banditron as an extension of the Perceptron [43] to handle the partial feedback setting. Online learning with bandit feedback can be regarded as the multi-armed bandit problem [44] when some side information (e.g. the feature vector of instances) is available. The multi-armed bandit is the generalized version of the one-armed bandit game (a traditional slot machine) in which several levers are provided and the player aims to choose a lever that maximizes the rewards in the long run.
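One common way a player can balance trying unfamiliar levers against pulling the empirically best one is the ε-greedy rule; a sketch, where the Bernoulli reward probability per lever and the value of ε are illustrative assumptions of ours:

```python
import random

def epsilon_greedy(arm_probs, T=10000, eps=0.1, seed=0):
    """Multi-armed bandit with partial feedback: only the pulled lever's
    reward is observed. With probability eps explore a random lever;
    otherwise exploit the lever with the best empirical mean reward."""
    rng = random.Random(seed)
    K = len(arm_probs)
    counts = [0] * K                  # pulls per lever
    means = [0.0] * K                 # empirical mean reward per lever
    total = 0.0
    for _ in range(T):
        if rng.random() < eps:
            a = rng.randrange(K)                          # explore
        else:
            a = max(range(K), key=lambda k: means[k])     # exploit
        r = 1.0 if rng.random() < arm_probs[a] else 0.0   # bandit feedback
        counts[a] += 1
        means[a] += (r - means[a]) / counts[a]            # running mean
        total += r
    return total, counts
```

Over many rounds the best lever accumulates most of the pulls, so the average reward approaches that of the best fixed lever, in the spirit of the regret bound (1.3).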
At each stage, the player only knows the reward for the lever he chooses; the rewards for the remaining levers are unknown to the player. At a more abstract level, the multi-armed bandit problem refers to the problem of choosing an action from a list of actions to maximize rewards given that the feedback is partial (bandit). The algorithms developed for this problem usually utilize the exploration vs. exploitation tradeoff strategy to handle the challenge arising from partial feedback [45-47].

1.5 Contribution of This Dissertation

We address several important ranking and classification problems in this dissertation. Utilizing side information in ranking and multi-class classification, direct optimization of information retrieval measures such as NDCG, and online learning in the bandit setting are the subjects we cover, as summarized here:

• Semi-supervised Classification: The focus of semi-supervised classification is on constructing better models by utilizing unlabeled instances when the number of labeled instances is small. Several semi-supervised classification algorithms have been developed based on manifold [32-35] and cluster [30, 48, 49] assumptions. Most of these techniques work for binary problems, and conversion techniques such as one-versus-one and one-versus-the-rest are applied to use them for multi-class problems [50]. This conversion procedure has several well-known problems, including imbalanced classification and different output scales of different binary classifiers. We utilize both the manifold and cluster assumptions in Chapter 2 and design an objective function that directly addresses the multi-class semi-supervised problem. We solve this objective function in the function space using the boosting technique. Our empirical study shows the superior performance of this boosting algorithm compared to the existing boosting algorithms for multi-class problems.
• Ranking by optimizing NDCG: The objective in this problem is to learn a ranking function by maximizing Normalized Discounted Cumulative Gain (NDCG), the most frequently used information retrieval evaluation measure for ranking problems with multi-level relevance judgments [10]. This is a difficult problem because NDCG is a non-differentiable and non-continuous loss function. In order to overcome this difficulty, we introduce the expected value of NDCG and solve it in the function space using the boosting technique. The detailed discussion of this boosting algorithm is provided in Chapter 3.

• Ranking Refinement: In some real-world applications, there are two complementary sources of information for ranking: the ranking information given by an existing ranking function (i.e., the base ranker) and that obtained from user feedback. One example of such applications is relevance feedback, where the two sources of information are the relevance scores obtained from a ranking function like BM25 [51] and the relevance judgments obtained from the users. The key challenge in combining the two sources of information arises from the fact that the ranking information presented by the base ranker tends to be imperfect and the ranking information obtained from users' feedback tends to be noisy. We encode these sources of relevancy information in the form of pairwise relevancy and design an objective function to combine them. We also design a boosting algorithm to solve the resulting objective function. The detailed discussion is provided in Chapter 4, where we perform extensive experiments to show the superiority of our proposed framework over several baselines.

• Online Multi-class Learning with Partial Feedback: Unlike online learning with complete feedback, which has been extensively studied [52], the problem of online multi-class learning with bandit feedback was introduced very recently [5].
Banditron, the first algorithm introduced for multi-class learning with bandit feedback, is a direct generalization of the Perceptron to the case of partial feedback that uses the exploration vs. exploitation tradeoff strategy to handle partial feedback [5]. Using a potential function and the exploration vs. exploitation tradeoff technique, we develop a general framework in Chapter 5, of which Banditron is a special case. The major problem with Banditron is that its performance can be sensitive to the parameter that trades off between exploration and exploitation [53]. We develop an effective approach in Chapter 6 to reduce this dependency.

1.6 Benchmark Data Sets

Throughout this dissertation, we use two sets of data to study the performance of the proposed methods, one set for multi-class classification and one set for learning to rank, as described in the following subsections. We use 5-fold cross validation to run all the experiments except for online learning.

1.6.1 Classification Data Sets

Multiple benchmark data sets from the UCI data repository [54] and the LIBSVM web page [55] are used in our study. Here is the list and a brief description of these data sets:

MNIST. MNIST is comprised of grey-scale images of size 28 x 28 for handwritten digits. It contains 60000 training samples, each represented by 780 features.

Protein. Protein has 17766 samples, represented by 357 features and three classes.

Letter. Letter contains 15000 instances of 26 characters, each represented by 16 features.

optdigits. This data set consists of normalized bitmaps of handwritten digits from 30 people. It contains 3823 instances, each represented by 64 features.

pendigits. This is another collection of images of handwritten digits. It contains 7495 samples, each represented by 16 features.

Nursery. Originally developed to rank applications for nursery school, it has 12960 records, each represented by 8 features belonging to one of 4 classes (we removed one class that only had two samples).
Isolet. Isolet contains 7797 spoken-alphabet utterances that belong to 26 classes, with each letter forming its own class. Every spoken utterance is represented by 617 attributes.

Notice that for some of these data sets there were two separate sets, one for training and one for testing. We only used the training set in our experiments. The information related to these data sets is summarized in Table 1.1.

Table 1.1: Description of the classification data sets used in this dissertation

              Instances   Features   Classes
    Isolet    7797        617        26
    MNIST     60000       784        10
    Protein   17766       357        3
    Optdigits 3823        64         26
    Nursery   12960       8          3
    Letter    15000       16         26
    Pendigits 7495        16         26

1.6.2 Ranking Data Sets

We use data sets from information retrieval and recommender systems to study the performance of the ranking algorithms in our studies. For information retrieval, we use version 3.0 of the LETOR package provided by Microsoft Research Asia [56]. The LETOR package includes several benchmark data sets for ranking, along with state-of-the-art algorithms for learning to rank and tools for evaluation. There are seven data sets provided in the LETOR package: OHSUMED, Topic Distillation 2003 (TD2003), Topic Distillation 2004 (TD2004), Homepage Finding 2003 (HP2003), Homepage Finding 2004 (HP2004), Named Page Finding 2003 (NP2003) and Named Page Finding 2004 (NP2004). There are 106 queries in the OHSUMED data set, with each query equipped with around 1000 manually judged documents. The relevancy of each document in the OHSUMED data set is scored in three levels: 0 (irrelevant), 1 (possibly relevant) or 2 (definitely relevant). The total number of query-document relevancy judgments provided in the OHSUMED data set is 16140, and there are 45 features used to represent each document-query pair. For TD2003, TD2004, HP2003, HP2004, NP2003 and NP2004 there are 50, 75, 150, 75, 150 and 75 queries, respectively, with about 1000 retrieved documents manually judged for each query.
This amounts to a total of 49058, 74170, 147606, 74409, 148657 and 73834 query-document pairs for TD2003, TD2004, HP2003, HP2004, NP2003 and NP2004, respectively. For these data sets, there are 63 features extracted for every query-document pair, and a binary relevancy judgment is provided for every query-document pair. This information is summarized in Table 1.2. Note that, unlike classical supervised learning, in learning to rank the representation of documents depends on the given query; hence, features are extracted for each document-query pair, not just for individual documents.

Table 1.2: Description of data sets in LETOR 3.0.

              Query-document pairs   Queries   Relevancy levels   Features
    OHSUMED   16140                  106       3                  45
    TD2003    49058                  50        binary             63
    TD2004    74170                  75        binary             63
    HP2003    147606                 150       binary             63
    HP2004    74409                  75        binary             63
    NP2003    148657                 150       binary             63
    NP2004    73834                  75        binary             63

For every data set in LETOR, five partitions are provided to conduct five-fold cross validation, and each partition is further divided into a training set, a testing set, and a validation set. The retrieval results for a number of state-of-the-art learning to rank algorithms are also provided in the LETOR package. We will describe these algorithms in detail in Chapter 3.

In order to evaluate the performance of the proposed ranking algorithms for recommender systems, we use the MovieLens dataset, available at [57], which is one of the most popular data sets for the evaluation of information filtering. It contains 100,000 ratings, ranging from 1 (worst) to 5 (best), for 1682 movies given by 943 users. Each movie is represented by 51 binary features: 19 features are derived from the genres of the movies and the remaining 32 features are derived from the keywords used to describe the content of the movies. To extract the content features, we downloaded the keywords of each movie from the online movie database IMDb and selected the keywords most used by the 1682 movies.
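Ranking quality on these data sets is typically reported with NDCG, the measure optimized in Chapter 3; a minimal sketch of NDCG@k, using the common 2^rel − 1 gain and log2 position discount (a standard convention, not spelled out in this section):

```python
import math

def ndcg_at_k(rels, k):
    """NDCG@k for one query. `rels` lists the graded relevance of the
    returned documents (e.g. 0/1/2 as in OHSUMED) in ranked order."""
    def dcg(scores):
        # gain (2^r - 1) discounted by log2 of the 1-based position + 1
        return sum((2 ** r - 1) / math.log2(i + 2)
                   for i, r in enumerate(scores[:k]))
    ideal = dcg(sorted(rels, reverse=True))   # best possible ordering
    return dcg(rels) / ideal if ideal > 0 else 0.0
```

A ranking that places the most relevant documents first scores 1.0; any inversion of graded documents lowers the score.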
Chapter 2

Semi-Supervised Multi-Class Boosting

Most semi-supervised learning algorithms are designed for binary classification. They are extended to multi-class classification by approaches such as one-against-the-rest. The main shortcoming of these approaches is that they are unable to exploit the fact that each example is assigned to only one class in the case of multi-class learning. Additional problems with extending semi-supervised binary classifiers to multi-class classification include imbalanced classification and different output scales of different binary classifiers. Given that there are well-known multi-class classification techniques such as decision trees and multi-layer perceptrons, the research question is whether it is possible to use these techniques as the weak learner and boost their performance for the task of semi-supervised learning. The main challenge in designing such boosting algorithms is that the definition of the loss for unlabeled examples is not clear. One approach is to generalize the notion of margin from labeled instances to unlabeled instances. This approach computes the margin for unlabeled examples by considering their assigned labels at the current iteration of the algorithm. However, since the labels computed in the early iterations are likely to be inaccurate, this strategy produces undesirable results.

Unlike the existing boosting algorithms for semi-supervised learning, which are based only on the classification confidence (margin) of the examples (i.e. the cluster assumption), we utilize both the classification confidence and the similarity among examples (i.e. the manifold assumption) to design a loss function for multi-class semi-supervised learning. We further develop a boosting algorithm for efficient computation. Empirical study with multiple benchmark datasets shows that the proposed MCSSB algorithm performs better than the state-of-the-art boosting algorithms for semi-supervised learning.
2.1 Introduction

Semi-supervised classification combines the hidden structural information in the unlabeled examples with the explicit classification information of labeled examples to improve classification performance. Many semi-supervised learning algorithms have been studied in the literature. Examples are density-based methods [30, 31], graph-based algorithms [32-35], and boosting techniques [40, 48, 49]. Most of these methods are based on either the manifold assumption [32-35] or the cluster assumption [30, 48, 49]. Under the manifold assumption, the data is assumed to reside on a low-dimensional manifold within the original high-dimensional space, and the class assignment of unlabeled examples can be derived from a classification function that lives on this low-dimensional manifold. Under the cluster assumption, the examples of the same class tend to be closer to each other than those of different classes. As a result of this assumption, the decision boundary is expected to pass through the low-density regions. Thus, a given semi-supervised learner is usually specified by a combination of two terms, with one term related to the classification error on the training examples and the other term related to how well the model satisfies the assumption (either manifold or cluster).

While most semi-supervised classification approaches were originally designed for two-class problems, many real-world applications, such as speech recognition and object recognition, require multi-class categorization. To adapt a binary (semi-supervised) learning algorithm to problems with more than two classes, a common practice is to divide a multi-class learning problem into a number of independent binary classification problems using techniques such as one-versus-the-rest, one-versus-one, and error-correcting output coding [58]. The main shortcoming of these approaches is that the resulting binary classification problems are independent.
As a result, these approaches are unable to exploit the fact that each example can only be assigned to one class. This issue was already pointed out in the study of multi-class boosting [59]. In addition, since every binary classifier is trained independently, their outputs may be on different scales, making it difficult to identify the most likely class assignment based on the classification scores [60]. Though calibration techniques [61] can be used to alleviate this problem in supervised classification, they are rarely used in semi-supervised learning due to the small number of labeled training examples. Moreover, techniques like one-versus-the-rest, where the examples of one class are considered against the examples of all the other classes, can lead to an imbalanced classification problem. Although a number of techniques have been proposed for supervised learning in multi-class problems [59, 62, 63], none of them address semi-supervised multi-class learning, which is the focus of this chapter.

Given that supervised multi-class classification is a well-studied subject, an important research question is whether it is possible to develop a general semi-supervised framework that is able to improve the accuracy of a given supervised multi-class learning algorithm by effectively exploring the abundance of unlabeled data. The immediate answer to this question is the boosting technique. The objective of semi-supervised classification is to learn a hypothesis that makes the minimum number of misclassifications on the labeled examples and utilizes the unlabeled data for better generalization. Given a loss function for the labeled and unlabeled examples, a boosting algorithm can be defined by reweighting each instance based on the current value of the loss. One straightforward approach to defining the loss for unlabeled examples is to use the classification confidence as their loss.
The difficulty comes from the fact that the classification confidences of the unlabeled examples are unknown. One approach to address this problem is to use the class labels predicted by the current model as pseudo-labels for the unlabeled examples and utilize them to obtain the classification confidence (or margin). Assemble [48], described in Section 2.3.2, is constructed based on this idea of pseudo-labels. The problem with utilizing pseudo-labels to compute the loss for unlabeled examples is that the pseudo-labels assigned in the early steps of the algorithm are not precise and can lead to undesirable results of the boosting algorithm. In particular, this approach does not directly utilize the underlying properties of the data described by the manifold or cluster assumption. Moreover, since all the existing semi-supervised boosting algorithms are designed for binary classification, they still suffer from the aforementioned problems when applied to multi-class problems.

To avoid the above problems, we design a boosting algorithm in this chapter by considering a multi-class loss function that utilizes both the manifold and cluster assumptions; i.e., it consists of two terms, one related to the consistency between the predicted labels and the similarity between the examples, and one related to the consistency between the predicted labels and the true labels of the labeled examples. To minimize this loss function, we develop a semi-supervised boosting framework, termed Multi-Class Semi-Supervised Boosting (MCSSB), that is designed for multi-class semi-supervised learning problems. By directly solving a multi-class problem, we avoid the problems that arise when converting a multi-class classification problem into a number of binary ones.
Moreover, unlike the existing semi-supervised boosting methods that only assign pseudo-labels to the unlabeled examples with high classification confidence, the proposed framework decides the pseudo-labels for unlabeled examples based on both the classification confidence and the similarities among examples. It therefore effectively exploits both the manifold assumption and the cluster assumption for semi-supervised learning. Empirical study with UCI datasets shows that the proposed algorithm performs better than the state-of-the-art algorithms for semi-supervised learning.

2.2 Related Work

Most semi-supervised learning algorithms can be classified into three categories: density-based methods [30, 31], graph-based algorithms [32-35], and boosting techniques [40, 48, 49]. As mentioned in Section 2.1, these methods are based on either the cluster or the manifold assumption, depending on how they utilize the unlabeled examples. Density-based methods are usually based on finding a decision boundary that passes through sparse regions and has the maximum margin to both labeled and unlabeled examples [30, 31, 48, 49]. Graph-based learners utilize a similarity measure between examples and construct a graph to propagate the labeling information to the unlabeled instances [32-35].

Semi-supervised learning algorithms can also be categorized into inductive and transductive learners based on their functionality. A semi-supervised learner is called transductive if it does not produce a classifier and cannot operate on unseen examples. Otherwise, it is called inductive. The algorithm we develop in this chapter works in the inductive mode.

Semi-supervised SVMs (S3VMs) or Transductive SVMs (TSVMs) are the semi-supervised extensions of Support Vector Machines (SVMs). They are essentially density-based methods and assume that decision boundaries should lie in sparse regions. Despite their name, TSVMs can work in inductive mode.
Although finding an exact S3VM solution is NP-complete [64], there are many approximate solutions [30, 31, 65-67]. Except for [67], these methods are designed for binary semi-supervised learning. The main drawback of [67] is its high computational cost due to its semi-definite programming formulation.

Graph-based methods are usually transductive learners that aim to predict class labels that are smooth on the graph of unlabeled examples. These algorithms differ in how they define the smoothness of class labels over a graph. Example graph-based semi-supervised learning approaches include Mincut [32], the Harmonic function [33], local and global consistency [34], and manifold regularization [35]. Similar to density-based methods, most graph-based methods are mainly designed for binary classification.

Semi-supervised boosting methods such as SSMBoost [68] and Assemble [48] are direct extensions of AdaBoost [39]. In [49], a local smoothness regularizer is introduced to improve the reliability of semi-supervised boosting. Unlike the existing approaches for semi-supervised boosting that solve two-class problems, we focus on semi-supervised boosting for multi-class classification.

2.3 Multi-Class Semi-supervised Learning

2.3.1 Problem Definition

Let D = (x_1, ..., x_N) denote the collection of N examples. Assume that the first N_l examples are labeled by y_1, ..., y_{N_l}. Each y_i = (y_i^1, ..., y_i^m) ∈ {0, +1}^m is a binary vector that indicates the assignment of x_i to the m different classes, where y_i^k = +1 when x_i is assigned to the kth class, and y_i^k = 0 otherwise. Since we are dealing with a multi-class problem, we have Σ_{k=1}^m y_i^k = 1, i.e., each example x_i is assigned to one and only one class. We denote by ŷ_i = (ŷ_i^1, ..., ŷ_i^m) ∈ R^m the predicted class labels (or confidences) for example x_i, and by Ŷ = (ŷ_1^T, ..., ŷ_N^T)^T the predicted class labels for all the examples.
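The label vectors y_i defined above are ordinary one-hot encodings; a minimal sketch (the helper name is ours):

```python
import numpy as np

def one_hot(labels, m):
    """Encode class indices as rows y_i in {0,1}^m with sum_k y_i^k = 1,
    i.e. exactly one nonzero entry per example."""
    Y = np.zeros((len(labels), m))
    Y[np.arange(len(labels)), labels] = 1.0
    return Y
```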
Let S = [S_{i,j}]_{N×N} be the similarity matrix, where S_{i,j} = S_{j,i} ≥ 0 is the similarity between x_i and x_j. For convenience of discussion, we set S_{i,i} = 0 for any x_i ∈ D, a convention commonly used by many graph-based approaches. Our goal is to compute ŷ_i for the unlabeled examples with the assistance of the similarity matrix S and Y = (y_1^T, ..., y_{N_l}^T)^T. (x^T denotes the transpose of a matrix or vector x.)

2.3.2 Assemble Algorithm

Assemble [48], a boosting algorithm for semi-supervised classification depicted in Algorithm 2, is constructed based on the idea of pseudo-labels.

Algorithm 2 Assemble: Adaptive Semi-Supervised Ensemble Algorithm
1: Input:
   • D = (x_1, ..., x_N): the set of examples; the first N_l examples are labeled.
   • s: the number of sampled examples
2: Initialize F(i) = 0, i = 1, ..., |D|
3: Initialize w_1(i) = 1/N_l, i = 1, ..., N_l and w_1(i) = 0, i = N_l + 1, ..., |D|
4: repeat
5:   Set y_i = F(x_i), i = N_l + 1, ..., |D|
6:   Find a multi-class classifier h_t that minimizes ε_t = Σ_{i=1}^{|D|} w_t(i) I(y_i ≠ h_t(x_i))
7:   Compute α_t = (1/2) ln((1 − ε_t)/ε_t)
8:   Compute F(x_i) = F(x_i) + α_t h_t(x_i), i = 1, ..., |D|
9:   Compute the new weighting w_{t+1}(i) = w_t(i) exp(α_t I(y_i ≠ h_t(x_i))) / Z_t, where Z_t is the normalization factor and I(x) outputs 1 if x is true, and 0 otherwise
10: until reach the maximum number of iterations

At each boosting iteration, the algorithm creates a new classifier and redistributes the weights, placing more emphasis on the less confident instances. Besides Assemble, several other boosting algorithms have been proposed for semi-supervised learning based on the idea of using pseudo-labels [49, 68].
They essentially operate like self-training, where the class labels of unlabeled examples are updated iteratively: a classifier trained by a small number of labeled examples is initially used to predict pseudo-labels for the unlabeled examples; a new classifier is then trained on both labeled and pseudo-labeled examples; and the processes of training classifiers and predicting pseudo-labels alternate until a stopping criterion is reached. The main drawback of this approach is that it relies solely on the pseudo-labels predicted by the classifiers learned so far when generating new classifiers. Given the possibility that pseudo-labels predicted in the first few steps of boosting could be inaccurate, the resulting new classifiers may also be unreliable. This problem was addressed in [49] by the introduction of a local smoothness regularizer. However, these approaches do not utilize the underlying properties of the data described by the manifold or cluster assumption. In what follows, we design a boosting algorithm for the problem of multi-class semi-supervised classification based on the manifold and cluster assumptions.

2.3.3 Design of Objective Function

The goal of semi-supervised learning is to combine labeled and unlabeled examples to improve classification performance. Therefore, we design an objective function that consists of two terms: (a) F_u, which measures the inconsistency between the predicted class labels Ŷ of unlabeled examples and the similarity matrix S, and (b) F_l, which measures the inconsistency between the predicted class labels Ŷ and the true labels Y. Below we discuss these two terms in detail.

Given two examples x_i and x_j, we first define the similarity Z_{i,j}^u based on their predicted confidence scores ŷ_i and ŷ_j:

    Z_{i,j}^u = Σ_{k=1}^m [exp(ŷ_i^k) / Σ_{k'} exp(ŷ_i^{k'})] [exp(ŷ_j^k) / Σ_{k'} exp(ŷ_j^{k'})] = Σ_{k=1}^m b_i^k b_j^k = b_i^T b_j    (2.1)

where b_i^k = exp(ŷ_i^k) / Σ_{k'=1}^m exp(ŷ_i^{k'}) and b_i = (b_i^1, ..., b_i^m).
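Equation (2.1) amounts to a row-wise softmax followed by pairwise inner products; a sketch (subtracting the row maximum is a numerical-stability tweak of ours, not part of the text):

```python
import numpy as np

def predicted_similarity(Y_hat):
    """Given predicted scores Y_hat (N x m), form b_i = softmax(y_hat_i)
    per row and return Z^u with Z^u_{i,j} = b_i . b_j (Eq. 2.1)."""
    E = np.exp(Y_hat - Y_hat.max(axis=1, keepdims=True))  # stable softmax
    B = E / E.sum(axis=1, keepdims=True)                  # rows are b_i
    return B @ B.T                                        # Z^u
```

Examples whose score vectors peak on the same class get Z^u close to 1; examples confidently assigned to different classes get Z^u close to 0.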
Note that b_i^k can be interpreted as the probability of assigning x_i to class k, and Z_{i,j}^u, the cosine similarity between b_i and b_j, can be interpreted as the probability of assigning x_i and x_j to the same class. We emphasize that it is important to use b_i^k, instead of exp(ŷ_i^k), for computing Z_{i,j}^u, because the normalization in b_i^k allows us to enforce the requirement that each example is assigned to a single class, a key feature of multi-class learning.

Let Z^u = [Z_{i,j}^u]_{N×N} be the similarity matrix based on the predicted labels. To measure the inconsistency between this similarity and the similarity matrix S, we define F_u as the distance between the matrices Z^u and S using the Bregman matrix divergence [69], i.e.,

    F_u = φ(Z^u) − φ(S) − tr((Z^u − S)^T ∇φ(S))    (2.2)

where φ : R^{N×N} → R is a convex matrix function. By choosing φ(X) = Σ_{i,j=1}^N X_{i,j}(log X_{i,j} − 1) [69], F_u is written as

    F_u = Σ_{i,j=1}^N (S_{i,j} log(S_{i,j}/Z_{i,j}^u) + Z_{i,j}^u − S_{i,j})    (2.3)

By assuming that Σ_{i,j} Z_{i,j}^u ≈ Σ_{k=1}^m N_k^2 and log x ≈ x − 1, where N_k is the number of examples assigned to class k, we simplify the above expression as F_u ≈ Σ_{i,j=1}^N S_{i,j}^2 / Z_{i,j}^u. Since S_{i,j}^2 can be viewed as a general similarity measurement, we replace S_{i,j}^2 with S_{i,j} and simplify F_u as

    F_u ≈ Σ_{i,j=1}^N S_{i,j} / Z_{i,j}^u = Σ_{i,j=1}^N S_{i,j} / (Σ_{k=1}^m b_i^k b_j^k)    (2.4)

Remark 1. We did not use
≥ 0; w_i is a measure of the failure of the algorithm on example x_i. Using the new weighting on the training examples, MCSSB learns a multi-class model that minimizes the loss on the weighted training examples by adopting the following sampling approach: MCSSB samples s instances with replacement, with the probability of each sample proportional to its weight. The s sampled instances are passed to the weak learner to obtain a multi-class hypothesis. In our experiments, the number of sampled examples at each iteration is set as s = max(20, N/5). After creating a weak classifier in this round, MCSSB adds it to the current classifiers to reduce the value of the objective function.
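The weighted-sampling step can be sketched with numpy's Generator.choice (the function name and example weights below are ours):

```python
import numpy as np

def sample_for_weak_learner(weights, s, rng=None):
    """Sample s example indices with replacement, each index drawn with
    probability proportional to its weight, as MCSSB does before calling
    the weak learner."""
    rng = rng or np.random.default_rng(0)
    w = np.asarray(weights, dtype=float)
    p = w / w.sum()                  # normalize weights to probabilities
    return rng.choice(len(w), size=s, replace=True, p=p)
```

Per the text, s would be set to max(20, N/5) at each iteration; examples with zero weight are never drawn.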
For the experiments, we ran the algorithm with different numbers of iterations and found that both the objective function and the classification accuracy remain essentially the same after 50 iterations. We therefore set the number of iterations to 50 to save computational cost.
Algorithm 3 MCSSB: Multi-Class Semi-Supervised Boosting Algorithm
1: Input:
   • D: the set of examples; the first N_l examples are labeled.
   • s: the number of examples sampled from the (N − N_l) unlabeled examples
   • T: the maximum number of iterations
2: Set F(x_i) = 0, i = 1, ..., |D|
3: repeat
4:   Compute α_i^k and β_i^k for every example as given in Equation 2.12.
5:   Assign each unlabeled example x_i to class k_i^* = arg min_k (α_i^k + β_i^k) and weight w_i = α_i^{k_i^*} + β_i^{k_i^*}
6:   Sample s examples using a distribution that is proportional to w_i
7:   Train a multi-class classifier h(x) using the s sampled examples
8:   Predict h_i^k for all examples using h(x), and compute α using Equation 4.14. Exit the loop if α ≤ 0.
9:   H(x) ← H(x) + αh(x)
10: until reach the maximum number of iterations
Theorem 4 shows that the proposed boosting algorithm reduces the objective function F exponentially. The proof of this theorem is provided in Appendix A.3.

Theorem 4. The objective function after T iterations, denoted by F_T, is bounded as follows:

    F_T ≤ F_0 exp( −Σ_{t=1}^T (√(A_u + A_l) − √(B_u + B_l))^2 / F_{t−1} )    (2.21)

where A_u, A_l, B_u and B_l are defined in Lemma 2.
2.4 Experiments
In this section, we present our empirical study on the classification data sets that were
described in Chapter 7. We refer to the proposed semi-supervised multi-class boosting
algorithm as MCSSB. In this study, we aim to show that (1) MCSSB can improve the per-
formance of a given multi-class classifier with unlabeled examples, (2) MCSSB is more
effective than the existing semi-supervised boosting algorithms, and (3) MCSSB is robust
to the model parameters and the number of labeled examples. It is important to note that it is
not our intention to show that the proposed semi-supervised multi-class boosting algorithm
always outperforms other semi-supervised learning algorithms. Instead, our objective is
to demonstrate that the proposed semi-supervised boosting algorithm is able to effectively
improve the accuracy of different supervised multi-class learning algorithms using the un-
labeled examples. Hence, the empirical study is focused on a comparison with the existing
semi-supervised boosting algorithms, rather than a wide range of semi-supervised learning
algorithms.
2.4.1 Experimental Setup
For each classification data set described in Section 1.6.1, we split the examples into 5
partitions, with one partition used for training and the others used for testing. In each ex-
periment, we used a small percentage (between 2 and 10 percent) of the training instances as
labeled examples and the remaining instances as unlabeled examples. We applied the pro-
posed algorithms and the baselines to the training examples to create a model, applied
it to the test examples, and computed the accuracy on the test examples. We repeated each
experiment 10 times and reported the average.
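The setup above can be sketched as follows (illustrative only; the 5-way partition and the small labeled fraction follow the text, while the function name and the shuffling are assumptions):

```python
import numpy as np

def split_data(n_examples, labeled_frac=0.02, rng=None):
    """Split indices into 5 partitions: one for training, four for testing.
    A small fraction of the training partition is treated as labeled."""
    rng = np.random.default_rng(rng)
    perm = rng.permutation(n_examples)
    parts = np.array_split(perm, 5)
    train, test = parts[0], np.concatenate(parts[1:])
    n_labeled = max(1, int(labeled_frac * len(train)))
    labeled, unlabeled = train[:n_labeled], train[n_labeled:]
    return labeled, unlabeled, test
```

With `labeled_frac=0.02` and 1000 examples, this yields 4 labeled, 196 unlabeled, and 800 test indices, matching the 2% setting used in several experiments below.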
We compare the proposed semi-supervised boosting algorithm to ASSEMBLE, a state-
of-the-art semi-supervised boosting algorithm. The main reason for this choice is that
ASSEMBLE utilizes the boosting technique and can exploit an existing supervised learning
technique. This makes the comparison fair and easy because it enables us to compare MC-
SSB and ASSEMBLE with base classifiers of different quality. Also notice that AS-
SEMBLE is a powerful semi-supervised learning technique that was the best semi-supervised
algorithm among 34 participants in the NIPS 2001 workshop competition "Unlabeled Data for
Supervised Learning" [48].
Unlike the general setup introduced in Section 1.6.1, we used the test set for the mnist data set because of the
huge size of the training set in mnist and the memory problem.
A Gaussian kernel is used as the similarity measure in the standard MCSSB algorithm,
with the kernel width set to 15% of the range of the distances between examples for all the
experiments, as suggested in [70]. To verify the importance of using the similarity measure
in the semi-supervised boosting algorithm and of the direct formulation of the multi-class problem,
we use two other baselines: MCSSB-Uniform, which uses the same similarity value for every
pair of examples (i.e., S_ij = 1, i, j = 1, .., N) and can be considered MCSSB with a
bad similarity measure, and MCSSB-Absolute, which assumes absolute similarity between
an example and itself (i.e., S_ii = 1, i = 1, .., N) and absolute dissimilarity between two
different examples (i.e., S_ij = 0, i, j = 1, .., N and i ≠ j). MCSSB-Absolute can be
considered MCSSB that only exploits the advantage of using a direct formulation of the
multi-class problem.
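The three similarity settings can be written down directly. The following sketch assumes Euclidean distances and a Gaussian kernel of the form exp(−(d/σ)²) with σ set to 15% of the distance range; the exact kernel parameterization is an assumption, not taken from the thesis:

```python
import numpy as np

def similarity_matrices(X):
    """Build the similarity matrices for standard MCSSB and its two baselines."""
    n = len(X)
    # MCSSB-Uniform: the same similarity for every pair, S_ij = 1
    S_uniform = np.ones((n, n))
    # MCSSB-Absolute: S_ii = 1 and S_ij = 0 for i != j
    S_absolute = np.eye(n)
    # Standard MCSSB: Gaussian kernel; width = 15% of the distance range
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    sigma = 0.15 * (d.max() - d.min())
    S_gaussian = np.exp(-(d / sigma) ** 2)
    return S_uniform, S_absolute, S_gaussian
```

MCSSB-Uniform deliberately discards all pairwise information, while MCSSB-Absolute keeps only self-similarity, which is what makes them useful ablations for the two components of MCSSB.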
We use a decision tree with only two levels of nodes as the base classifier for all the
methods in the standard setting. The combination parameter C is set to 10^4 in all experi-
ments. To study the robustness of the proposed methods, we further investigate the effect
of the depth of the decision tree and of the combination parameter C on the performance of different
methods in Sections 2.4.4 and 2.4.3, respectively.
2.4.2 Evaluation of Classification Performance
Figure 2.1 shows the results of the different algorithms when the amount of labeled examples
is changed from 2% to 10%. First, notice that MCSSB significantly improves the accuracy
of the decision tree for 5 out of 7 data sets. For the 'Nursery' data set, MCSSB performs worse
than the base classifier, and for the 'Letter' data set, the result of MCSSB is not much different
from the base classifier. However, for both these cases, MCSSB-Absolute performs quite
well, which indicates that the direct formulation of the multi-class problem is useful and the bad
i.e., 0.15 × (d_max − d_min), where d_min and d_max are the minimum and maximum distances between
examples.
Notice that we also used a neural network as another base classifier to evaluate the performance of our algo-
rithm. Refer to [50] for the results on several benchmark data sets.
performance is due to the utilization of a bad similarity matrix. Note that for several data
sets, the improvement made by MCSSB is dramatic. For instance, the classification
accuracy of the decision tree is improved from 33% to 48% for the 'Pendigits' data set, and from
24% to 43% for the 'Optdigits' data set when 2% of the examples are labeled; the classification
accuracy of the decision tree is improved from 13% to 17% for the 'Isolet' data set, and from 46%
to 49% for the 'Protein' data set when 8% of the examples are labeled.
Second, when compared to ASSEMBLE, we found that the proposed algorithm sig-
nificantly outperforms ASSEMBLE for all the data sets. More interestingly, ASSEMBLE
reduces the performance of the base classifier for most data sets, which indicates that the usage of
pseudo-labels can produce misleading results. The key difference between MCSSB and
ASSEMBLE is that MCSSB is not only specially designed for multi-class classification, but also
does not solely rely on the pseudo-labels obtained in the iterations of the boosting algorithm.
Thus, the success of MCSSB indicates the importance of designing semi-supervised learn-
ing algorithms for multi-class problems.
Third, to verify that the outstanding performance of MCSSB is related to the direct
formulation of the multi-class problem and the usage of the similarity measure in the boosting
algorithm, we examine the results of MCSSB-Uniform and MCSSB-Absolute. Because
MCSSB-Uniform does not utilize an appropriate similarity measure, it performs very
poorly, which emphasizes the effectiveness of our approach in utilizing the similarity measure in the
boosting algorithm. On the other hand, MCSSB-Absolute is the second best method after
MCSSB. Because MCSSB-Absolute does not utilize any similarity measure among exam-
ples, we believe that its strong performance is due to our approach in the direct formulation
of the multi-class problem. It is interesting to note that the performance of MCSSB-Absolute
on the 'Nursery' and 'Letter' data sets is better than that of the other methods, including MCSSB, which
indicates the sensitivity of the proposed method to the choice of similarity measure.
Finally, notice that as the number of labeled examples increases, the performance
of the different methods improves. However, MCSSB keeps its superiority in most of the cases
[Figure 2.1 here. Panels: MNIST, NURSERY, PROTEIN, LETTER, PENDIGITS, OPTDIGITS, ISOLET. x-axis: percentage of labeled examples (0.02 to 0.1); y-axis: accuracy. Methods: Decision Stump, Assemble, MCSSB, MCSSB_Uniform, MCSSB_Absolute.]
Figure 2.1: The error rates of different methods with different amounts of labeled examples.
when compared to both the base classifier and the ASSEMBLE algorithm. We also observe
that, overall, ASSEMBLE is unable to make an improvement over the base classifier regardless
of the number of labeled examples. These results indicate the challenge in developing
boosting algorithms for semi-supervised multi-class learning. Compared to ASSEMBLE,
which relies on the classification confidence to decide the pseudo-labels for unlabeled ex-
amples, MCSSB is more reliable since it exploits both the classification confidence and the
similarities among examples when determining the pseudo-labels.
2.4.3 Sensitivity to the Combination Parameter C
Figure 2.2 shows the performance of MCSSB when the combination parameter C changes
from 1 to 10^10. It is clear that for large values of C, MCSSB is very stable. Notice that the
improvement of MCSSB over the base classifier for the 'Protein' data set is very marginal for
some values of C. However, Figure 2.1 shows that the improvement of MCSSB for larger amounts
of labeled data (as large as 4%) is significant for this data set, and not sensitive to
small changes of the parameter C. We conclude that MCSSB is very
robust to the choice of the parameter C.
2.4.4 Sensitivity to Base Classifier
In this section, we focus on examining the sensitivity of MCSSB to the complexity of the base
classifiers. This will allow us to understand the behavior of the proposed semi-supervised
boosting algorithm for both weak classifiers and strong classifiers. To this end, we use de-
cision trees with a varying number of levels as the base classifier, from a decision tree with
only one node (a decision stump) up to fully-grown decision trees, and plot the performance
of the different methods. Figure 2.3 shows the classification accuracy of Tree, ASSEM-
BLE and MCSSB when we vary the number of levels in the decision tree. Notice that in each
case, the maximum number of levels in the plot for each data set is set to the fully grown
tree for that data set. It is not surprising that overall the classification accuracy is improved
[Figure 2.2 here. Panels: MNIST, NURSERY, PROTEIN, LETTER, PENDIGITS, OPTDIGITS, ISOLET. x-axis: C (10^0 to 10^10); y-axis: accuracy. Methods: Decision Stump, Assemble, MCSSB.]
Figure 2.2: The error rates of MCSSB with different values of C (2% of the examples are labeled).
with an increasing number of levels in the decision tree for most data sets. We also observe that
MCSSB is more effective than ASSEMBLE for decision trees of different complexity,
and that regardless of the quality of the base classifier, ASSEMBLE is not able to improve the per-
formance of the supervised classifier by utilizing unlabeled examples. Notice that for some
data sets, e.g. the 'Protein' data set, the performance decreases as the depth of the tree increases.
This is because, unlike the other data sets, 'Protein' has only three classes, and a large tree can
lead to overfitting.
[Figure 2.3 here. Panels: MNIST, NURSERY, PROTEIN, LETTER, PENDIGITS, OPTDIGITS, ISOLET. x-axis: depth of the tree; y-axis: accuracy. Methods: Tree, Assemble, MCSSB.]
Figure 2.3: The error rates of MCSSB with decision trees of different depths as the weak
learner. 2% of training examples are labeled in all the experiments.
Chapter 3
Optimizing NDCG Measure by Boosting
Learning to rank is a relatively new field in machine learning. It aims to learn a ranking
function from training examples with relevance judgments. Learning to rank algo-
rithms are often evaluated using information retrieval measures, such as Normalized Dis-
counted Cumulative Gain (NDCG) [14] and Mean Average Precision (MAP) [13]. Until
recently, most learning to rank algorithms were not able to directly optimize a loss function
related to the IR evaluation measures, such as NDCG and MAP. The main difficulty in the di-
rect optimization of these measures is that they are non-continuous and non-differentiable.
In this chapter, we discuss how boosting can be applied to optimize Normalized Discounted
Cumulative Gain (NDCG), which is the most commonly used multi-level evaluation mea-
sure for learning to rank. We start with a detailed description of AdaRank [42], one of the
first algorithms designed to directly maximize IR measures. We further develop a learning
to rank algorithm, termed NDCG_Boost, for optimizing the NDCG metric. Unlike AdaRank,
which weights all the documents related to each query equally when optimizing the NDCG
measure, NDCG_Boost weights individual documents differently even if they are all re-
lated to the same query, leading to more effective optimization of the NDCG measure. In
order to deal with the non-smooth nature of the NDCG measure, in the NDCG_Boost al-
gorithm, we propose to optimize the expectation of NDCG over the distribution induced
by a ranking function. We then present a relaxation strategy that approximates the expectation
of the NDCG value, and an optimization strategy to make the computation efficient. Extensive
experiments show that the proposed algorithm outperforms state-of-the-art ranking algo-
rithms on several benchmark data sets.
3.1 Introduction
Learning to rank has attracted many machine learning researchers in the last decade because
of its growing importance in areas like information retrieval (IR) and recommender
systems. Three types of learning to rank algorithms can be found in the literature.
• Pointwise approaches: As the simplest form, these approaches [15, 16] treat rank-
ing as a classification or regression problem that learns a ranking function in order to
fit the relevance judgments for the given retrieved documents [16, 17]. However, classi-
fication and regression may not be the best fit for the task of ranking. This is because
(i) classification problems are usually associated with unordered class labels, whereas
there is an intrinsic order among the levels of relevance judgments provided by the
user, and (ii) the target variables in regression problems are assumed to be numerical
values, while the relevance judgments are only ordinal variables.
• Pairwise approaches: These approaches are motivated by the fact that the rele-
vancy scores in ranking are relative to each other. This group considers the pairs of
documents as independent variables and learns a classification (regression) model to
correctly order the training pairs [18-23]; namely, document d_a is ranked above d_b if
the relevance score of d_a is larger than that of d_b. One major problem with the pairwise ap-
proaches is that they assume pairs of documents are independent random variables,
which is often violated in real world applications.
• Listwise approaches: The listwise approaches are motivated by the observation that
most evaluation metrics of information retrieval measure the ranking quality for indi-
vidual queries, not documents. These approaches treat the ranking list of documents
for every query as a training instance [13, 24-29], either by direct optimization of an
information retrieval evaluation measure [13, 25, 28, 29] or by optimizing a listwise
loss function [24, 26, 27]. Empirical studies have shown that the listwise approaches
are more effective than both the pointwise and pairwise approaches because they utilize
the query-document group structure, which is a unique and useful characteristic in
ranking.
The main difficulty in optimizing the listwise loss functions is that they are non-
continuous and non-differentiable. This is because these loss functions measure the re-
trieval performance based on the ranking list of documents induced by the ranking function,
and therefore their dependence on the ranking function is implicit. Given that classification
is a well-studied subject in machine learning, the research question is whether it is pos-
sible to design a boosting algorithm that utilizes a classification algorithm to optimize an
information retrieval measure such as NDCG. The easiest way to design such a boosting
algorithm is the approach taken by Xu et al. in the design of AdaRank [42]. In each trial of
the boosting algorithm, AdaRank re-weights the queries based on their NDCG values (com-
pared to AdaBoost, which re-weights the examples based on their confidence in prediction).
As we see in more detail in Section 3.3.2, AdaRank treats all the documents related to
each query equally when trying to improve the NDCG metric, which could significantly
limit the choice of ranking functions for optimizing the NDCG metric. In this chapter, we
introduce a better boosting algorithm for optimizing the NDCG metric that weights documents
differently even if they are associated with the same query. In each iteration, the boosting
algorithm provides a weighting as well as binary class assignments for the given documents;
the weak learner constructs a binary classifier from the weighted documents that are labeled
It is important to distinguish the binary class assignments from the relevance judgments for the documents.
by the boosting algorithm.
3.2 Related Work
We focus on reviewing the listwise approaches that are closely related to the theme of this
chapter. The listwise approaches can be classified into two categories. The first group
of approaches directly optimizes the IR evaluation metrics. Most IR evaluation metrics,
however, depend on the sorted order of documents, and are non-convex in the target rank-
ing function. To avoid the computational difficulty, these approaches either approximate
the metrics with some convex functions or deploy methods (e.g., genetic algorithms [71])
for non-convex optimization. In [25], the authors introduced LambdaRank, which addresses
the difficulty in optimizing IR metrics by defining a virtual gradient on each document af-
ter the sorting. While [25] provided a simple test to determine if there exists an implicit
cost function for the virtual gradient, the theoretical justification for the relation between the
implicit cost function and the IR evaluation metric is incomplete. This may partially ex-
plain why LambdaRank performs very poorly when compared to MCRank [16], a simple
adjustment of classification for ranking (a pointwise approach). The authors of the MCRank
paper even claimed that a boosting model for regression produces better results than Lamb-
daRank. Volkovs and Zemel [29] proposed optimizing the expectation of IR measures to
overcome the sorting problem, similar to the approach taken in this chapter. However, they
use Monte Carlo sampling to address the intractable task of computing the expectation in
the permutation space, which can be a bad approximation for queries with a large num-
ber of documents. AdaRank [42], as described earlier in this chapter, uses boosting to
optimize NDCG, similar to our optimization strategy. However, it deploys heuristics to
embed the IR evaluation metrics in computing the weights of queries and the importance
of weak rankers; i.e., it uses the NDCG value of each query in the current iteration as the
weight for that query in constructing the weak ranker (the documents of each query have
similar weights). This is unlike our approach, in which the contribution of each single document
to the final NDCG score is considered. Moreover, unlike our method, the convergence of
AdaRank is conditional and not guaranteed. Sun et al. [72] reduced ranking, as mea-
sured by NDCG, to pairwise classification and applied an alternating optimization strategy to
address the sorting problem by fixing the rank positions when computing the derivative. SVM-
MAP [13] relaxes the MAP metric by incorporating it into the constraints of an SVM. Since
SVM-MAP is designed to optimize MAP, it only considers binary relevancy and cannot
be applied to data sets that have more than two levels of relevance judgments.
The second group of listwise algorithms defines a listwise loss function as an indirect
way to optimize the IR evaluation metrics. RankCosine [24] uses the cosine similarity between
the ranking list and the ground truth as a query-level loss function. ListNet [26] adopts the
KL divergence as the loss function by defining a probabilistic distribution in the space of
permutations for learning to rank. FRank [22] uses a new loss function called fidelity loss
on the probability framework introduced in ListNet. ListMLE [27] employs the likelihood
loss as the surrogate for the IR evaluation metrics. The main problem with this group of
approaches is that the connection between the listwise loss function and the targeted IR
evaluation metric is unclear, and therefore optimizing the listwise loss function may not
necessarily result in the optimization of the IR metrics.
3.3 Optimizing NDCG Measure
3.3.1 Notation
Assume that we have a collection of n queries for training, denoted by Q = {q_1, . . . , q_n}.
For each query q_k, we have a collection of m_k documents D_k = {d_i^k, i = 1, . . . , m_k},
whose relevance to q_k is given by a vector r^k = (r_1^k, . . . , r_{m_k}^k) ∈ R^{m_k}. We denote by
F(d, q) the ranking function that takes a document-query pair (d, q) and outputs a real-
number score, and by j_i^k the rank of document d_i^k within the collection D_k for query q_k.
44
The NDCG value for the ranking function F(d, q) is then computed as follows:

\mathcal{L}(Q, F) = \frac{1}{n} \sum_{k=1}^{n} \frac{1}{Z_k} \sum_{i=1}^{m_k} \frac{2^{r_i^k} - 1}{\log(1 + j_i^k)} \qquad (3.1)

where Z_k is the normalization factor [14]. NDCG is usually truncated at a particular rank
level (e.g., the first 10 retrieved documents) to emphasize the importance of the first re-
trieved documents.
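Equation (3.1) can be computed for a single query as follows (a sketch; it takes Z_k to be the DCG of the ideal ordering, the standard choice in [14], and omits truncation):

```python
import numpy as np

def dcg(relevance, ranks):
    """DCG with gain 2^r - 1 and discount log(1 + rank); ranks start at 1."""
    relevance = np.asarray(relevance, dtype=float)
    ranks = np.asarray(ranks, dtype=float)
    return np.sum((2.0 ** relevance - 1.0) / np.log(1.0 + ranks))

def ndcg(scores, relevance):
    """NDCG of one query when documents are sorted by descending F(d, q)."""
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(-scores)                   # document indices, best first
    ranks = np.empty(len(scores), dtype=float)
    ranks[order] = np.arange(1, len(scores) + 1)  # j_i: rank of document i
    ideal = dcg(np.sort(relevance)[::-1], np.arange(1, len(scores) + 1))
    return dcg(relevance, ranks) / ideal          # Z_k normalizes to [0, 1]
```

A ranking whose score order agrees with the relevance order attains the maximum value 1; any log base gives the same NDCG, since the same logarithm appears in both the numerator and Z_k.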
3.3.2 AdaRank Algorithm
The easiest way to design a boosting algorithm for optimizing a given IR evaluation mea-
sure is the one taken by the AdaRank algorithm [42]. AdaRank uses an exponential loss function
similar to that of AdaBoost. However, unlike the loss function of AdaBoost, which is constructed
based on the classification margin, AdaRank utilizes information retrieval measures such
as NDCG to construct the exponential loss. To optimize NDCG, for example, AdaRank
uses the following exponential loss function:

\sum_{k=1}^{n} \exp(-\mathcal{L}(q_k, F))

where \mathcal{L}(q_k, F) is the NDCG value for query q_k when ranking the documents for query q_k
by the function F. The steps of AdaRank are given in Algorithm 4. In each iteration, AdaRank
finds a weak ranker f_t that maximizes the quantity η_t at Step 4, i.e., the NDCG weighted by p_t.
Then, it computes the combination weight for f_t and adds it to the current set of classifiers
in Steps 5 and 6, respectively. The authors of the AdaRank paper [42] suggest using the ranking
features (e.g., BM25) as the weak ranker. However, a (multi-class) classifier can also be
used as the weak ranker. To construct a classifier that maximizes η_t, AdaRank distributes
the weight p_t(q_k) to all documents of query q_k equally, and constructs a classifier based on
the documents that are sampled according to the weights. To redistribute the weights to
Algorithm 4 AdaRank Algorithm
1: Input:
• Q = {q_1, . . . , q_n}: the set of queries
• D_k = {(d_i^k, r_i^k), i = 1, . . . , m_k}: the set of documents and their relevance scores for query q_k
2: Initialize p_1(q_k) = 1/n, k = 1, . . . , n
3: repeat
4: Find f_t by maximizing the weighted NDCG, i.e., η_t = Σ_{k=1}^n p_t(q_k) L(q_k, f_t)
5: Compute α_t = (1/2) ln((1 + η_t)/(1 − η_t))
6: Compute F(d_i^k) = Σ_{l=1}^t α_l f_l(d_i^k), k = 1, .., n, i = 1, .., m_k
7: Compute the new weighting p_{t+1}(q_k) = exp(−L(q_k, F)) / Σ_{k'=1}^n exp(−L(q_{k'}, F))
8: until reaching the maximum number of iterations
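The loop of Algorithm 4 can be sketched as follows (illustrative only; the candidate-ranker pool, the function signatures, and the exact AdaBoost-style form of the combination weight α_t = ½ ln((1+η_t)/(1−η_t)) are assumptions, and `ndcg` is any function returning L(q_k, ·) in [0, 1]):

```python
import numpy as np

def adarank(queries, relevances, rankers, ndcg, T=10):
    """AdaRank sketch.
    queries[k]    : feature matrix of the documents of query k
    relevances[k] : relevance judgments of the documents of query k
    rankers       : pool of weak rankers, each mapping a feature matrix to scores
    ndcg(s, r)    : NDCG of score vector s against relevance vector r
    """
    n = len(queries)
    p = np.full(n, 1.0 / n)                       # p_1(q_k) = 1/n
    chosen, alphas = [], []
    for _ in range(T):
        # Step 4: weighted NDCG of every candidate weak ranker
        etas = [sum(p[k] * ndcg(f(queries[k]), relevances[k]) for k in range(n))
                for f in rankers]
        best = int(np.argmax(etas))
        eta = etas[best]
        # Step 5: combination weight (assumed AdaBoost-style form)
        alpha = 0.5 * np.log((1.0 + eta) / max(1.0 - eta, 1e-12))
        chosen.append(rankers[best])
        alphas.append(alpha)
        # Step 6: scores of the combined ranker F for every query
        F = [sum(a * f(queries[k]) for a, f in zip(alphas, chosen))
             for k in range(n)]
        # Step 7: exponential re-weighting of the queries by their NDCG
        w = np.exp(-np.array([ndcg(F[k], relevances[k]) for k in range(n)]))
        p = w / w.sum()
    return chosen, alphas
```

Queries with a small NDCG under the current ensemble receive a larger weight in the next round, exactly as in Step 7.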
the instances, AdaRank increases the weights of difficult queries (e.g., those that have a small
NDCG) and decreases the weights of easy queries (e.g., those that have a large NDCG) at
Step 7.
As is obvious from the steps of the AdaRank algorithm, it gives the same weight to all the
documents of each query, leading to a suboptimal performance. However, since a pointwise
weak learner (a multi-class classifier) is often utilized in a boosting algorithm to maximize
NDCG, it is advantageous to allow every document to contribute differently to the final
NDCG value. Moreover, although NDCG works at the query level, not all documents make a
similar contribution to improving the NDCG value at each stage of the algorithm. These
observations motivated us to develop the NDCG_Boost algorithm, which considers the contri-
bution of every single document in the iterations of the boosting algorithm to maximize
NDCG.
3.3.3 A Probabilistic Framework
One of the main challenges in optimizing the NDCG metric defined in Equation
(3.1) is that the dependence of the document ranks (i.e., j_i^k) on the ranking function F(d, q)
is not explicitly expressed, which makes the optimization computationally challenging. To address this
problem, we consider the expectation of \mathcal{L}(Q, F) over all the possible rankings induced by
the ranking function F(d, q), i.e.,

\bar{\mathcal{L}}(Q, F) = \frac{1}{n} \sum_{k=1}^{n} \frac{1}{Z_k} \sum_{i=1}^{m_k} (2^{r_i^k} - 1) \left\langle \frac{1}{\log(1 + \pi^k(i))} \right\rangle_F
= \frac{1}{n} \sum_{k=1}^{n} \frac{1}{Z_k} \sum_{i=1}^{m_k} (2^{r_i^k} - 1) \sum_{\pi^k \in S_{m_k}} \frac{\Pr(\pi^k | F, q_k)}{\log(1 + \pi^k(i))} \qquad (3.2)

where S_{m_k} stands for the group of permutations of the m_k documents, and π^k is an instance of
a permutation (or ranking). The notation π^k(i) stands for the rank position of the i-th document
under π^k. To this end, we first utilize the result in the following lemma to approximate the
expectation of 1/log(1 + π^k(i)) by the expectation of π^k(i).
Lemma 3. For any distribution Pr(π|F, q), the inequality \bar{\mathcal{L}}(Q, F) \ge \mathcal{H}(Q, F) holds,
where

\mathcal{H}(Q, F) = \frac{1}{n} \sum_{k=1}^{n} \frac{1}{Z_k} \sum_{i=1}^{m_k} \frac{2^{r_i^k} - 1}{\log(1 + \langle \pi^k(i) \rangle)} \qquad (3.3)

Proof. The proof follows from the facts that (a) 1/x is a convex function when x > 0, and
therefore ⟨1/log(1 + x)⟩ ≥ 1/⟨log(1 + x)⟩; and (b) log(1 + x) is a concave function, and
therefore ⟨log(1 + x)⟩ ≤ log(1 + ⟨x⟩). Combining these two facts together, we have the re-
sult stated in the lemma. □
Given that \mathcal{H}(Q, F) provides a lower bound for \bar{\mathcal{L}}(Q, F), in order to maximize \bar{\mathcal{L}}(Q, F),
we could alternatively maximize \mathcal{H}(Q, F), which is substantially simpler than \bar{\mathcal{L}}(Q, F). In
the next step of the simplification, we rewrite π^k(i) as

\pi^k(i) = 1 + \sum_{j=1}^{m_k} I(\pi^k(i) > \pi^k(j)) \qquad (3.4)
where I(x) outputs 1 when x is true and zero otherwise. Hence, ⟨π^k(i)⟩ is written as

\langle \pi^k(i) \rangle = 1 + \sum_{j=1}^{m_k} \langle I(\pi^k(i) > \pi^k(j)) \rangle = 1 + \sum_{j=1}^{m_k} \Pr(\pi^k(i) > \pi^k(j)) \qquad (3.5)

As a result, to optimize \mathcal{H}(Q, F), we only need to define Pr(π^k(i) > π^k(j)), i.e., the
marginal probability for document d_j^k to be ranked before document d_i^k. In the next section,
we will discuss how to define a probability model for Pr(π^k|F, q_k), and derive the pairwise
ranking probability Pr(π^k(i) > π^k(j)) from the distribution Pr(π^k|F, q_k).
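Equation (3.4) is the standard identity that a document's rank equals one plus the number of documents ranked before it; a quick numerical sanity check (not thesis code) confirms it over every ranking of a small document set:

```python
import numpy as np
from itertools import permutations

# pi[i] = rank position pi^k(i) of document i.  Equation (3.4):
#   pi^k(i) = 1 + sum_j I(pi^k(i) > pi^k(j))
for ranking in permutations(range(1, 5)):        # every ranking of 4 documents
    pi = np.array(ranking)
    reconstructed = 1 + np.sum(pi[:, None] > pi[None, :], axis=1)
    assert np.array_equal(reconstructed, pi)
```

Taking expectations of the indicator terms on the right-hand side yields Equation (3.5) directly, since the expectation of an indicator is a probability.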
3.3.4 Objective Function
We model Pr(π^k|F, q_k) as follows:

\Pr(\pi^k | F, q_k) = \frac{1}{Z(F, q_k)} \exp\left( \sum_{i=1}^{m_k} \sum_{j: \pi^k(j) > \pi^k(i)} \left( F(d_i^k, q_k) - F(d_j^k, q_k) \right) \right)
= \frac{1}{Z(F, q_k)} \exp\left( \sum_{i=1}^{m_k} (m_k - 2\pi^k(i) + 1) F(d_i^k, q_k) \right) \qquad (3.6)

where Z(F, q_k) is the partition function that ensures the probabilities sum to one. Equa-
tion (3.6) models each pair (d_i^k, d_j^k) of the ranking list π^k by the factor exp(F(d_i^k, q_k) −
F(d_j^k, q_k)) if d_i^k is ranked before d_j^k (i.e., π^k(i) < π^k(j)), and vice versa. This mod-
eling choice is consistent with the idea of ranking the documents with the largest scores first;
intuitively, the more documents in a permutation are in decreasing order of score, the
larger the probability of the permutation. Using Equation (3.6) for Pr(π^k|F, q_k), we
have \mathcal{H}(Q, F) expressed in terms of the ranking function F. By maximizing \mathcal{H}(Q, F) over F,
we can find the optimal solution for the ranking function F.
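The equality of the two exponents in Equation (3.6) can be verified by brute force over all permutations: for document i, F(d_i, q) appears positively once for each of the (m_k − π^k(i)) documents ranked after it and negatively once for each of the (π^k(i) − 1) documents ranked before it, giving the net coefficient m_k − 2π^k(i) + 1. A sanity check of this identity (not thesis code):

```python
import numpy as np
from itertools import permutations

def exponent_pairwise(pi, F):
    """First form: sum over pairs with pi(j) > pi(i) of F_i - F_j."""
    m = len(F)
    return sum(F[i] - F[j] for i in range(m) for j in range(m) if pi[j] > pi[i])

def exponent_linear(pi, F):
    """Second form: sum_i (m - 2*pi(i) + 1) * F_i."""
    m = len(F)
    return sum((m - 2 * pi[i] + 1) * F[i] for i in range(m))

F = np.array([0.7, -0.2, 1.3])                 # scores F(d_i, q) of 3 documents
for ranking in permutations(range(1, 4)):      # pi(i) ranges over {1, 2, 3}
    assert np.isclose(exponent_pairwise(ranking, F), exponent_linear(ranking, F))
```

The linear form is what makes the model tractable: the exponent is linear in the scores, so the partition function and the marginals can be manipulated analytically.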
As indicated by Equation (3.5), we only need to compute the marginal distribution
Pr(π^k(i) > π^k(j)). To approximate Pr(π^k(i) > π^k(j)), we divide the group of permu-
tations S_{m_k} into two sets: G_a^k(i, j) = {π^k | π^k(i) > π^k(j)} and G_b^k(i, j) = {π^k | π^k(i) <
π^k(j)}. Notice that there is a one-to-one mapping between these two sets; namely, for any
ranking π^k ∈ G_a^k(i, j), we can create a corresponding ranking in G_b^k(i, j) by switch-
ing the rankings of documents d_i^k and d_j^k, and vice versa. The following lemma allows us to
bound the marginal distribution Pr(π^k(i) > π^k(j)). The proof of this lemma is provided
in Appendix A.5.
Lemma 4. If F(d_i^k, q_k) > F(d_j^k, q_k), we have
2
1
1+ exp [2(F