.ziéhfit I. H.00.. - _ ll 00 i 00 M0 . 00‘ ‘4 4034.4.340.) r .3?.«.2800.»W.rk 303...... . 4.“:1 are: .... . . JR '0. .. 80:40.0." .4. 0.. ”1.... '.000300..... J... ‘0- ....01... 0”. ... _ . ’libo... ..I 0.4 .0. 00300 . . . 40 I. .... I. A. - 0 . . . 0.! 4'0... .. 3.00”... .0\.OIHL000 it. 00:29.7...2 30.0 Q . 0.000.111 '1' 0.. ......10‘. 6. ‘a0". .2. van-0.1.0 .0“. 3 .. .0 . . 4. . .. ...- . .. . 0 .- O l. ...0. 4‘0 . 20m 43.0.0. . 0.1....0‘. u..d.L-.‘ _ ... 0. iii-... .mm.u.",..,m...,..m£«4.4% ...... .V -. .... . _ ...: .. ...... ....._....... $.23.“ 0 . . 02‘ 1"" 0.00 w."0 . «ls-£00.”.- gurflhrv .0I.\ 0. 0m “"0 . . v .0 . 4 .V... .... .41).... . ......3.¢..P..)I:a .9... r .81?‘..Pw ..1...044..I0..l_h.01 u.;|..fl...a..m_.. a . . 4 009.. . I V. . r. u 0 ... _ . . . Why”... r. 9... gnu. ENC-00.... hr. .342." b a... ... “0.0. 0 080‘ 50.40....0 ...P. .... 00. .1... 0.1.4.3633 ....I. 001:0...- .. 4.0 .....4..4...,._.....- . “Mm... ......pr .... ..7....:... +5 arm... ...." (.33.... ...... a: s . . . ..."..r..,.4 ....-- .. ...; « 0..§0 .- .0 . . .| . v. . .. . . . . ... . .. ”00-0%,.mii‘0 .0. 30000.. 000.. .... .35.... z 4 .. ... . . ...ltl. : .0 ‘2 00. .5 V . . . i . . ...... I! . “Mfr ..v”.<.0 .0 0 ...-3., .b¢. ”VOA-00000” —“M 0 ‘01fl'.’ 00 ..Bhlfi . .. .. V , .. fific"... 0H.‘ov.'.u.00.0004.‘000.0.l0uu 0.00“.“1 330’: .. . .0 00.0.0 ... . . .30“ r 0 0. t. 00. y ‘4 J .- ..I‘E‘. . . .. 323.010.01-04? .2 . ... .. ...-0330,52, V: ...0. ... z...» ... .060. w; ..hv..400084.,04 » V . .00.. 4 C014“ «0. _ I... 0‘... t 00.0. . q. 0’ 00 . . V . .2‘....lt..000.0... ~ 00.13.44.443” .0.. Q . ... . 3.0!! 9.. 24.2.. .20. 0:4 0.»..40? '41‘..’zl .. £1.60! ..0. p . Us“ . .00. ... .4 \ru ”.4 .0 .0 . .04110‘00 , .. .... ...-.0 .... 0. 5.1.0.3.... . . .0...... 3...; 0:3». .. . ..YG. . . m0 “ 0.02.:— G ...!- Nn... .- Jpn...“ J3“.4.03.\. haul. . . at... i.‘.1!§.l....00.... li...‘“.‘t.u..3 3000030953.... 4.. ....vv 33'00 6 .. ... c. a... .. . . . ...: ;. . .0 ......rPV .. . «03.1. . .. 0 . .. ,3 .0 o . ...-.00.... 61.8.} .0 . . . ..v . 0.0 4. 4.0...“ 0“: ....‘r . . 0.0.... ...... 40.. 0.. 3000.4}. 4 .. . 03.52.42. 307.00.20.30... 0:1. 3 . 0.- ... ...... .11. 2. 0.0... 3321.4"002: L . ... . . ...... (.1. . 3 M304 ‘. u 90*. ...V . .0... 30..0 .4 .4 . 0.“ . . . . . .00 21‘“. .... .....w .4....4....l 00002T .000. .... sC 0.0L .fosii ... .7094. 3.0.4.0.... .0. .0. 0 . 2.13.0}..3. 0...... .91. ... $.513LI: 4‘ ..l 0 4 .o 0’ .0 0‘ 0. . . . .0. .0. .1. pun-..Nrfifi; ... up”. .Jod. 3"? O 0 Y1!- 0. (.r.’ ... "2 o. .0. In. . 00 .... 0 ”to: . . . 3.0.104? 3.4.4.... . 4.. .. 000000.}. . . 0 u . I...) _. ....n‘ .. ......»1‘... L00 .. . “05 2' c. . Q. l _ 40-0 ‘20 .w..0.fi..‘00.4.a ‘4 00 P .. 0 £344.. . a . . . .3". «.... fig—J0 41.00‘ .1000! my.... 0.0.0. . 0.0 ..I.'000.... .9030... . .. . . 0. ... . .4... . 00......‘0.0‘0 0.“ 3.0 00 nu”, . . 0 0000.00. 2. . 0"C.’. 0. illoka-rl". . 0‘ "0M runu“'0’fi0 ... 0 4 5.0020... 0.0.0.0. 0..., ..0 3:00. . 00.. 0‘0‘0. 0 4' .30.! 0... 0 . ”0.00.00 40 3. 0. $1.40 _’. 3.. .Iv ‘00 0f ‘03....0. I». \ . .Jfimrsflhfl931mnmmonu: .1.0.Vv..4ru£?442524.14)”. It.tl....uuus4.....u4 3... l... ....v. . ...... . ...4 0 . . ....4 ...: k .4... z 3.. . .... ; . 4.. 3...»...3; _. ............ 4.45....stzfl . Bo” f 10d: c... Q. . . 0 . . y . .. ... .4 .p . . .. . :4. . _. _ . . .. .. ......hroLT. ..P. ... ......uwnhnfih... ....nunviodv...”fiwuwikur,.u. ......3mF: 333.3... ...: ...... .....ufiflzte ....\....M....._..d.:.4....4.vu .... 4 . .... . 3.... 4? :93“... 40.4.3.5»... 2.9.3.. .. ~. ... .... ... . .. 4 3... .3333“... .... .... .4 ...).- ..cl..\...3.. L. 5.0.3....ur8: 9.6.080. . . . ......{iiIltn ...... s... 11.2.... 1.2.: . . I 0 .... 3.27.1.5... ... ..I .31.... 4......- .. a .3... .0 V 9 .0 c 0 A f u . 4r . 04’. r V 0 04.. . . .04. “.3440. u. .. 000. . 0.44. .00, ... '0 .1- 0. .4 230.300... 0. 0 . 94 ... . 4.!0...0.'0. 00 ...s - ... .0: 0.000 "00‘00 . 0.... 1.. . 0. ‘ 0 5.0 o 0 is .0 00 r c. .00 a r ..‘F I 0 '. 0 ‘03 I . ...? ." .1. 0 $4.0“ '0..!:.C.o'..0...00.. 0.00 . 0‘3? 0 z ..w‘ .4 0 0 I 0.. ‘0’4 . t_ 0.40.. V . Q0 . J C .0. “w. 401.43..“ 3030...“.- 0.. .nwi.u4uom“0 town”... . 0:4an.449! . ..I ... ...-0A. . 52.. 974.33....11. . .0 (Auto... ....‘1. ...... . ... our... 0W. ._ . a.“.02 .\ .0 V 0 . 5.0.1.8.. ‘vflu’. ...n0 “0".Ifil.z 0.0- 0“"..“0.0"0...“. 000“: u .0. . . 0. v. . ‘00 run-.510... 00”. ... “.0 00‘. H_.”HE u c . fifrt 0.. I. .. . I. 4.2.... 0.... 8a .3'. ...? ...... . .. 0.0. 6.. ...; .434: .2 .\ 4 620.. .0 0 u .0 V... .. .... . v1.01... . a.“ C 0. .0. $1. . 0.. .0. ..0. .1 0.0.. v ..w. _. .0 5'1? ”0”,. 00’ ,0. 1 0. ’0’ ....0‘. B00~.I‘*..- . t...0 . ..w‘... ... 0'0 :1 . 0'.‘V ' 0 r .0 00.0.. ..0 .. .. 0 0 .13. .7 1..“ 0 . I . ’34:...» .u’p ......aeunfiu . .. .. .. .... .. . 0 . 0. . . . Nu.‘ 33.44.10.) 0:300:20“: ....u. I’PIIJB. ...0. ..IL .00.“...1.” .... 4.. cl .. s . . . 0. 0p . .0 . . 0 2' ....I 0 . «fu. 05.09“." ‘h‘invw “$4004.10 0’”... r034..0.0aa 3.00:... ...... ‘fgbg; 304.4" .. .01....“ 00. 00.0. 0... ..J_ 0 7‘. ......04:4.. 0. 0.0... . A! .001 borasoostf910000. 0‘! ..1100. , I. .46...0.0.I0.. ..0. is . {06...}. .000.0.‘0W.200.00...004.0 JON“ .04‘0‘FNQ71Q0 2.00.0... fl... 8‘ up... 0.00.70... .3“ 0...0 009.0...0»...3|.0.‘405..h . . ~ .0 . a . . . . lo. 2. . . . . 4. . V . 0.. ..C..ul..§u0..n.0..0vo l "0.4. 9.0. .. . 0.3.... s”... . ...-0 00" .‘mutc.... 343.0001... EV? ‘i: 0.0... L4 . .0 . 0 . - 439.1%...0. i0. I... . 0.— . 4001.....700 . 0|. 0.. . _ .JA...:..,....HL.......H.....“.U... . . ...“ ..u....w..:...xv......ur... .. 0. . . . . . . . . .. .. . . . .. . ... . V, . . . . ...... 4‘22... .....1. 4.11:.m~....w.un~.4&.\fil&. 0.0 a. 020...: . .- .04 .00, . . .... 00;... .10. &.~‘ .’ .hu . p ... ...... .09’......,’ ...?!» . 4. fl}... .0. . .. .. ... 2:... _. .... . ....S. f... , .. . . . ..0.‘ 00.1.... ..1‘0’10“..,04’€..“..L0 41.0 “gt“. .’ . . i. 0 v 0 3 . . _ u... £53 ...Wc. _ I0. ? ”Canny-lug .001 l...¢.00..=0. .. . 4.0;...4... 1.....4 . .. . . . . . . . .. . H. . .. .. ..0“.0.‘f..4. . .1. ' .. 0.. A. ..00 1.00.0. 10.6 . . ‘0. . 0 .V- 0 0 U ..0...... H. «. ... ..m—410Lv. ..VT. .2701. .. fig... , . . . 4) Q I 0 I 3004...; . 5‘00 . ..‘13. . V... 30.00... c Q! . 3’ .0.‘ n" ’ ...0‘0...“~0.00.0. .. .. . ”.042... 0 0b$ ‘..”_0'09... . ..0 .0 .100..‘...-4044 ($0 . V. 44 . it: . . {4.001(f’ .‘ 000.013.. 0.01‘ . m4: . “.4. .. 0....Jt0dfih .....41....0.0.Gr0.a.0£. .3." “Wag \0.‘ .V 0 0.0.24.0. ..v , 0 . . . . . _ . . .. ... .-....2r4zfi... ... ... ......k... ... xx." 1...... .535... .... 2: n. . 0 u 00 0 ..000 . 0 . . . . ...... ... .. C 4. 0.030... 0.30;..- .0 .- ... . ..0 0.0... ... 2413.201 0 ' . . o 0 0).".n . .. 0.0.3.. 0 E”fi§3.ulLI.Jumsfl....O0-‘ Hurray-n. . 0 Van—.00.” 0‘0000 ...0 5.40.4. . (I. 044000.304— : (31“0‘0 .0100... . . .0 t 0 “4.01 0.0.0 .0. .0 . r. I ..‘.4ul... .. I .... ”.00.. .. r.0~I0..i¢ ..‘0: . .4. C.“ 0 . 00.03.90.090... ’i-izt 0 ’0'. 0 06“. 4 4-,... C 0 nor00<0‘ 40 .4qu ..~.v‘.<.i. 00...“?! ...ibaflilo? 450.0...00. 0 .. 000003900 :‘00\ ‘. ..4.0.0l .0 0.0.?! . .0. $03.»...0JJ . 3‘ 1.0.0.100.” ...00? . 1.00.0.0. .... .1000... . . 0. :3240 0.. 0004 00.0. 20.4. .2 1.930.405- .... .. 20... A. . (.... .. J . ..0.0« ”tr. . 40...qu...3.0.. ..«040...0.00.0.0 . . $1.1 4 .. .7'0‘1 001000... 0.0.0.. .6 3.20:... «.310.- . ._ 4. 0.... 324.».O 0A .1 .0... 00 .40.‘ ... .n . v. 0.0.0:... 4. .0 . .12... 040 , . 48 ...... . ... “ ..'.4p.. .... ..I. 290...".09000 6.0.(. .... .. . it..!...... .33 shcl‘na 0 0 "IF”...vunub 4‘04 .t‘...: 4,0 .00..." 0 V 0000.. .3»... 0 000.0000 0009.30.00 000 . 0400 00:00.40. . . .004 o u ’0 4| .. 0. 4‘. 0.. C v. ‘0 0 ..r0. . . . ._ .. 0L9..." 0.0050- ... Olla‘oc0 ll. 0‘! . .00... 0 .04. . 00.3000 .0... '10: . . .. _ 00 0. 0 O 3&0 0 .0 .l .00 -.E‘OJ. " 0.. r 3%.‘1000020u'l 4’. 04. .0. 0000 (4.0.- .0 0.0.04 ..0 ... ... . . . . . . ._ w.“ 2 . .....“0’ v‘ ”u . ’Dr00000.*0v.0.q ..II. . t..‘0! 00.105 0 000.0 ......0 .‘0.....4.0 . .30. . .00.! . . . . . . . V ~ .0 ua‘d 0. 010. 0" .00». 0... . 003$ - p 1‘13033‘: . ...... ...: .4...I0.0.00io.0040.30. . .4....00. 0.. . . . . f . ... . . «1 . . fizzw .‘ .r .1."¥-I00¢4‘J.......\0 0.00.7000 . 4 . ...‘03403... .0, ‘. . I . . . . , . . . . . . . . . fl ‘0.- 0. ... 0’00 0 0. ’ . 34.33.40 .40... ...00 ."93...0‘1.0.0.‘..V. . 0'. . '04. . .. . . .. ... V . . . 4 . ”I. 9—... ‘000.l¢ 0 ‘4.”l. ' .4 .-.... . y . 00. .132300040. . _ . , . . . . . ,. . _ '0 ..‘b. .. O. ‘3. 3 0. .0. . ...4... . ....u..4.u..u.-...... . ._ . . . .. . . ... V .. . . . . . . 34..."..4434... ...... .... ...... a . . . . . . . _ ... I... . .. we" ... .13: 0 ... a}; . . .. . . .0212}?! ...... ...: 143$ . . ...440. 43.53.): Q54... .0 t «a L . . ‘,0.... 0. 00 ' . 4. .... .\ ’ ' ...... .. . .... «ma: , 00:00.- . .02.. 00.0 . . . . ..0: ..f. ... .. ...4‘m“ . . . . .. . ...?“HNJ‘. . H. ... . . 1.8.2:...0...’ 0. . .. . .. v. ..4 .04.... ..."...JrvOc . . .. .. .. V .H. . . . ...: .. 0.1.0. . . . . . . . . . . . .¢ ....00. ... 0 .... . .u. 59?.) ..nm"...v4. ..ty. 0. a! an“. 0 01.0444 v. ...... .....afigpmwfiv .... . .446... . . ......“ ......hfiuyciz....¢ r... ...”..JnV . _ .4-.. 1a. 0 .0lr‘m" '1. 00“\0.i0. 00 I ..I . 4. .0‘ a. , .. .. .§..X....O\n .0 0.1000Q’100034-0! .00. ..‘~..: 73,909.00. 1040": . 'n..." . ‘0 . 4‘. ‘0. .‘0".\00$00....,0. .. ...-5.0.3.0... . . . .540 ‘0" . ‘0 J 09... 0 . 00,3»... .. 0 . 3.0.1.1.... ...“... i333}... V r. 0 1. 03.(.00.9fl40. ..ih‘n0.3.‘l .C 0 00.00.. 4- 4’ .0 .0. .. .....04. .. ...00 N.‘ .0. $014....Qr’00. .Y..0’,’\4 .. . .thr. I. 2‘1...90.o. ...: 000...€0.. .l. . ...l....4.0.0_4¢. . Lt‘.0.0.. 4 0004s... .0” . . . ..‘..0?b. 0.0.. 00.. ~ o‘. .00.... . ....‘ 0‘... . 0. .‘..\l...?9 ... 30.. 0V. 0“. . . ...b 4. 4 .0 4| . [3.113031 .00... . .... t: . .. .000... .440. 0.....00. . 1.5.0.} ... 00‘9.IJ~. 040... 0.. 24.0 \_..0 . .Is ......0. 000. J 1 40.3.: ‘.‘..:'.0’.-.0... 1. . . I 0.... .. ...-0.0.? 0... 00 0070.6..00" 0.4... . ..0. . 110.0. . s . :00. riot-:0 _ . . . . .. . . . 4.090 :2. 0...»! 0(9s0u1 ..0 ..40......0.O. . . . . . . . . . . . . . 0 0> ..00‘0. 0‘. .1. 04.0. .03.. . .. . . . . . . . .04 .7“. 0| . y . a ' t, ' .0000 .‘Y r o 0.0‘. 0 . ..01...'4'004o... . L0 ‘ 0.0 . . ... .0. .. ..v s 000 .4 0.. 01:09.20”? “mpg“ ... . 0 . .. v.70 ... . . .. o .0... .4. . .0 944...... .... -...V... .............4 ._ .. ...... ......hm 0.. .. 1.5.3.3... . .0..9._..0.1..... 0.8.. 44...- 1.4.9.. 0 .. ... . . . . ’0 .00..00.0.'. 000': . . . «0.0- 0 _ v “5’ .."00.0 . . .. . . . .0 . ......;.....4., _ . ._ 43.02. .. . 00‘ r ...4 000.1. . tr’hr. 0.. Q. ... 11.00. .... \vsv—ia. $t’..nr ...-.00 0.0%. .. g I) r 00 ... 0... . . .00 ...040 30040.“ . 040040. _. 03.0..040...‘.0.II{ 00.! .0’0.‘Q<00. . 0 ..0.~....fb._4. . ..40 lotto-t av... 30.07...-... . V ....bfivrmhdwn 3.... .. gmm V. o)“. 4. .00. ..noo. 0, I... 0.0.}; orfl.‘ 0. . 0.0. . .. 0 . 9440.034, 0.0 . .‘I'Wabiv 43-4 0. .404... 3.0.1.0 ... ..00... . ....0 . ...-1.2.0.0. 4.! 21.42.... . .44 .. . {I . 4...... .0 Q3. .07... . ..‘U0.00:10.000....0,‘0.4 0‘. .4.’ 4:... .0...-.40 .. . . .... on . 0.0.0....... .....0 0.000... “0“..- 40.4 \01 40.... 0 0‘. 0 . . ..0‘ ....00‘; I $55.0.p9...;.'.4:4. .. .. ...-...... 10.4.. I... \‘lu.t... ..Jn 4. l3)... .0‘ .o .0! . on» ..s..0.. 4.0.0.310; .03.... .2...” . o. . .\ 0. . 440.41.001.30 ...: 0 0.... . 0. . , I... .....0. .. ... 4.. . ... .... . 0.. .f0.... ...-00.. _ 00 ,. .‘7 10...... 40 f... . '04. .1.) 00.01.51. ‘ (0. 0 . \. 0 1 0‘.."0.. . ...” y . .. . .....- .100 . .. .. . , ... y . . . .0s. V0.00..- 00. ... .4 4 . T. . .9... .‘0 ‘,_0.'40 ..w: ...00: .03.. .3“. 0 00. 049 ..r w . . 3 ...... 0.0906... ......0. .500. 1"!le 0. (Irix I. . h‘ 02.1.3.0. . 400.. . 00K. 3.0 . C . . 1.3. 1 ...0 8'} 't.l¢ J51? ._ 9.0.. . 0.» 0“ l . ’. “00h“... .00 . J” I W 93430. H 0. .... Q '0' .. 0.0 . z . .40000644 44.4 . . . O U 0 .. C: . .....0 0 .0 0.0.: 4‘ 0.0 0 0 .... 00. .i: 40.0..- ...04 ... '...0.Il.24 I‘i 0.0.4.0.. . 4 40.. 0". . I 0.. 0.00.11. .....‘1. ...-0...... .. . . ... ....0 . 0.0.0.0.... 400... . 0. 0.0 3 0 gr 444 . . 0‘00...) {40.4.0 ... u 0. 1. 44. J0. 003.0. . 30.307...‘ .. ..-.4 0 ..0 . 0 . 0.0 ‘~.40~..0|.... . ....0'. 0.24.... . 0...... .0 1‘ I..0..I.\0.-4~..00 0 0 0040.0... 0 V» .0 ’.oh.s..l.0 0.0...0..V . . .. . . . . ... 1. b? . 4 0 . .‘i .. . 0 ... 00‘ .V . . o... .4... ......9.00.( . , . . ... . . .‘30. ..00 A . . 0... . . . . . . . 004 .Otou....| :02th3 . . . .10. u. 0.0 0.0 :le £06.... 3...... . .0. .h 1.“. ... ; . . . . . . . .. . .940..¢-.. 101.1... 4..- 30.“ ... 4 ..40- 1.422....4 .0..04..4.v0.. . . 0.. . ....u..u....._ .4 .... 20.4... . 02.4.0? 310...?! .0 [20.4.1910 .§3 00 ‘04 0 0.00.499 . 0.04 4 41.. . £0 . 0,1 ‘I. .0 . . 0.. b. 40 o . . 0 . . 0. s . . '7 u 0.. . . 0.0 . o n . ‘000 . ‘000040...0’1.0l. .r..: 0... .....40 '.. :03 . V .00 .. .0.‘ 0.0 . ,I K. .3» 90 01. 4 . . O . 10 .... . 0 ..4.. 40.0000. ... . . r1? .0..4.00 .0404..’.0.0... . r 0...: - . -r..:..... . . .. . . . : . .. . .. . . V 2...... ... o. 5. .4 14 . .... 0 . , . . .. 0.. 0 .‘4.... . .IV . . u- I..- .04 0h! ..“0 0 0‘40‘0'400 . .U . . . . . . .. . . .. h’ - ‘0.00p (0.. c ......v ...... 1' o. ...... . ...ituu . . .. ‘4.0.‘. .0.. . .. . . . . .0 00 ... .‘us.’ 3. .... . ...400 . . .0000}: ID. V 0.1.0. . 0... ... 0 .. .... . . . ..4 . ... ..00. 0 . ....l' 0. 00.0.0.1 0. ... . I... 0.44.0‘.o.0 4... .00....- ..|sz0 I.- .V 24 0 . : .9 0.0.. 1.0!... 04.4. .. .... I . 2.0.04!!! .09. 4.40 :I It... .. .... 3.3.1.0. ...47 4... . I .2909. . 8 04.0.. .25.: .0 ...-... .4 L ... . .....0....'.0. :00 0... ......0. . .. .. . - o 0. .t‘ . .0..- .0. . .01. .l 0.11.... I. (40.91.... 0.; . 0. .. . 00.. 0 I4... .‘9 .. . . 7 . _ .... 00 .0' 20.40 .I. .. .. 4. v... .... .04 00.. 0.5.0 .... . ..0 04 ~ . . .. .0 0 o. 4.. .000 4. . . . . ...o. .. .. . . .. 4.... V 140......th‘0 .44.... 000.. .. - $4.430 00. 3.3.00.4: 04.0. .00‘ 0.1. 00 s ..0 .-. 4...! I... I 04 . —. 0 0. . 40 .440...- 0..‘ 0 .. . . 0 . 3.000.0- . . .. ... h: .. .. 4 .... ... .. .4433... . . ...-...: .. £00.. 4. . i. ....0... ... . 30.24 . 4 u . 04:110.... 4 . . .. H...- . 00.4.3.3... . VI ... .4 -0 0404......4440: . 10.... ..I. . .3 .....l .. . . .0; .. V . _ .. . ................ _ . .. a. Vsu.n..u....rh.. .2. _ . _ 4......4... .... . . . l : 4.. 0.. , .0.- 4. . .. col- . V. ... 0.40.. _ . . . . . ... .. ... .0. . It . . . _ A I l. ...v 4 4 . . . . Q. henna. . 4.. . . ...... .40...“ . ..V. .4 I . 000... ‘00-'22. . .. c. 0. '9... . .... _ . . . ... ..., . . .. . V. ... 00 .03.... .u. 0. ...."w.‘. . 24' 0...P~00..0.< . 4 . .0 - 44 PV 0-4.. ..4 44 .. .4. 4 . .....1... . . l0 44.... . . 10,0..4 . 203‘; .0240 .30 04. .40.. ... . .. .. . .. . . ., .. . .44)..- .0., .. ... . ....4. ... .41.. .0. 0 to... .10 if . ... 4.... .. Z... .. ,. .. . .4. .14.... 4. 00.00.0......0..0.0.|I0..o_ .. ... )0... . . . : . .. _. n.. .. .. . .u .. . ‘é. 4"}. V. s 0. . . '4 -444 I 0...... .0 . . {I ... ..04... ’.-..0 0 0 00...‘....0.00. .00! . . . ... . . . . . _ . . . . . . . 4 I. ‘00 ..0..‘I0‘..00 ¢qz . v 4 0.. .4 J a .0 .. ‘4 .w.. 4 . ..4 1.... .....10. .... 1.4.44.0. . .. ....0 . . #0.. . .0. . ”V. . _ . . V. . . . . . . . . . .0‘0 0.04. .. ‘0...“‘01'. ‘9 30‘ ’9 ol 0 .0 . 0 ‘0 . 0.0 .. C0 . I. . , . 0'0 . . . 4.0.}. 0.0.00 400 ....0:.9..004 .. 000... . Cit. . . . . . . . . .. . . 00.“. ' .l | i 0 0 Q .o o 4 . . I . 0. . .. 10.0.. 4 . V . ....10.‘.P04§4..‘.0!.0~ .. : .‘........ol. 0 .0 .00 000 V 0 .0 ‘30 . .... . . . . . . . . . . . n. ‘I..§‘Op‘..‘5 it... . ... D 0 ‘0 .. . ,0 01 .. ... . .. . 0 00"...L000... .. ...: 4......‘ 0! .2 . 0.4. .3 . . 1 ... 0.. 0 .L.ao . .0 .. ..- 04' 0. 4. '00 v ... . . . _. . ....0 . 0.00:... 4%: 0“ .0 4|. . - 4 4 - 04 . -.w ...-0. . . I 0.0 }.0 .‘v I... .000... . 0‘4. 400 0a .4-[0€.. .0. V .40. .0 0... oVQI‘oG’l .00. ...-.00 I . 0. 4’ 00 . . . . . . . . Q.‘IK..¢0 ‘00 0.4 .“0 (0'0 .U" 1‘ .1 . 0. .V D 9 o 4 . . » 4 . . 000$. 00. 0. ... . (I... 0.. 00.0 own .4 .4 1.40. . . ... ... 0'3 . 0! .. .0... 0V1....'\ 0.... 0.0.00. 0 3.. .. 0- ... 0. | \4 ‘....4..k __0... .0I..0..‘..0:...0.1 I... , . . A. Q .. ‘0 ....“4. \ .0.“.{.00.:. .0. ...." . 4. 4 . .0 k 0 . I . 0 . .. 0 . . .1.0 . 00004 30.4.. 5400‘. 4830‘“? 4 . . .00 4..." . . 'q'C"V"0i 0&4. ..‘V.." . ‘0“ . 4i - a 1-. .144 0 .0 . 4h4 5:... 0.040 .. 40¢:00. .4. . ...“ ...0 4 s . l . ...!“ ‘0. '0huw .....0 5‘01 0g' Do 0. 9 . . u . I. u r. v . 0 4 4 . . .. .... ...0 . . .09 .0. C ’50. .f-h ‘.!‘flo.~f44. '. 0 4 . . . 4 .. 4 0....L_0._o .. 44. .. .00 .r. .. l0 .... .. .. 3000001 0 0. 00009.. .‘,0~” 0‘ . 00...;0034000040 .' L _ .I a . 4 . . y . ,o. .094 0030..., 5.. . 4.! . .4. .0 . . . I 9. .VI ' y ...-00'.- .‘tIOQQ 90. 4.30.100 1;! 0-4 ' a; .V . 13.43.: ......V... . .....w.....:...4.2..4 ........... .. .. ......V. I ......u..4.......u._.....4. ....u....4.....4:;4_.11h4 .. 4.1. . 4 . ..v .. . ..4 . .....IV . .0. . . . .a 00- ..0 . . . . V .. . . . . . 0.04. 0~ 00.0. . I. v 641:...” u .. . .... ...vtn. . 4 .u... . .. . . . . ...-30.. . :2 2.33:... 4 . . . {.02 4.0. ...-...)... 4 . ....l. 40... .0 90' V. ...: 1280.01.00.146‘80.‘ 09. .n” «4 4.. . a . 45...... I ..., 0.6.302 . ‘4 H3 .. ... . “.90.. . 0'04le..0040.'..'90:. Pin .. 4. .a .. . . 4 .441. , 2114, .. . . a. v . .. .T.&.4.2.V.. . 4 0 cu...” ... .34.). 0.01:4! .0 4. 0 494.15.... 0 1 .4. I. ... 4 .. .4.4..4¢o..... 4-14;-.. x... .... ... ... . ...-..l...\ffi04:.4vfl”0.119.: ’00 . ,9: . . ..., 9. - .. 0. . 0 , . 0 .0 .4 0L .. . . .m ...0 .00.. .40 0. 0 k . .0 .Ié'. 10:04.. Q. 0 .0... 0 2* ...: a... 4.041.. . o . . ... ...]?! 0.: ... ...... so . ’nu.~I-;AQ04"0‘.’..’.'§ '01.! 0"- n. | . ,. 40 V? ._ .4 . . 4C0.‘ V 4 . . I . o . 0 .00 0. . '0. 0.0 .0 I0.‘J.040..0.0W. 0.90.0430! 401 .0 .. ..4 4...: . V. 4 . . ..4 . . . ...4 . 4 30.. I- 0.4.. .4 .3 01.4.0.0 0. 19 0.3.4-, . ['0 . ...!l‘ l ..j.444.04 if... . .0. r . .I. o . 1.. . o -u ’ ..‘4‘. 0.0. .Vl'.. l...‘ 4n-0,0.0.’.t..’0 0'.‘Q‘O 0‘0.I§40. ..4 . 04'. .. . 0.0 . 4 ..r . .l .0 ... ......0 . 0 43.010 . . . l. . - _I.Q.|,II .4. v. .. . 'P0.I. 0. 'IIO 0.0.040 0 000.01. 0 .4. 0 .... . .. . .4 . . 9.. . .. 4 . ...v‘o. .96 . . . . .... 4. . 0.... 0.72.0.2... v.1... 00.0.l 00.0.. 00.05.00.040U;a ..44 4 . 4:0 ....44. 044 . .4. . ... ... . ... ...». J . 0 ’0 ... .. .. ofi‘3'53033 0.. 00:“! .03... 0.0.0.14.\ :04 - .04.. :44. . l . '4’ Y . ..0 9.5.4.0..- 4. 0 .4 V .9..l I. .4 .00. V. v.. 0 s v .5». 40 '10.01‘..0 0.0.‘00 00.00.00 2 l 4‘... . . O. 4 I... 4.. 00.030.-. .. {loft . ... . .. .u ., ...I..Do....0.l.. .. . 2. 0.20.00.0a409. .fti‘v 0.’.0...h. 2 44.9.10400390‘ . .4 .0 . 0.— J I 4 ‘0. 4 .o . . . ..‘f‘ .... o 0 4 C. . 0 . .. 4... . 00.2200 .000.-.0.0 . ...00. ‘00 .... s: . .. . 0.000 . . . 0 i. 00 Q. Q . 0. 0 0.100‘ 00‘. 91.0.04’...‘.00 . ....4 . .4 .91.. 44- Vno ‘l. v . ,0 _ 0.0 : .0.- .. .0‘: 50~ . .0 4 03.. ‘ ...: . . .40. 0 ... v.40. . ‘00.. 04. .l.t0: A.I.s‘ 0-00.0 '0 J’nnd‘JQ’VI‘ . . . . 4- ... o 4. I .. ...4 . .0... 0.00.0...‘06'... .030... .... .. . . . . .. .0 A0 7: 043.4 1.0.02.0"... 0....0. ‘.\4..‘. .0! . _ .0v .0. 9.0..0V 00040.) ..0. .. ..r....0.a..09. 00 ..4. .. 3.4!. Q. .000.00m..400..’50.0.03.‘0.-O'0Z"z000 {.0 .0- 4. - 4 K. .V. ...- . . . . o . 1.0.... . .0. 0 ‘. no 00.0.0‘01154'I507000§}‘\00.V“‘Ig' ..V . w..4..0 ...... . .4. 4 .0041 7:23 44. . . .V. .49. .... 6.. . 0 . 0 ... :3. .0.‘ 10.04:.1 49¢..o‘411o04‘00.4. 0.0,. . - . .. ... I . . c f 4 4.. . 4 .. . . . . . 3.210....1: . :2... ..L. . . .0 . .0 .009. 4... I 00. 0 243.400. .0. .0340 .1400 ’0- ‘. ’0 - .. 40 4 .k 4 4 .. l4 '4 .. .40..o0.0.0 .V ......4. .44. c 4‘ ’0 .‘9 . 4 . 0003 .4Ql...II0.II. 00’s '0 0 4 (0.0 0.04 ... 4 . .. . .. 1. . o . p .. y 4. . . ... 3.0.. ...... 7 . .. ..l... . . . . . . . . £443.13. . 0 ... . .0 0‘0 20.x...000; 430.4 0.9.po "ll. ... ‘6. r 0.. .. u... v _ . 4 - ...I..... I . ........4.......440I4 ...4. .. . . . . .. ...oV ... . .. .... .40....3. 99.... '1 0.001000 VO!~4\C00.$|¢"H o . ._ .0.. . 0 o .5 I ..- O . . '0 . . v.0; . ...00 0 . D u . . . ..l ‘. 0 30.0. . . V . . n, V. . . . ... . . . . 9 0.. 0 u .. _. 5‘.o . 0 . ..0 4 0 0 0. ,0 . ‘of‘L 4 .I’... tt30’l.‘.0 O 00 ‘ 4 .4 ... . .... .4“ . 4.41. .. . |.4 . 3.2.5.6....tffi. ._ ..t , 0 ... 2 ,. .... ._ . . .. ..4...4.\ .... . o .....2 .40... .0 9090.0... .o........Y...!.v..0!.....0.0.0.3.0 \ 4‘0. .... . 4. ...... 4 .. ....40. ... 4 . . , . . .. .11... .. ...O.......44.. ...»: 430.20... . V . _ . . . . . . . 2... . 4 .0... . 4. 0.. . . . 0.. 0.1 .. . 0.03.103... . .4...‘ .0‘. . ...: o u 2.0. 0.0.0 ‘04.. . I ., l 4 4-. 4 . 9 c .4. 4.060,! ..24 0|..V4 .. 5.04. o . .. . . . 0 ... . . . .4 '0 ... .. o o 4 . a . . . . .. 0. . ... .01 ‘00.!c...h.0 0 IV' 4 . 0.... ‘0 01.0.“... v”... . ...-40. '44.- ‘.I. .44.. .o . 4. . I 7 4 -...94. .30... .. .....4. 00... .1. .. 0‘ .0409... : ... f .... ...! 0-. .3. .. .... ...... 4!. . . Q . .... 1.2.1.0... 1.. 0’0. .0-‘1..o.l..0 00.1007 l. .0.’0.0.‘000' . . . .VC. . I ...0. . .0. . .. ... Q04... 3 . 40.. 0.0000: ~v 0.. . 10..0 .0-I0....I . . . ...0. 4.0 .0 . 04 .0...04\. 4|. ... 0.0.0.0...-V. 4 . . A .40.:1000‘.’ 0C, ... . . . .....ro . ...-4 . . . .. . . . 0.0 . 1...... . . ,4... 0 ... ... . . . 4.. ..0.... . . .... o ...l. 03.00 .....490...‘ . .. .. 0.0.6...:.§0 r4V , . ., ... H o “.44.- .4. . . . . .0- ... ...... 4.... v.0 4...: ... . ...I . ......10..... 040...-.. \.00.v...,! .4...-J.0.‘4.40... .. ... . 040 ‘3 .. .4 . I . _ .. .4 . . . . . . ....V V . I 040.... z . .. .. o 0... I. . 0 0.0 044.. 0: ...}... 0 . 00. bsp‘IuV . 0 I C ’05 4 . . ... ... .. . .. ... .. . . . . .404..:.0.. 0... ..9 4 .. .400,‘.' .41.- .... . 14.0 I '3... 00 . 004....40..l.v. .. 4 1 .‘Il. .... . . . . . ..0. s ... . ... .0 0. .00. o ..... ...... 0 c .4...0 .10.. ... . .. .....0...r..u .§ 044.00000.0II\ 4. 4 0' 0 . . . . 000 0 v .4. r. ..I... I . 0 . . 0. 4’4 .10 4 . 0 . . ... 00 4 . 000.. o. ... 0.0 4 . .0 0.....- 4.. 4 O4 .0 ‘4 04." . . . 0-. . . . V. ..4' . . .0. .o _ ... 4 I40 . .4 4 0. 0.0. . v ..V . 0 0 '0 0.0 ...... V5 4‘ .. 0 Q 5. 0.4 '4 . .0 . 0 . . , . . .. .. ...4 Q, .- o _ . ~ ~ 04 40. .0 000 n . 0. .. . . . $05.09.. ‘70 £44.44 3 .. 4 . . .. ... so . 0. . ... . 00 0. 0. 0 . . . . . . 0. 0.0 0.0 10.. 4 4 0 O . o c 0 .V0I 2.40 0 It . ..II ... I I N . . ,. 04.04... ....ol.. 0 4 .0 40... . .. . .l?.-0. .. ....I.0.00.0 .. .. I .. ............... . .....0 0.v.0.... .. ...... 0......«00. ...-«0......4...... I... . . 4 m . .. ..0, . .. I... 0. 4... ..0 4..04.I.0.0..... .....0.. . 0.. . - ..u. 9. .4. t 4.9. .0 .... o . . .. . v.0 4.0. ... .000...'0.04 940010.... 140.!“4 . ....4. d 0 _ .00 . u .. .0. ... . 4 4 40 .....0 0. .404 4.4.000 'Ia..0§ '4..... '4 4 ..0 . . . a. .00.... .040. .....l000.0.‘04.0..00. 0» .4 4 s . . v 4. 0 4 ...0014.. ... 40., #00. 0'1.0'.Iw- .. O .. ...-.V, : . .. 0 . ... . .0 . . 4 ......o-....04.......!...4 4....001...0 0 I. . 4 4 I... . . . . ...-0 .... {...-4.. . 4. 4 0... . . .- .. 0 0.. I . . . a V. n v Q .40.. . 4 . 4 0 . . ....1.‘...’0..0.. . .5: . . . .... .0 4 o .1. .. . .... O 4 0' 0...I.,v000 -0 - 0.....0‘ - 4 4 I I. :0}... ...! ... . A... 5.40 4. . .....- . 40. 0 o 0 .V.. .4... . . .0 ...—.... ......00. 0 44400. ... 400- 4 4 .4 4 . ‘4 0 ‘00—. £2... ... I. .4 . . O: 4 . VI. ..0 . .04 4 7 a. .44 . . . 4 .0 . . . ... .4 .. 0 . 04 .... . . ...0 . . . 0 .2.... 0.40017 ... .. «70¢..I..- ‘ ...: . ... 0'. 4 I I! I . _ . ... 4' . .. 0.. 0 0 0.40. .0 . ..‘4 .0 '0 q, u. .0 . 0‘ . ... . ... ... 0.0“. ..\ .. 40 . . 4 . c I 4 '0 . . 0 0. . 41.- 0 . . 0 P04. p . . I - 4 .I .... 0...? 1.000.. .. .. o. I. :44» . 44. 0 . 0...0 .. 0.. . . u .. 0. o. ... . I . 29.9.4.0...02. ..A .9 . 4000-00. _ .. . . . . 00>. . 000.104 0.. . #0400004... 04 . . .. I l .000. 0 . o 4: 4 . . . o. 00.. .... .. .. 0...0 0 ... .4. _. . . 4 4“ 0.0.0 0‘ 0 0 . .. 0 0.. 0. . . .20 4|. . .0. . . vv.0.... ... ... 4 .0 '4 ... .4 . . ...... . ..lu. . ..0.. .. 0...... ...-0 004.. 45.93.000.57... 0 .l . 3 .C " . ... ...: 00.0.4 0 ..0. .4. ..4 . 4 .. 40 .. ...-4 0... .v .. ..0.. ..03 £4.43. 4 4 4.. 4 0r” . 6.000.. 0. .44 . ... . 0... ... a- A .00 ...- ~ . . .... ' ..40 .00.. . . .4 I... ..0.... 4V .0 . 0 o .. 4“ ..0c. ......vol.__.0. .0.‘ o . ... .... . 0 40 0... o .. ... . 0 0 .. O. .3... . .00. ...0 0 04 .. .‘..00000§1.0..o .' 0 u . .v' ‘ 44A 2.0.. it...) 4 .05.. .. . 1 .4I .. O '00 O - -0 ‘0.’_. 0.0 00......4. 44 .0. . .44 4 04 4. O 4.... . .4. . 4V3 . I V. .. . . ....4 .. 1 .0. .. ... .....u u...‘ .4. 40.040 4. ... ...... . . .. -..... . ... .0. .o 4 04. v . ...9 ... ...40‘ \ 4| 4 .. . . . ... . . no... 0. . . I. .0 O 0.0 o. ‘00. D . . 4 4 ... 000 4f. ...-0 4:.00. c 44 . 4 .. . . 54... 0: .0: .... . ..0 . ... .4... ..40! 40. . . .0 . 0. . I . 4 402.000-00.000 . v.05. . . . . .10.-.. 0.0.4000 . 0 04 . 0 . . 4'0 40 ... 0. o. . L 2.4. O 00. ..00 o. .040 04 4 . . .00 0 o . . v 0 0. . .4... . . 4 . .. .4... . V 3 . ... 010.. .. . .. 3..“ .49. . . 0 ... . . q. .. . . ... 0 . ...: . 4.. 4 ...4. 4 . .4 . . .... . . . .-.-(.....- .....0. . to. .... L 0 .4 ... I. .40 ..4‘. 4. .. ..4. ... 4...... . 4... .. . 4. 004 .4100: . . . . ... .4 ... . . 7.... . ...... . . . . . .. .‘3: - . . . . 4 . . .. 0 . o. . .. I 4 . 0. . .4 4.40 04‘ .0 I. .4 ..0 .304. . . 0.0-4... ... . -0 . ...40 4. . 4 .00.. ..0 904 .900 o I 0 .. -\' ‘.04.|I...... 00. . 4940 00’. .. g. 00.. 4. ... . n. . . 40. 0O . 4 . 4. . 4 s . 444V. .0 0 I D...ul .. .l b. 0.0 . 4 .V. 304... ... 000... 0. .. 4 . .. ... s . 0.04 I . u ..0. .004 o n. 0 A0. 0 0......moooao I 4 0‘0 0.000 . 00.. . .u. 4 t ... . . ...-.0 ... . .. o.... .04 . . 4' .... .. 4.00 - . ... o. .0 4 ... 4 .9 0.... 4 .. 00 9 . 4 4 .. . .. .. on. . .. -.l .4 .0.0.. o . 0. 04 O 0 u. . .... .0 . . x .0. .0 . A. x 0 . 0. . ..0.. ' .V0 ......4 0.0? . .4... 0... o 0. ...V.. ...... 4......4. .. .00.... . .. o .4 . ... _. . I 20.... o. .4.4 .444. 4 . ..I .. ....00 c. .. u x . .4. 0 I. . v. . ... ..0 \ . . . . . . 4 4 .. .c 0 04 «0. 0.0 30.04. ... . . 0- 0 oil 04.--: a. .... 4 4 .. . .40... 0. I .4 4 ... 0 .. . 4.4... 0. 0 0.4... I .... 4 4| 4. o '0 . .4 .10 A. I: .0 4 o. 0... V 0.. 0 4 ... .. 9 4. I. v . G 0 no 0.... 4..040,0. ... o . .0- . I. 4 0. ..0 0.! . I 4004. o I. . .. 0. 04 .I. 044 . . . 4 0 I 740.4 4 . 0 . s . v . . . o 0.9 . 4 ... . . .. .0 4 .4. . . - 0 . . . I . . . .44.. _ . 4 ...4. ..04 ...... 4 a 2 .4... 4i ... .4 .. .00. . 140- 4.. . ... .. u 2 . ...» .0. s. 4 . . 0 0..4 4- 0 0 ..‘V s. .0 0 ... . .. .. 0.0 . n I - .0 . . 0 4.. I. . 4 . ..0. 0.. up... .. I .000. .4... 04.... 0... 0 ...- 00.0 '4 . 4 ... .0. ..0. v 4 . 4.0. 44 0......... 0!.00...4. ... .4 0 4 o .-..04.. 4. .44440.04 . - 40 1. .100. .... 00 ... .. 0.00 I . 0. O u. 4 . .4 o 00 \4 .. 04. 400.0..00. ..0Q-‘0... . 4...! 4.0.4.1.... 44. 1ch ...: . . 0.... ... ...6 4 .04. 4 . .. ..' ..0.. . . .0 .. .30.. lo. .0 4 . ... 4 .04. ...0.00c..00‘; .0 \0..QO44 . .... .4 v. . 0. .. 4 . .4 ..v. . 0'..C000\4. O 0 I . . 2...... 4 .44 4 . .... 40 .0 ...-.0. . 4| .0 .0..0..0 ..‘3002~4 0 0..0\. 0 o . . . .....00.. . 0 . 0 I .41 ... 40 ... u 0.‘ 0. .0. 0 0‘0 40.0. 4.0 '9.I..\4.. 0 9 00.70.40 0 . . 4 o 0 4 .o.~\4 .0. I... ..l. . 0 I 4.0 'a 4.V 0 04 I. . u . I 4 . ..0. 0 o 0 . . 4... .... .. 0.... . . 4 .. 0 . u. o. v 044.4..‘0... . . u 4 0‘ 4... 0 . ...v . I. I .0 00.. . ..0 0 .0 4| .90..- Dirt. 4 . . 4. 9 .. -..00.. .004 0. 0.000 ..4- . 4 4. v ’4 44 .0 I 4 t I I.v.0.- ‘..0‘ ‘ . 0 440 .400 404. . - 0 I0. . -00 .. _ 0 . . . O 4 u ' . w 0 .OVRM 40! 0. - 4 0 ...! l. v .. V "Pup L1! .9 90...... ...l..‘(.v. .4200. V ..r I.. 0 1| 0. .4...000. I. 0.00.0. V... 1.00.. :0. . . . 0 . V . . . . .. .V ... ... , . .... . . .4 . . .... . . - . .. . u. ...? “mun... . .zfi.u~§.4.fi..u.5..2. . n. . ...: 1...... ,_ . ...,..0..0 .0 .10 . 00...». 24-00.. ... 04 .00 . a- ~ 4) . . . . . A _ . ... 4 . . . I. 4.0! . o .0 . 0 .0”): .93! 00.. "3.403.. ”P. 0. ”I.“ “W4 .u. J. U‘..‘. . 40.400-‘ 4 . 04 .. . ,0... 0. o ...... Itmai ..9 1...... .....230’03 4.4400 4 1?. 0... .. o 4 | . 4 4 71.4.0.4...1. 0.. 4 .‘UIZ ... ..Q '40. o .. .0. . 0.0.00'. .0. I . in...“..4~»vl¢. . . .. 0'4.‘.0..¢A0.'QH~.003’..40‘4.¢J0 . 0. W ....40.‘. o . 1.1.3....35.....u..¢i=.§....flfi.u .. a... .. .. I -nll u.» Hahn-ta”! .i...‘ ....u p . .0: . . ..0... -380: . I 4.0.4.; 9‘0 o Ni LOIS “LIBRARY MIChIgan State University This is to certify that the dissertation entitled VARIANCE COMPONENT MODELS IN MAPPING IMPRINTED GENES: STATISTICAL THEORY AND APPLICATIONS presented by Gengxin Li has been accepted towards fulfillment of the requirements for the PhD. degree in Statistics Major Professor’s Signature 5/) 9 /?, C2 / 0 Date MSU is an Affirmative Action/Equal Opportunity Employer PLACE IN RETURN BOX to remove this checkout from your record. TO AVOID FINES return on or before date due. MAY BE RECALLED with earlier due date if requested. DATE DUE DATE DUE DATE DUE 5/08 KlProj/Aoc&Pres/CIRC/DateDue.rndd VARIANCE COMPONENT MODELS IN MAPPING IMPRINTED GENES: STATISTICAL THEORY AND APPLICATIONS By Gengxin Li A DISSERTATION Submitted to Michigan State University in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY Statistics 2010 ABSTRACT VARIANCE COMPONENT MODELS IN MAPPING IMPRINTED GENES: STATISTICAL THEORY AND APPLICATIONS By Gengxin Li Genomic imprinting has been thought to play an important role in seed development in flowering plants. Seed in a flowering plant normally contains diploid embryo and triploid endosperm. Empirical studies have shown that some economically impor- tant endosperm traits are genetically controlled by imprinted genes. However, the exact number and location of imprinted genes are largely unknown due to the lack of efficient statistical mapping methods. When an iQTL segregates in experimental line crosses, combining different line crosses with similar genetic background can im- prove the accuracy of iQTLs inference. To make full use of the natural information of sex—specific allelic sharing among sibpairs in line crosses, general statistical vari- ance components frameworks are proposed to map imprinted quantitative trait loci (iQTL) for the diploid tissue and the triploid tissue, individually. Considering the special characteristics of the diploid embryo genome and triploid endosperm genome, new variance components partition methods with respect to the diploid and triploid tissues are developed. An extension to multiple QTL analysis is proposed for both diploid and triploid tissues. A number of studies have demonstrated that multivariate traits analysis can pro- vide more significant power and higher resolution for major gene detection in linkage analysis (Evans 2002). Furthermore, when a QTL has the pleiotropic effect on several traits, some important biologically interesting hypotheses can be performed success- fully under the multivariate traits approach. It is well known that several highly correlated traits appear commonly in endosperm. So the variance components based univariate trait iQTL model is extended to bivariate traits iQTL model for mapping the parent—of-origin effect. It may expedite the process of identifying and eventually cloning genes controlling important endosperm traits. Except for the wide application of variance components model in flowering plants, variance components analysis has been a standard means in human genetics. In brief, the genetic effect is detected by the significance of the likelihood ratio test. However, true parameters of main interest may be on the boundary of the parameter space under the null hypothesis, thus the regularity condition for declaring asymptotic chi- square distribution of the LRT statistics is not satisfied. The threshold calculation based on current methods often yields conservative hypothesis tests as discussed in a number of studies, especially in multivariate traits cases. To solve this problem, a general approximation form of the LRT under the null hypothesis of no linkage is pro- posed, and the chi-square mixture proportions are shown to depend on the estimated Fisher information matrix in both univariate and multivariate trait analysis. COPYRIGHT Copyright by Gengxin Li 2010 ACKNOWLEDGMENT My academic life at Michigan State University is rewarding under the guidance of my advisor Dr.Yuehua Cui. I would like to express my sincere appreciation to Dr. Yuehua Cui for his valuable advices and encouragements during my PhD study at the department of statistics and probability. He introduces a great research area, statistical genetics, to me and makes my research study be full of pleasure. I am also grateful to be awarded two Quantitative Biology fellowships under his strong support. His outstanding performance on both research and teaching areas serves as a perfect example for pursuing my future career goal. I sincerely thank my co—advisor Dr. Dechun Wang for making me participate in the interdisciplinary research at the department of crop and soil sciences. Dr. Dechun Wang is a professional geneticist in soybean. He provides strong helps for my application in QB interdisciplinary Ph. D program, and supervises me for the biology training in his lab. My academic training in interdisciplinary researches is further broaden under Dr. Dechun Wang’s guidance. I also express my gratitude to Dr. Connie Page, Dr. Sarat Dass, and Dr. Robert Tempelman for being my Ph.D committee members. I appreciate your suggestions for my dissertation. Special thanks to my husband, Shengqi Yao and my parents, Junru Li and Jiarong Fu. TABLE OF CONTENTS List of Tables ................................. viii List of Figures ................................ x 0.1 Introduction ................................ 1 0.1.1 Gene and quantitative trait loci (QTL) ............. 1 0.1.2 Genomic imprinting ........................ 3 0.1.3 Imprinting QTL method ..................... 4 0.1.4 Objectives and organization of the dissertation ......... 7 A statistical variance components framework for mapping imprinted quantitative trait loci in experimental crosses .......... 9 1.1 Introduction ................................ 9 1.2 Statistical Methods ............................ 13 1.2.1 Genetic Design .......................... 13 1.2.2 The mixed-effect variance components model .......... 16 1.2.3 Parent-specific allele sharing and covariances between two in- breeding full-sibs ......................... 19 1.2.4 Likelihood function and parameter estimation ......... 25 1.2.4.1 The ML estimation ................... 26 1.2.4.2 The REML Estimation ................. 30 1.2.5 QTL IBD sharing and genomewide linkage scan ........ 32 1.2.6 Hypothesis testing ........................ 34 1.2.7 Multiple QTL model ....................... 37 1.3 Results ................................... 41 1.3.1 Simulation design ......................... 41 1.3.2 Simulation results ......................... 44 1.3.2.1 Single QTL analysis .................. 44 1.3.2.2 Multiple QTL analysis ................. 52 1.4 Discussion ................................. 56 A general statistical framework for dissecting parent-of-origin effects underlying endosperm traits in flowering plants ......... 61 2.1 INTRODUCTION ............................ 61 2.2 STATISTICAL METHOD ........................ 64 2.2.1 The genetic design ........................ 64 2.2.2 The model ............................. 67 2.2.3 Parent-specific allele sharing and the covariance between two inbreeding sibs .......................... 68 vi 2.2.4 QTL IBD sharing and genome-wide linkage scan ........ 74 2.2.5 Hypothesis testing ........................ 75 2.2.6 Multiple iQTL model ....................... 78 2.3 SIMULATION .............................. 80 2.3.1 Single iQTL simulation ...................... 80 2.3.2 Multiple iQTL simulation .................... 85 2.4 A CASE STUDY ............................. 87 2.5 DISCUSSION ............................... 96 3 Bivariate quantitative trait linkage analysis in mapping imprinted quantitative trait loci underlying endosperm traits in flowering plant 104 3.1 Introduction ................................ 104 3.2 Statistical method ............................ 108 3.2.1 The model ............................. 108 3.2.2 Parent-specific allele sharing & genomewide linkage scan . . . 110 3.2.3 Likelihood function and parameter estimation ......... 111 3.2.3.1 The ML estimation ................... 112 3.2.3.2 The REML Estimation ................. 115 3.2.4 Hypothesis testing ........................ 119 3.3 Simulation ................................. 123 3.3.1 Simulation design ......................... 123 3.3.2 Results ............................... 124 3.4 Real Data Analysis ............................ 128 3.5 Discussion ................................. 131 4 Assessing statistical significance in genetic linkage analysis with the variance components model ................... 134 4.1 Introduction ................................ 134 4.2 Motivating models ............................ 136 4.2.1 Model I .............................. 136 4.2.2 Model II .............................. 138 4.2.3 Model III ............................. 139 4.3 Main results ................................ 141 4.4 Simulation ................................. 162 4.5 Conclusion ................................. 162 5 Concluding remarks ....................... 167 vii 1.1 1.2 1.3 1.4 1.5 2.1 2.2 LIST OF TABLES The IBD sharing coefficients for full-sib pairs in a reciprocal backcross design considering allelic parental origin ................ The power, MLEs and REMLS of the QTL position and effect param- eters estimated based on 100 simulation replicates for a QTL with no imprinting effect under different sampling designs ............ The power, MLEs and REMLS of the QTL position and effect param- eters estimated based on 100 simulation replicates for a QTL showing different imprinting effects under the 20x20 sampling design. The square roots of the mean squared errors of the parameters are given in parentheses. ............................... The power, MLEs and REMLS of the QTL position and effect param- eters estimated based on 100 simulation replicates for data simulated with Mendelian and imprinting models based on the 20x20 sampling design. .................................. The MLEs and REMLS of the QTL position and effect parameters estimated based on 100 simulation replicates for data simulated with two QTLS under the 20x20 design. The square roots of the mean squared errors are given in parentheses. ................ The allelic-specific IBD sharing coefficients for full-sib pairs in a recip- rocal backcross design. .......................... The power, REMLS of the QTL position and effect parameters esti- mated based on 100 simulation replicates for a QTL with no imprinting effect under different sampling designs. The square roots of the mean squared errors of the parameter estimates are given in parentheses. viii 43 66 81 2.3 2.4 3.1 3.2 3.3 4.1 4.2 4.3 The power, REMLS of the QTL position and effect parameters esti- mated based on 100 simulation replicates for a QTL showing different imprinting effects under the 20x20 sampling design. The square roots of the mean squared errors of the parameters are given in parentheses. The estimated parameters for the three maternal effects and the vari— ance components for two endosperm traits: mean ploidy (Mploidy) and percent of the endoreduplicated nuclei (Endo). .......... The power, REMLS of the QTL position estimated based on 100 sim- ulation replicates for a QTL with no imprinting effect under 4 x 100 design. The square roots of the mean squared errors of the parameter estimates are given in parentheses. ................... The power, REMLS of the QTL position and effect parameters esti- mated based on 100 simulation replicates for a QTL showing different imprinting effects under the 4x100 sampling design. The square roots of the mean squared errors of the parameters are given in parentheses. The estimated parameters for the variance components for joint bi- variate endosperm traits: mean ploidy (Mploidy) and percent of the endoreduplicated nuclei (Endo). ..................... The whole possible subsets of $12le without covariance terms. . . . The whole possible subsets of $12le with covariance terms ...... Comparisons of the performance of different approximation methods based on 1000 simulation replicates under different models ....... ix 84 91 127 130 147 149 163 1.1 1.2 1.3 2.1 2.2 LIST OF FIGURES Possible alleles shared IBD for individuals 2' and j in inbreeding back- cross families. Lines indicate alleles shared IBD. ............ The LR profile plot. The left and right figures correspond to the LR profiles generated using the ML and REML method, respectively. The arrow indicates the true QTL position. ................. The LR profile plot for singe QTL and multiple QTL analysis. The true QTL positions are simulated at 28cM and 68cM (see the arrow sign). The dotted curve and the solid curve represent the LR profiles by single QTL and multiple QTL analysis, respectively. The left and right figures correspond to the LR profiles generated using the ML and REML method, respectively. ...................... Possible alleles shared IBD for individuals 2' and j in inbreeding back- cross families. The solid lines indicate IBD sharing for alleles inherited from the same parent. The dotted lines indicate IBD cross-sharing for alleles inherited from different parents. ................. The LR profile plot for the single iQTL and multiple iQTL analyses. The true iQTL positions are simulated at 28cM and 720M (see the arrow signs). The dotted and the solid curves represent the LR profiles by the single iQTL and multiple iQTL analyses, respectively. ................. X 21 46 70 2.3 2.4 2.5 3.1 4.1 4.2 4.3 The profile of the log-likelihood ratios (LR) for testing the existence of QTLS underlying the two endosperm traits across the 10 maize linkage groups (G1 , - - - ,Glo). The genome-wide LR profiles for the percentage of endoreduplication (Endo) and mean ploidy (Mploidy) traits are indi- cated by solid and dotted curves, respectively. The threshold values for claiming the existence of QTLS are given as the horizonal solid and dot- ted line for the genome-wide threshold, dashed and dash-dotted line for the chromosome-wide threshold, for the two traits Endo and Mploidy, respectively. The genomic positions corresponding to the peak of the curves that pass the corresponding thresholds are the MLEs of the QTL location. The positions of markers on the linkage groups (Coelho et a1. 2007) are indicated at ticks ..................... The profile of the log-likelihood ratios (LR) for testing the existence of QTLS underlying the trait Mean Ploidy Level across the 10 maize linkage groups (G1, - ~ ,GlO). ...................... The profile of LR values for testing the existence of QTLs underlying the trait Percentage of Endoreduplication across the 10 maize linkage 88 94 groups (G1, - - - ,GlO). See Figure 2.4 for more explanations of the figure. 95 The profile of the log-likelihood ratios (LR) for testing the existence of QTLs underlying the two endosperm traits across the 10 maize linkage groups (G1, - -- vGIOI- .......................... The quantile plot of the empirical p—values for Model I. For the legend: Self & Liang refers to SF; Proposed refers to the current method. See Table 4.3 for more explanation of the legend. ............. The quantile plot of the empirical p-values for Model II. For the legend: Self & Liang refers to SF; Proposed refers to the current method. See Table 4.3 for more explanation of the legend. ............. The quantile plot of the empirical p—values for Model III. For the leg- end: Self & Liang refers to SF; Proposed refers to the current method. See Table 4.3 for more explanation of the legend. ........... xi 129 0.1 Introduction 0.1.1 Gene and quantitative trait loci (QTL) Gregor Mendel first studied certain genetic traits to discover the inheritance of bi- ological variations in peas. A gene is responsible for inheriting these special traits from parents. With the exploration of the DNA structure, a gene is normally defined as a stretch of DNA that acts on the protein or an RNA chain to issue instructions for a special function. For example, DMPK gene can produce a unique protein, my- otonic dystrophy protein kinase, to guarantee the normal function of muscle, heart, and brain cells. There are around 30,000 protein-coding genes in human that work together to control most functions in human body. An allele is a copy of a gene that measures the variation of the DNA sequence. Usually, a gene A is made up of two alleles A and a. Three genetic compositions (AA, Aa, and aa) made up of two alleles (A and a) are defined as genotypes. In fact, humans share mostly the same genes with distinct combinations of alleles that make him or her genetically unique. For instance, the hair color is controlled by same genes in human, but the specific hair color, such as: red, black, blonde, and so on, is determined by different alleles combined in the same genes. Besides, genes may affect many important quantitative traits, for instance, body weight, body height, blood pressure, and so on. Inheritance of characteristics of quantitative traits is attributed to single gene or multiple genes interacting with environmental factors. Thus, quantitative trait loci (QTL) is detectable regions of the genome that are closely linked to genes associated 1 Tn LC with variations of quantitative traits. The association between quantitative trait loci and closely linked genes in the same chromosome is termed the genetic linkage. The recombination fraction measures the degree of this association and is utilized to create a genetic linkage map. In brief, the recombination is a process through which a chromosomal crossover happens between two QTL or genes during the meiosis. The mean number of crossovers is called map distance, such as: one centimorgan (CM) is equivalent to a recombination fraction of 1%. Because of unobservability of the QTL genotype, the closely linked neutral molecular markers is used to predict the genotype of QTL. A genetic marker is a DNA sequence that is the unit component of one chromosome. Associated with a certain locus, genetic markers are easily identifiable and highly polymorphic. Their exact locations on a chromosome can be estimated. In fact, a statistical model is built to connect the QTL genotypes and marker genotypes through phenotypes to identify and sequence genes. So far, scientists have identified more than 10,000 mouse genes. Because mice and humans share around 95 percent identical sequence and possess same organs, more than 500 mouse models with respect to human diseases including cancer and diabetes have been developed. Many successfully developed gene techniques in mice have allowed scientists to investigate the human disease on animal models. More and more people recognize that the age of genetic medicine begins. 0.1. ltL~ of tl 0.1.2 Genomic imprinting It is well-known that two alleles of a gene inherited from both parents affect variations of the DNA sequence jointly. If only one parental derived allele is associated with the variation of phenotype, and the other allele is unexpressed, this special epigenetic phenomena (uniparental gene expression) is termed genomic imprinting (Wolf et al. 2008). Under genomic imprinting, the expression of the same allele A from different heterozygote genotypes Aa and (LA depends on the origin of inheritance of this allele. Then the maternally derived allele A (from Aa) functions differently from that of paternally derived allele A (inherited from aA). There are two types of imprinted genes, that is, one gene is maternally imprinted when the paternal copy is expressed with silent maternal copy, and a gene is paternally imprinted gene if the maternal copy is expressed with silent paternal copy. Genomic imprinting is first used to describe the elimination of paternal chromosome for spermatogenesis in sciarid flies. With the investigation of genomic imprinting, scientists have found that the DNA methylation and histone modifications are the vital mechanism to result in imprinting (Feil and Berger 2007). During this mechanism, imprinted genes are expressed differently in egg and sperm, and the different gene expression is caused by the inheritance of these epigenetic phenomena. In the healthy genome, even a mutation happens on one allele of a gene, the other allele can still be transcribed to pay off the loss from the mutation. But, if the epigenetic event takes place on the same gene, only the mutated allele is expressed, then people get a disease because of the imprinting effect. Thus, the epigenetic changes are serious to the disease without changing the genomic 3 sequences physically. In the past few years, scientists have made a lot of efforts in the understanding of genomic imprinting. Specifically, the significant phenotypic variations caused by imprinted genes have been confirmed in areas of the fetal growth and behavior. It has been increasingly recognized that imprinted genes may influence cancer, obesity, diabetes and many other disease in human and mammal, and many imprinted genes are identified to regulate embryonic development in plants. For example, Prader—Willi Syndrome, a genetic disease, makes patients to be extremely fat. It is caused by the deletion of 7 genes on the paternal chromosome 15 where the maternal copy is silent. Besides, other severe genetic diseases caused by the imprinting effect are Embryonal rhabdomyosarcoma for kidney cancer, Osteosarcoma for bone cancer, and Angelman syndrome for delayed development, and so on. In maize endosperm, imprinted genes are thought to control the endoreduplication (Dilkes et al. 2002) procedure through which larger fruits or seeds are obtained (Grime and Mowforth 1982). The disrupted gene (IGF2) encoding paternally transmitted insulin-like growth factor II results in growth deficiency in mice (DeChiara et al. 1991). Currently more than 600 imprinted genes have been predicted in mouse genome (Luedi et al. 2005). But the accurate locations and the genetic effect of most imprinted genes remain largely unknown. 0.1.3 Imprinting QTL method From a quantitative genetic theory point of View, imprinting results in genetic gain and evolutionarily favorable. Considering a gene A with two alleles A and a, the allele 4 frequency of A is p, and for a is q. Because of genomic imprinting, heterozygotes Aa and aA are expressed differently, then distinct genotypic values can be defined by the additional imprinting effect 2' When 2' = 0, the model is reduced to the traditional Genotype Frequency value AA p2 a Aa pq d + z' aA pq d — 2' aa q2 -a Mendelian model. Simple algebra shows that the genetic variance with and without imprinting is given as 031. = 2pqoz,i-2 + (2pqd)2 + 2pqi2, Imprinting 2 _ 2 2 . . . org — 2pqa + (2pqd) , No imprinting where a,- and a are the average effects with respect to imprinting and no—imprinting, respectively. The additional variance term 210qu due to imprinting is always non- negative. Thus, imprinting leads to increased genetic variance and is evolutionarily favorable. This explains why after so many years’ natural selection, genomic imprint- ing is still preserved. The imprinted inheritance violates the Mendelian theory and brings challenges in statistical modelling. The statistical framework in mapping imprinted genes or QTL was initiated with a fixed effect model in which the genetic effect is treated as a fixed term. Many studies under this framework were developed to test imprinted QTL 5 with controlled crosses of outbred parents (Knott et a1. 1998; de Koning et al. 2000 2002). But, the allelic heterozygosity of two outbred parents may induce confounding effects for genomic imprinting. The genetic difference based on these methods may not be explained by the real imprinting effect (Lin et al. 2003). When backcross and F2 populations with inbred lines were analyzed, the regression-based maximum likelihood approaches in mapping the imprinted QTL were proposed (Cui 2006; Cui et al. 2006, 2007). It has been shown that methods focusing on genetic variances are more powerful to infer QTL effects than the allele substitution method assuming a fixed effect (Xie et al. 1998). When an iQTL segregates in multiple line crosses, the detection of iQTL may be improved by combining different line crosses with similar genetic background. However, no studies based on the variance components method have been proposed to identify iQTLs with multiple line crosses. The variance components method is based on the identical-by-decent (IBD) prin- ciple in which sib pairs have more similar phenotypic trait values when they share more proportion of alleles IBD. Variance components model in mapping the parent- of-origin effect in human was first proposed by Hanson et a1. (2001). In this approach, the additive genetic variance is decomposed into two terms, a component due to the expression of the maternal allele and a component due to that of the paternal allele. However, the direct application of this variance components method to a fully or par- tially inbreeding population is infeasible. The structure of inbreeding populations is more complicated than that of non-inbreeding populations. Constructing a variance components method based on inbred populations is still a challenging problem. Endosperm in flowering plants is developed from the process of double fertilization, and ended up with a triploid tissue. A number of studies have shown that many endosperm traits are affected by genomic imprinting. Statistical methods based on the fixed effect model were proposed to map Mendelian QTL controlling endosperm traits (Wu et al. 2002; Xu et al. 2003; Cui et al. 2005, 2006). However, no studies are investigated for mapping imprinted QTL in endosperm inbreeding population due to the difliculty in modeling the inheritance patterns in a triploid organism with imprinting. In a collaboration with scientists, a data set has been generated for the purpose of identifying imprinted genes controlling for endosperm development. This example motivates us to develop eflicient methods while considering the unique genetic structure of a triploid tissue. 0.1.4 Objectives and organization of the dissertation In this dissertation, I will focus on developing efficient variance components models for the purpose of identifying imprinted genes in experimental crosses. Major goals of this dissertation are summarized as follows: 0 Propose a general statistical variance components framework by utilizing the natural information of sex-specific allelic sharing among sib pairs in line crosses, to map imprinted quantitative trait loci (iQTL) underlying traits in a diploid mapping population. a Extend the method to map iQTLs underlying endosperm traits. 7 o Extend the single trait model to multi—trait analysis for mapping iQTL underly- ing bivariate or highly correlated endosperm traits. New biologically interesting hypotheses, such as, testing the pleiotropic effect of (i)QTL or testing pleiotropic effect against close linkage will be designed. 0 Conduct a theoretical investigation of the likelihood ratio test (LRT) under the proposed mapping framework. The dissertation is organized as follows. Chapter 1 will illustrate the variance components based statistical mapping framework for diploid inbreeding populations. The variance components based iQTL mapping approach for the triploid endosperm will be discussed in Chapter 2. The predominance of the bivariate trait analysis will be studied in chapter 3. The asymptotic properties of the likelihood ratio test under the variance components model will be investigated in chapter 4, followed by the final concluding remarks in chapter 5. Chapter 1 A statistical variance components framework for mapping imprinted quantitative trait loci in experimental crosses 1.1 Introduction The genetic architecture of complex phenotypes in agriculture, evolution and biomedicine are generally complex involving a network of multiple genetic and environmental fac- tors that interact with one another in complicated ways (Lynch and Walsh 1998). The development of molecular markers makes it possible to identify genetic loci (i.e., quantitative trait loci or QTLs) underlie various traits of interest. Genetic designs 9 with controlled crosses are generally pursued to generate mapping populations aimed to identify QTLs underlying the variation of phenotypes. Statistical method for QTL mapping with experimental crosses dates back to the seminal work of Lander and Botstein (1989). Various extensions have been developed since then (e.g., Zeng 1994; Kao et al. 1999). For a diploid organism, the expression products of most functional regions from each one of a chromosome pair are equal. A broken of this equivalence, that is, nonequivalent genetic contribution of each parental genome to offspring phenotype, can result in genomic imprinting, a phenomenon also called parent-of-origin effect (Pfeifer 2000). Since its discovery, imprinting-like phenomena have been commonly observed in mammals and seed plants (reviewed by Burt and Trivers 2006). However, statistical methods for identifying imprinted genes have not been extensively studied and well developed. The imprinted inheritance violates the Mendelian theory and brings challenges in statistical modelling. Currently there are two frameworks in mapping imprinted genes. One is based on the random effect model with pedigree—based natural popu- lation such as humans. Hanson et al. (2001) first proposed a variance components framework by partitioning the additive variance component as two parts, a component due to maternal gene and a component due to paternal gene. The variance compo- nent method is developed based on the identical-by-decent (IBD) idea in which the expression of the gene for a pair of individuals is expected to be similar if they share alleles IBD. Liu et al. (2007) recently applied the model to map iQTL underlying ca- 10 nine hip dysplasia in a structured canine population. However, the current IBD—based variance components method for mapping imprinted genes assumes non-inbreeding population. Their applications are immediately limited with fully or partially in- breeding population such as the controlled inbreeding design in plants and animals. With inbred mapping population in humans, Abney et al. (2000) proposed a method to estimate variance components of quantitative traits. However, the extension of the method to map imprinted gene is not straightforward. No variance components method has been proposed to map imprinted genes with inbred population in the literature. Another general framework for mapping imprinted genes is based on the fixed- effect model in which the effects of genetic factors are considered as fixed. A number of studies were proposed under this framework for mapping imprinted QTL (iQTL) with controlled crosses of outbred parents (Knott et al. 1998; Koning et al. 2000; Koning et a1. 2002). One potential limitation of these methods is that allelic heterozygosity at a locus between two outbred parents could cause confounding effects for genomic imprinting. The genetic differences detected by such a fixed-effect model could be caused by allelic heterozygosity of the parents rather than the imprinted effect of iQTL (Lin et al. 2003). A natural alternative for the mapping population is the inbred lines. F ixed-effect models based on backcross and F2 population were recently proposed under the maximum likelihood framework (Cui 2007; Cui et al. 2006 2007; Li et al. 2008). When inbred lines are used, Xie et al. (1998) pointed out that it is more meaningful to inference QTL effect by its variance rather than by the 11 allele substitution effect. The QTL variance is generally calculated conditional on the cross, and it, as a variable, is different from one cross to another (Xie et al. 1998). In a single line cross the estimated QTL variance can not be simply extended to a statistical inference space beyond that (Xie et al. 1998). Multiple parental lines are needed for QTL variance inference. A solution to this is to combine data from multiple line crosses (Xie et al. 1998). An IBD-based variance component method was proposed by Xie et al. (1998) with multiple line crosses. Extension of the IBD- based variance component method with multiple line crosses to iQTL mapping has not been studied. Motivated by the limitations of current methods aforementioned and by the press- ing need for efficient iQTL mapping procedure, in this article, we propose a statistical variance components framework for iQTL mapping by combining data from multiple inbred line crosses. The proposed model is robust in iQTL variance inference by ex- tending the iQTL inference space from single line cross to multiple line crosses. A parent-specific IBD sharing partition method is proposed by considering the inbreed- ing structure in line crosses. As discussed in Cui (2007), the phenotype of an offspring is not only controlled by its own genetic profiles, but also by maternal genotype. The effect of maternal genotype on the phenotype of her offspring, termed maternal effect, is one potential source of confounding effect in the inference of genomic imprinting. The existence of such parental effect may lead to incorrect interpretations of imprint- ing when they are not properly accounted for in the analysis. Parameters that model the maternal effect are also included and adjusted when testing imprinting. 12 With the developed model, we propose an interval-based method for genomewide scan and testing of iQTL. Both maximum likelihood (ML) and restricted maximum likelihood (REML) methods are proposed and compared for parameter estimation and power analysis. An extension to multiple QTL is also proposed in which the multiple QTL model provides improved resolution for QTL inference. Extensive sim- ulations are conducted to compare the performance of the proposed model under different sampling designs with different combinations of family and offspring size. Comparisons of the ML and REML methods, single QTL and multiple QTL methods are discussed. The proposed method provides a general framework in iQTL mapping with multiple line crosses and has significant implications in real application. 1.2 Statistical Methods 1.2.1 Genetic Design The dissection of imprinting effects in line crosses depends on appropriate mating designs where the allele parental origin can be traced and distinguished. Most com- monly used inbred line crosses are the backcross, F2 and recombinant inbred line (RIL). Reciprocal backcross design has been proposed in iQTL mapping (Cui 2007; Cui et al. 2007). Considering parental origin of an allele, we use the subscripts m and f to refer an allele inherited from the maternal and paternal parents, respectively. The merit of a backcross design is that two reciprocal heterozygotes in offsprings, Ama f and amA f’ can be distinguished and their mean effects can be estimated and 13 tested to assess imprinting (Cui 2007; Cui et al. 2007). While all individuals in an F 2 segregation population share the same parental information, theoretically it is impos- sible to distinguish the phenotypic distribution of Ama f and amA f without extra information. Considering sex-specific recombination rates, Cui et al. (2006) recently developed an imprinting model by incorporating this information into an interval mapping framework. No study has been reported to use RILs for iQTL mapping. The methods proposed in Cui (2007) and Cui et al. (2007) are fixed-effects QTL models where the effects of an iQTL are considered as fixed. While only four backcross families are considered, when extending to multiple backcross families, the inference of iQTL variance calculation is less efficient. The variance components method, initially proposed in human linkage analysis (Amos 1994) offers a powerful alternative in assessing genomic imprinting (Hanson et al. 2001). In this paper, we will extend the variance components method to inbred line populations by combining different backcross lines to map iQTL. 14 m H H no we no as O has H H as o as so o as .980 2:16 $28 $89 bests \sEQ \sEs $.59 bests $80 a H H 3 so o as no boss H H so o o no so no goes siechis \sEs 39:5 \cEs \GHEC bits .4025 REES RQEG H H o no as so no o Secs H m we H so no o no $80 99x5 ROSS \OEG xOEv \QEO \GEG “H.959 xGHEe \QEO H H a so me o no so hose H a no H o no so no \QEQ sexed Hess Hess Hess Hess ass Hess Hess Hess k .QEk 55: SEE ogaoaow mmoHoxomm mm: 1308 $5me DmH oEooamIfiHmEm wEHamfiO Ewto H3553 ozoze wHHEoEmHHoo summon mmoHoxoaQ 180363 a 5 BBQ £m-=£ Hem mucomoEooHH wEHcfi DE 23. “HA oEcB 15 A typical backcross design often starts with the cross between one of the parental lines and their F1 progeny to create a segregation population. Then large number of offsprings are collected for QTL mapping. When imprinting effect is considered, reciprocal backcrosses are needed. A basic design framework is illustrated in Table 2.1 in Cui (2007). The two reciprocal backcrosses are treated as the base mapping units. Multiple backcross families are sampled based on these crosses. For simplicity, we sample equal number of families for each backcross category. For example, a sample of 8 families would require two of each of the four backcrosses. Noted that the variance components method assesses the degree of allele sharing among siblings. When it is applied to inbred line crosses, each backcross population is considered as one family and different families are considered as independent. For fixed total sample size, one issue is to assess whether we should sample large number of families each with small offspring size or small number of families each with large offspring size. For example, to sample 400 individuals, shall we sample 4 backcross families each with 100 offsprings or 100 families each with 4 progenies or other sampling strategies? The choice of optimal designs is intensively evaluated through simulations. 1.2.2 The mixed—effect variance components model Suppose there is a putative QTL with two segregating alleles Q and q, located in an interval responsible for the variation of a quantitative trait. The phenotype, yikv for individual i measured in backcross family k(= 1, - -- , K) can be written as a linear 16 function of QTL, polygene and environmental effects, yik =“+aik+Gik+eik’ k=1,... ,K; i=1,... ’nk (1.21) where n k is the number of offspring in the kth backcross family; p denotes the overall mean; aik is the random additive effect of the major monogenic QTL assuming nor- mal distribution with mean zero; Gik is the polygenic effect that reflects the effects of unlinked genes and is assumed to be normally distributed with mean zero; and eik ~ N (0, 0,2,) is the random environmental error uncorrelated to other effects. The phenotypic variance-covariance for the kth family can be expressed as, 2k = nkag + @903 + 103 (1.2.2) where 0?; and 03 are the additive and polygene variances; H k is a matrix containing the proportion of marker alleles shared IBD for individuals in the kth backcross family; ¢I>g is a matrix of the expected proportion of alleles shared IBD, and I is the identity matrix. The calculation of the IBD sharing matrix with inbred lines can be found in Xie et al. (1998) which is based on the Malécot’s coefficient of coancestry (Malécot 1948). Noted that a backcross offspring with genotype qu f may be obtained by the QQ x Qq or the Qq x qq cross. When there is a significant maternal effect, the mean expression for genotype qu f may be different depending on whether its maternal parents carrying QQ or Qq genotype. As described in Cui (2007), maternal effect 17 is one source of potential confounding factor for genomic imprinting. It should be appropriately modeled and adjusted when testing imprinting. Here, we model the cytoplasmic maternal effects as fixed effects, and the overall mean [i is replaced by “k which models the maternal effect of the kth distinct backcross family. To accommodate parent-of—origin effects, the QTL additive effect (a) can be par- titioned as two terms: (1) a component that reflects the influence of the QTL carried on the maternally derived chromosome (am); and (2) a component that reflects the influence of the QTL carried on the paternally derived chromosome (of). The model that accommodates the parent-specific effects can be expressed as, yik =Hk+aikm+aikf+Gik+eiki k=1,--- ,K; i=1,--- ,nk For data vector y in family It, the above model can be re—expressed as, yk=XkB+akm+akf+Gk+ek, k=1,---,K (1.2.3) where X k is an indicator matrix corresponding to the kth backcross family and [3 con- tains parameters associated with the three maternal effects; akm ~ N (0, Hml k072n)’ akf ~ N(o,rrf'ka}), Gk ~ N(0,gag), ek ~ N(0,Ia§), where nmlk and Hflk are matrices containing the proportion of marker alleles shared IBD that are derived from the mother and father, respectively; (P9 is a matrix of the expected proportion of alleles shared IBD, and I is the identity matrix; 0,271 and a; are the variance of alleles inherited from the maternal and paternal parents, respectively. 18 With non-inbreeding mapping population, Hanson et al. (2001) expressed the phenotypic variance-covariance for the kth family as, 2 2 2 2 2k = nmlkam + Hflkaf + {>909 + 106 (1.2.4) However, for an inbred mapping population, this IBD-based variance partition method can not be directly applied. New method considering the inbreeding structure is needed. 1.2.3 Parent-specific allele sharing and covariances between two inbreeding full-sibs Before we get the phenotypic variance—covariance of a pair of individuals i and j , let us first consider the parent-specific allele sharing status. Within each BC family, there are two alleles segregating at each locus. Because of inbreeding, the IBD values between two backcross individuals are different from those calculated from outbred full-sibs. Consider two sibs i and j in the kth backcross family. Without considering allelic parental origin, Xie et al. (1998) proposed to calculate the IBD value at a QTL as, 2ft)? QQ—QQ 7f” :29” = (1.2.5) 2] 2] lforQQ-qurQq—Qq with 0733- being the Malécot’s coefficient of coancestry (Malécot 1948). Thus, for an inbred population, 7r,- j is not the actual IBD value between individuals i and j, rather 19 interpreted as twice the coefficient of coancestry (Xie et al. 1998; Harris 1964). For individuals with itself, 2 for QQ-QQ 7T"=I+Fz'= (1.2.6) 1 for Qq - Qq where F;- is the inbreeding coefficient for individual i at the QTL. The elements in (Pg matrix are just the expected values of n,- j and ”ii which are (15,; j=5/ 4 and ¢,,- = 3/ 2 (Xie et al. 1998). 20 .Qm: USER mofloza 35:2: moss .8583 mmoHoxoen wcmpoofifi E .m was H. mEHHEfipH: Ho“ Qmm madam mofloza oEHmmom ”HA 8:th .N 21 When allelic parental origin is considered, the IBD sharing matrix can also be calculated based on the coefficient of coancestry. By definition, the coefficient of coancestry is defined as the probability that two randomly drawn alleles from indi- viduals i and j are identical by descent. Fig. 1.1 displays possible alleles shared IBD for sibs drawn in backcross families. Consider two backcross individuals 1' (with two alleles A: and A- ) and j (with two alleles A - and A - ). Define 6' - as the coef- 2m 2 f Jm J 13 f ficient of coancestry between individuals 2' and j. By definition, 0U can be calculated as) 1 923' = Z{Pr(Aim = Ajm) + PrIA’im = Ajf) + PI'(Az'f = Ajm) + PI‘(Aif = Ajf)} I : ZIgime + Bimjf + gifjm + gifjf) where 621]; can be interpreted as the allelic kinship coefficient, i.e., the probability that a randomly chosen allele from individual i is IBD to a randomly chosen allele are not distinguishable. from individual j. Note that the two terms 0,; and 6.;- mj f fjm However, their sum is unique and therefore the two terms can be combined as one single term, denoted as aim/jf(= gimj f + 92- f jm)‘ After the manipulation, the coefficient of coancestry for individuals 2' and j can be expressed as 6;- j = $0 + 7:mjm Him / j f + 6.,- f j f) whlch IS composed of three components. Following Xie et al. (1998), the alleles shared IBD between individuals i and j can be expressed as, 22 ....Hi 21,. I] u 2 mm + Qim/jf + aifjf) Wimjm + Wim/jf + Trifjf (1.2.7) where 7r- are the alleles shared IBD derived . _ 1 . . . zme _ 2‘92me and W2 . =19. . m Qifif from the mother and father, respectively; 7r = égim/jf is the alleles shared im/J'f IBD due to alleles cross sharing, a special case for inbreeding sibs. Without inbreeding, 71'z-m / jf takes value of zero. For completely inbreeding population, the inbreeding coefficient F,- is 1 if alleles inherited from both parents are the same since these alleles can be traced back to the same grandparent. For example, for an individual with genotype QmQ f’ Pr(Qm = Q f) = 1 since both alleles Qm and Q f are inherited from the same grandparent. Therefore, for individuals with itself, 7r“: = 1 + F,- would be the same as 7r2-j(i # j) when i and j carry the same genotypes. The expected proportion of alleles shared IBD cpl-j can also be calculated. Thus, the proportion of alleles shared IBD can be partitioned as three components for inbreeding sibs, rather than two components considering parent-of-origin effects proposed by Hanson et al. (2001). To further illustrate the idea, we use one backcross family to demonstrate the derivation. A full list of possible IBD sharing values for the two reciprocal backcrosses are given in Table 2.1. Considering a backcross family initiated with the Qq x QQ cross. Randomly selecting two individuals i and j with 23 genotype QmQ f and Qm Q f’ the Malécot’s coefficient of coancestry can be calculated as» 1 7Tij = 26ij 2 éiprIQim = Qjm) + PIIQim = ij) 'I' PIIQif : Qjm) +Prgag +103, the same as the variance components partition model considering parent-of-origin effects given in Hanson et al. (2001). 1.2.4 Likelihood function and parameter estimation Assuming multivariate normality, the density function of observing a particular vector of data y for family It is given by, I 1 _ f(yk;uk, 2k) = xp -§(yk - uk)TZk 1(yk - #1,) where yk = (y1 k, ...yn k k)T is a "k x 1 vector of phenotypes for the kth backcross family and nk is the kth backcross family size. The overall log likelihood function for K independent backcross families is give by, K 3 = E: 10glf(yk;rtk,2k)l (1-2-9) k=1 25 Note that the maternal effect “It is the same for families with the same maternal genotype. Thus, only three maternal effects need to be estimated. Two commonly used methods can be applied to estimate parameters in a mixed effects model, the ML method and the REML method. Both methods have been applied in genetic linkage analysis in a variance components model framework (Amos 1994; Almasy and Blangero 1998). In general, ML estimators tend to be downwardly biased given that it does not account for the loss in degrees of freedom resulted from estimation of the fixed effects (Corbeil and Searle 1976). The REML is based on a linear transformation of the data such that the fixed effects are eliminated from the model, hence it provides less biased estimators. Even though standard softwares such as SAS have standard procedures to estimate parameters for a mixed effects model, the estimation for the proposed model can not be directly fitted into a standard software. The estimation procedures for the two methods are detailed here. 1.2.4.1 The ML estimation The phenotype vector in the kth backcross family follows a multivariate normal dis- tribution, i.e., yk ~ M VN (X k5, 2k). Parameters that need to be estimated are 9 = (5.01271, 0%, 0,2,,f, 0%, 03) with B = (up/12443)- 2 2 2 am a o 2_ 2 2 2 2 2 2_ 2_;£2 _ mf Define a — om+of+0mf+og +06, hm — 02, hf — 02, hmf — 02 , fl 2__£ 2__2_2_2_22- . --. hg — 02, and he — 1 hm h f hm fig. 0 IS the total phenotyp1c variance and hence hgn and It? can be considered as the heritability of maternal and paternal 26 alleles, hgn + h;- + h?" f is the total genetic heritability due to the major QTL, ’13 is the polygene heritability and h2 = hgn + h? + h?” f + h; is the overall heritability. The phenotypic variance-covariance between any two individuals i and j in the kth backcross family can then be re-expressed as: yik Var = U2Hij|k yjk where 62- 6,-- _ .7 Hall: — with 62- : nimimhgn + Wififhf + ”im/ifhfnf + ¢iihg21 + hg; dj is defined similarly; and 5ij = nimjmhgn + Wifjfhf + Wim/jfhfnf + (bl-jhg If there are "k sibs in each backcross family, H k = {Hijlk}nkxnk is simply a "k x n k matrix. Instead of estimating Q = (13, 0,27,, 0%, a?” f’ 03, 03), we can estimate Q=(B, 02, hgn, h?“ hfn , hg) and solve above equations to get the original variance estimates. Now the log-likelihood can be expressed as, K 6(9) = Z loglf(ykl9)l k=l (1.2.10) K oc-Z{n—klo 02——1—lo|H |——1—( —X B)’H—1( ‘X (3)} k_1 2 8 2 g k 202yk k k We k Maximizing likelihood (3.2.3) is equivalent to maximize (1.2.10). Here, we take an iterated estimation procedure to estimate the parameters contained in Q. For given 27 values of hgn,h§¢,h3nf,hg, we can get the maximum likelihood estimates (MLE) of parameters (B, 02) by setting the partial derivative of the log-likelihood function (1.2.10) to zero, i.e., K [3 H Z(X{H;1Xk)—1(X{Hk‘1yk) k=1 1 K 5.2 — —— Z (yk -- XkBITHk—IIYk — Xkfi) _ K Zk=1 "k k=1 It can be seen that 6 and 62 are functions of 127271, 12%, hgn f and fig. Plug the updated parameter values for B and 02 into likelihood equation (1.2.10), the log- likelihood function can be simplified as, K K ”k 2 1 K as) = }: log[f(yk|Q)] oc — 2: 710g} — 5 Z longkl (1.2.11) The simplex algorithm can be applied to maximize the function (1.2.11) with respect to parameters ’17an ’12, hfn and ’13 subject to the the constraints that oghgn,h},hfnf,hgg1andogh2 g 1. To guarantee a positive definite covariance matrix when searching for these her- itability values over the constraint parameter space, a reparameterization technique is adopted (Xu and Atchley 1995). Taking 62-]- : 112(7r;m jm772n+7ri f j f712c+7rim / jf772n f 2 2 2 2 hm h f '2 hm f 2 [lg .. 2 2 __ __ _ . __‘_, 2_ 2 2 $237g) where 7m — [1217f - h2’7mf — —h2 ’79 — ’12,» and h — hn1+hf+ h?” f + hg. We now have four new unknowns with the constraints: 0 S h2 S 1, 28 + The new constraints can be easily satisfied by a reparameterization technique. Let . - . - 2 2 2 2 2 u, ’Um, 11f, ”m f and 129 be any real numbers. Estlmating h , 7m, 7 f’ 7m f and 79 can be done by maximizing the likelihood function ( 1.2.11) via searching through the real domain space with respect to u, Um, of, ’Um f and 129 with the reparameterization 2 en h = —, 1 + c“ 2 8m 7 = , m evm + evf + evmf + avg 2 evf 7 = , f e’Um + evf + evmf + e1’9 2 evmf 7 = , , mf evm + elf + evmf + evg and evg 2 _ ’79 — evm + evf + evmf + evg MLEs of [22, 772,1, 7%, 7371 f and 73 can be obtained through the estimated values for 11, um, of, ”m f and 119 according to the invariance property of MLEs. These estimated MLES are used to update h2, h?,,, 11;, h?” f and fig, and hence o2 and B. The iteration steps continue until convergence. 29 1.2.4.2 The REML Estimation The REML method was first proposed by Patterson and Thompson (1971). This method has been broadly applied to estimate variance components in a mixed-effect model framework. Taking f2: (6, 9) where 9: (0m,0 gm?” f ,0g,0e). The REML method starts with maximizing the following likelihood function, K K 1 _ = Z lOslfo'klell = “2' Z {loglzkl +108(IXIcEk1XkI) +yIcPkyk} k=1 k=1 (1.2.12) wherePk=E_1—2_1Xk(Xk2—k1Xk)_1Xk2k1.Wecancombineallfamilydata k k together as one N x 1 vector denoted as y where N = Zk=1 nk. All the X k and the variance-covariance matrix 2k corresponding to each family can be combined. The log-likelihood function for the combined data is expressed as, :1: _ _ 1 I —1 I 3 (GI—1020649)] — -5 {10s IEI +10s(|X 23 XI) +y Py} (1-2-13) where 2 is a block diagonal matrix with the kth diagonal block 2 k corresponding to the kth family and off-diagonal blocks being zeros; P is also a block diagonal matrix with block elements given by Pk. The dimension of E is N x N. With this combination, we develop the following REML estimation procedure. We apply the Fisher scoring algorithm to estimate the unknowns, which has the form, 03*(9) t We” 9(t+1) ___ 9n) ”(em—1 30 where I(e(t)) is the Fisher information matrix evaluated at Gm which can be ex— pressed as, tr(PIIfPIIm) tr(PHfPHf) tr(PHfPIIm/f), tr(PHm/fPHm) tr(PHm/fPHf) tr(PHm/fPHm/f), ( ”(1011mm tr(PHfP) tr(PHm/fP), tr(PHfPg) tr(PHfP) tr(PHm H1909) tr(PHm/fP) t7'(PgP) tr(PP) ) The first-derivative of the log-likelihood function 8* with respective to each vari- 31 ance components is given by, (95* 1 T 06* 1 T 05* 1 T 502 = ‘§(“‘(P”m/f) ‘ Y PHm/fPY)’ m, (95* 1 T 509 2 ii- —l(t7‘(PI )— TPP ) 00g — 2 N y y The REML estimator of B is the generalized least squares estimator, i.e., B=(XTS’1X)‘1XT2‘:‘1Y 1.2.5 QTL IBD sharing and genomewide linkage scan The above IBD computation procedure assumes that a putative QTL is located right on a marker. When a QTL is located within an interval, a more efficient approach would be to do an interval scan and to test the imprinting property of QTLs at posi- tions across the entire linkage group. Under the proposed framework, essentially we need to estimate the proportion of putative QTL alleles shared IBD at every genome position. Here we propose a method to calculate QTL alleles shared IBD inside an 32 interval conditional on the flanking markers. The so called expected conditional IBD values can be derived at each test position as a function of recombination fraction between the two flanking markers, and the one between one flanking marker and the QTL. We use one backcross initiated with the cross QQ x Qq as an example to il- lustrate the idea. For a putative QTL with two alleles Q and q, four QTL genotype pairs QQ — QQ, QQ — Qq, Qq - QQ and Qq — Qq can be formed. If the QTL genotype is observed, the corresponding QTL alleles shared IBD can be calculated (see Table 2.1). In general, the QTL genotype is unobservable, but its conditional distribution can be calculated from the two flanking markers. For individuals 2' and j with flanking marker genotypes g,- and gj, let WUIGiGj be the IBD values calcu- lated at the QTL position between individual i carrying QTL genotype G,- (=1 or 2 corresponding to QQ or Qq, respectively) and individual j carrying genotype G j (similarly 1 or 2), where v = im jm,i f j f or im / j f' For example, WimjmlGiGj is the proportion of IBD sharing between individual i carrying QTL genotype G;- and individual j carrying genotype G j for alleles derived from the mother. Let ch,l , and 90G I , be the conditional distribution of QTL genotype Gi and i 92 j 93 G j for individuals 2' and 3' given on the flanking markers 92' and gj, respectively. This conditional probabilities can be easily calculated and can be found at standard QTL mapping literature (see Wu et al. 2007). The probability to observe no] G- G , is just 2 J 1,90, I ,goG I ,. Thus, the expected IBD values between individual i and j at the 2 92 J 93 tested QTL position conditioning on the flanking markers 9,- and gj can be calcu- . _ _ 2 2 . . . lated as, 7rv—E(7rv|GiGj)—ZGZ_:1203:1nleingocilgiipajIgj. For the above 33 example, the IBD values derived from the maternal and paternal parents can be calcu- lated as fiimjm=E(7rimijGz'Gj)zo’Swllgiipllgj+ 0.5;01lgicp2lgj + 0.5992lgicpllgj + 0.51p2lgi902'gj and irifjszQrz-fjfIGiGj)=O.5<4 design gives the worst QTL position estimation with the largest RMSES for both estimation methods. Therefore, a balance of family and offspring size is needed. A moderate family size with moderate offspring size would be necessary in order to achieve reasonable parameter estimation for both QTL effects and position. 44 Table 2.2 also lists the results of power analysis under different scenarios with two different estimation methods. Power1 denotes the empirical power calculated from the simulated null distribution corresponding to hypothesis test (2.2.6). We simulate the null distribution by simulating data assuming no QTL effect (i.e., 072’1=0%:07271f:0)' The LR test statistics is calculated for each simulation run and the 95% cutoff is 2 reported as the threshold value. Power refers to the theoretical power which is cal- culated assuming the mixture chi-square distribution. Results show that the threshold calculated from the theoretical distribution is smaller than the one calculated from the simulation. Thus the testing power based on the theoretical cutoff is greater than the empirical power. The testing powers under different sampling designs are very comparable except for the 100X4 design in which the power is dramatically reduced compared to other designs. No remarkable difference in power for both estimation methods is observed. Fig. 1.2 shows the log—likelihood ratio test statistic calculated under the four sampling designs across the simulated linkage group by using both ML and REML estimation methods. The plotted LR curve is from averaged LR values out of 100 replications. It is clear that large offspring size always gives large test statistics. As the family size increases from 4 to 100 and so decreased offspring size, we observe a huge LR value decrease. Clearly, the 100x4 design is less powerful than the others. The last column listed in Table 2.2 shows the type I error for testing genomic imprinting, i.e., H0: 0,271 = 0.21:. The simulated data assume no imprinting (0,2,, = 0%=1.5). The imprinting test is only conducted at the position where the overall QTL test 45 o 20 410 4 6'0 80 100 Test position (cM) Figure 1.2: The LR profile plot. The left and right figures correspond to the LR profiles generated using the ML and REML method, respectively. The arrow indicates the true QTL position. shows significance. The imprinting test statistic LR,-mp is compared with a chi-square distribution with 1 df. Overall, the REML estimation method results in smaller type I error rate than the ML method does. As the number of families increase, the type I error decreases. The 4x100 design yields the largest type I error. In comparison of the ML and REML methods, the REML method gives smaller estimation biases but larger RMSEs than the ML method does. This reflects the large variability of the REML estimation. In terms of computation speed, the ML method is faster than the REML method. Even though the QTL position estimation 46 is better estimated by using the REML method when family size is small, as family size increases, the REML method performs worse than the ML method (Table 2.2). In checking the LR profile plot in Fig. 1.2 and the power analysis in Table 2.2, we do not observe significant gain in power by using the REML method. The two methods do no dominate each other and are very comparable in power analysis. With large sample size and limited computing resources, one might want to try the ML method first. However, the REML method is suggested when testing imprinting since it has small type I error. In a short summary of the results listed in Table 2.2, the 8x50 and 20x20 designs give better QTL position estimation and testing power. In terms of the type I error for imprinting test, the 20x20 and 100x4 designs provide reasonable type I error. Thus, a practical guidance is to choose the 20x20 design, and one should always avoid designs with extremely large or extremely small family size. 47 3308358 550 Ho meoSmcEono 8H ”a meH. mum .Aw.m.mv $3 3 wamvcoamvtoo $308 awe... wcSEEEH 05 0... 332 5.509 Ems/Sommmmu Eon—=0 30328:“ was 305860 23 war”: $33528 8.58 umop “coho HBO 2.296 on: 8. «2888.28 NSBOQ Home H8308 63.8 65.8 :28 33.8 338 $88 $38 62.8 63: 23 as 3s 2d as was 22 2: 2.... 2: 2.2 8.8. 24mm $2.8 33.8 :58 $2.8 35.8 62.8 $8.8 82.8 22.8 as 2.2 as SN 25 2.0 as :3 2s 2w 2.2 at. .22 m 2 3.2.8 $3.8 38.8 838 32.8 $8.8 68.8 8.88 A28 2s 23 .2; was as as was 23 as was 3.2 52. 22mm 328 32.8 32.8 33.8 22.8 82.8 83.8 82.8 3.8 2.0 2: as 8a 2.0 ass 2a 2a was new 2.2 $2 .22 .0. o 82.8 82.8 32.8 82.8 22.: $3.8 $2.8 32.8 A28 83 23 as as as Ed 23 m3 2.... was was $2. 222 92.8 82.8 9.28 $2.8 $2.8 $2.8 $2.8 $2.8 8&8 5s 2a was as was was 2.o «as 3m 2s 22 2.2. .22 o m 8309 mHoBonH Haoaom m md md o m 0H Haowv H552: Mb Mb \zmb Mb :mb mi N: H: qoSHmOnH 283225qu .8855an E 53m mam mamumfieaa 93 Ho macho Hogs—pg :88 23 Ho $08 Bang 23. .awmmov wEEEem omxom 23 89.5 muommo $55295 pcohmmzo wEBQHm HBO a 8... 83233 832555 2: :0 comma 883838 230:8an 39% Home noEmoa .HEG 23 Ho 32mm H23 $32 .8308 27H. ”m; 28¢ 48 To evaluate the proposed model under different imprinting mechanisms, we sim- ulated data assuming different degree of imprinting. Since the results in Table 2.2 indicate that a 2OX20 design provides relatively reasonable parameter estimation, good power and small type I error rate for imprinting test, the evaluation of imprint- ing analysis is thus focused on this design. The results for 100 simulation replication are summarized in Table 2.3. Three imprinting models are assumed: complete ma- ternal imprinting (07271 = 0 and 0%=3), complete paternal imprinting (0,271 = 3 and 0%=0), and partial maternal imprinting (0,2,, = 1 and 0.3:”. Both ML and REML estimators are reported. Overall, the two estimation methods produce very compa- rable results with less biased estimations by the REML method as we expected. All the parameters can be properly estimated with reasonable precision. Large imprinting power is observed when the variance difference between the two parent-specific variance components is large. When the difference between the two parent-specific variance components is reduced, the power to detect imprinting is largely reduced. For example, when data are simulated assuming complete paternal imprinting, the power is O.91(0.86) by using the ML(REML) estimation method. With partially imprinted data, the imprinting power reduces to 0.24(0.09) by using the ML(REML) method, even though it can be increased by increasing the offspring sample size (data not shown). 49 8805580320 20500 .58 m.m 03.08 00m 503000305 £80838 2308800 05 mo 805.8 08888 050. wmmEm 02$ 00 050028.500 3:8 0:0,: 080 _0820: :2? 28:80.85 0.: 8 53m 80385: 09H 303000me 63:38: 30302005 0:» 5:0 @350 302880 on» 9865 30350—00 Mesa 2: 8 238:8 N 538 as 1280 .3de .230 583i 2% .ms $3 ”9 25 Names .3 .ovuisms . as ems H: ”Amwmd H m5 ”2 _0008 .83 83080.85 vvuw_58_m 503000582 .6008 massing: 050 80:20:02 00 832 .H 08.0 2 $23 883 :33 $2.3 38.3 503 $33 :83 $23 was as 23 23.0 $8. 82 Rod woos was $92 22.3 5 323823 2&3 823533853 823 was as 29m 83 and was :3 292 89% 2 m2 823 $83 $83 A 53 $8.3 383 SS3 833 $23 was 3.0 2o.“ 33. $30 22 wood 23 83. 5.2 3.2 2 $83 $23 :83 3:3 :53 $33 88.3 8.0 23 23 :3 was ”so 23 232 2a.? 2 : SE3 883 $2 .3 8:3 A833 SE3 823 SS3 383 2: 8.2 $3 3.80 22 Ed 820 £3 $3 «as 80% H 823 C23 :23 $3.: 823 0,23 :23 8.2 8.2 23 80.0 32 23 83 M22 3% 2 2 . \E x. E 0 0 $2535.58 m 23 me we as o m 2 2 we 30 s 358 m0 m: MQ m1 m1 H1 5058& 292.843 5033528m .3600 92:08am omxom 05 80 0003 £0008 $58.58 080 50:00:02 4:3 0023586 330 no“ 85.002005 5033588 o3 50 0023 03.0858 83088.3 000mm 080 503605 490 9: m0 32mm «:8 mafia £0303 05H. 5; 0308 50 In reality, whether a QTL is imprinted or not is an unknown prior. When a QTL has Mendelian effect and is not imprinted, is there any power loss by analyzing with the proposed imprinting model? Or when a QT L is actually imprinted, is there any power loss by analyzing with regular variance components approach? To answer these two questions, we simulated data under different scenarios and analyzed with both Mendelian and imprinting models. The first and second column in Table 1.4 refer to the simulation and analysis models, respectively. M refers to the Mendelian model without variance components partition and I refers to the imprinting model with allelic-specific partition of the variance components. For comparison purpose, heritabilities are recorded instead of original variance components estimates. The polygene and residual variances are fixed as 0.5 (h2 = 0.083) and 2, respectively for all the simulation scenarios. We first simulated data with one additive genetic effect without partitioning variance into allelic specific components. This is equivalent to simulate data assuming the Mendelian model. A single additive variance component of 3.5 is assumed which corresponds to a heritability of hg = 0.583. The second scenario is to simulate data with three allelic-specific variance components. Simu- lation models 11 and 12 correspond to a complete maternal imprinting model (i.e., h2 = 0 and h; = 0.5) and a partial maternal imprinting model (i.e., hgn = 0.083 and h2 = 0.417), respectively. The variance component 072” f is assumed to be 0.5 (hgn f = 0.083) for 11 and IQ. In all the simulations, we use the 20x20 design to make the comparison. Similar results are expected under the other sampling designs. Since the true variance components values for the imprinting model is unknown when data 51 are simulated assuming Mendelian effect and vice versa, only standard deviations for these parameter estimates are recorded (listed as italic font in the parentheses). The simulation results are summarized in Table 1.4. When the simulated model is Mendelian, QTL position is better estimated with the Mendelian model than with the imprinting model. No remarkable difference in power is observed for both models. The estimated parent-specific variances due to maternal and paternal alleles are almost identical and no imprinting is detected. When data are simulated assuming imprinting (model 11 and 12), large power is observed when analyzed with the imprinting model. For example, the power is 86% when analyze the 11 imprinting data by the Mendelian model. The power is increased to 95% when data are analyzed by the imprinting model. When imprinting data are analyzed with the Mendelian model, the major QTL variance is under-estimated and the polygene variance is slightly over-estimated. No remarkable differences are observed for the estimation of the three fixed mean effects and the residual variance under all simulation cases. In any case, the imprinting model performs better or no worse than the Mendelian model. Thus, it is generally safe to apply the imprinting analysis for data shown any inheritance pattern. 1.3.2.2 Multiple QTL analysis 52 .Eoww fifid 20mm ad fivu‘mon: m‘Hn—LG 025 Cu .flmeu N9 “:50. HQ 633 $83 32.: 88.3 $6.3 32.3 303 :33 and 80.0 $2 3.3 ”was 62 $3 a: .0200 £63 333 $83 38.3 663 @603 $63 92.3 23 S3 22 68 Sad 22. $2 3% a: a; who 0 2 :83 2:3 3:3 32a: $53 633 333 :33 03; 83 was 8.3 3.30 was 3.3 S6 .0200 $0.03 363 333 82.3 333 $33 833 $6.23 32 50¢ 80¢ 80% 62 20c 83 8.6 02 2.0 m2 ...; Ed was 20% was 20% Bass xzmhc M0 :m0 :0660& “Saab M0 :m0 a0660m cosmasmm «0 H0 0060500509 E :0Zw 0.5 000:0 0060006 0005 05 30 300“ 000200 0:8 .0w600 omxom 05 0000: 0450 03... fits 0000386 $00 08 6000230“ 0050386 2: :0 00002 00008600 8000:800Q “8&0 000 005603 A90 0% m0 32mm 0:0 «.qu 035 ”mg 0309 53 To see the relative merit of multiple QTL analysis against single QTL analysis when multiple QTLs are located on the same linkage group, two QTLS are simulated with QTL 1 (denoted as Q1) located at the second interval, 280M away from the first marker (M1) and QTL 2 (denoted as Q2) located at the fourth interval, 68cM away from the first marker. Two simulation scenarios are considered. The first scenario considers two non-imprinted QTLs with equal genetic effects. The second scenario assume Q1 is imprinted and Q2 is not imprinted. Simulated parameters for the two QTLs are listed in Table 1.5. Data are simulated assuming the 20x20 design. Parameters are estimated by the ML and REML approaches with 100 replicates. Fig. 1.3 shows the LR profile plots for the single and multiple QTL analysis. The single QTL model indicates three major peaks. The highest peak for the single QTL analysis is located at the wrong QTL interval where no QTL is assumed. The so called “ghost image” of QTL can be removed and the positions of the two QTLs can be precisely mapped on the chromosome by the multiple QTL model. Two clear peaks indicating the correct QTL positions (arrow signs) are observed by the multiple QTL analysis. However, we observe a remarkable reduction in LR values by multiple QTL analysis compared to those by the single QTL analysis. Since the threshold for multiple QTL analysis is unknown, we can not make the conclusion that multiple QTL analysis is less powerful than the single QTL analysis. It is possible that we may gain accuracy in QTL position estimation at the cost of power loss. Similar phenomenon and issues were also observed and discussed in the literatures (Zeng 1994; Xu and Alchley 1995). 54 20 . . . . ML ------- Single QTL model multiple QTL model I d 18 16 i l 12 I l I l 10 LR omnmm l 1 o 20 1 4o 60 t 80 100 Test position (cM) Figure 1.3: The LR profile plot for singe QTL and multiple QTL analysis. The true QTL positions are simulated at 280M and 68cM (see the arrow sign). The dotted curve and the solid curve represent the LR profiles by single QTL and multiple QTL analysis, respectively. The left and right figures correspond to the LR profilefi generated using the ML and REML method, respectively. The results of the multiple QTL analysis are summarized in Table 1.5. The fixed mean effects, the polygene and residual variance components can be reasonably estimated with small RMSES, similar results shown in Table 2.2 for the 20x20 design and hence are not reported here. Only the genetic factors for the two simulated QT Ls are reported. It can be seen that both ML and REML methods provide reasonable parameter estimates and are very comparable. Under the first simulation scenario in which both QTLS are not imprinted, the genetic effects are all slightly over-estimated 55 by both methods. This might be due to the interference of the two QTLS in the same linkage group. The multiple QTL model may not completely block the effects of QTLs outside of the tested interval. For the second simulation scenario, an interesting pattern is observed. When one QTL is imprinted (Q1), the maternal and paternal variance components for the second one (Q2) tend to be estimated with bias in the direction as the first imprinted QTL, i.e., 0%, tends to be over-estimated and 0% tends to be under—estimated. As we gain accuracy in QTL position estimation, we lose precision for the parameter estimation. These effects are expected as described in Zeng (1994) and Xu and Atchley (1995). More investigations are needed in multiple QTL analysis in order to maintain a good balance of QTL position and parameter inference. 1.4 Discussion Statistical methods assuming fixed effect models for iQTL mapping in controlled outbred and inbred lines have been proposed (e.g., Koning et al. 2000; Cui 2007; Cui et al. 2006 2007). Considering the limitation of fixed-effect models, a random model that estimates the QTL variance by extending single line cross to multiple line crosses should be more powerful in QTL variance inference (Xie et al. 1998). The IBD-based variance components method assuming random genetic effect for iQTL mapping has been developed in human linkage analysis (Hanson et al. 2001). However, no study has been proposed to map iQTL using variance components method with inbred or partially inbred line cross. In this article, we have first time presented an IBD-based 56 variance components framework to search for the existence and distribution of iQTL throughout the entire genome in multiple experimental line crosses. The idea of the method is demonstrated through a backcross design. It can also be extended to multiple F2 line crosses using the sex-specific recombination information as proposed by Cui et al. (2006). The key point of the proposed iQTL variance components analysis is to parti- tion the additive genetic variance into parent-specific components. We have prOposed a new parent-specific allelic sharing method which characterizes the relatedness of parent-specific alleles between pairs of individuals in a backcross pedigree. The calcu- lation of parent-specific allelic sharing is based on the information of the coefficient of coancestry. More complicated calculation of the coefficient of coancestry can be found at Harris (1964). The quantification of the coefficient of the coancestry proposed by Harris (1964) can also be utilized to calculate the parent—specific IBD sharing in an inbred human population, and thus for iQTL mapping in inbred human populations. There have been extensive studies in literature about various methods in the estimation of variance components in a mixed-effect model framework. The ML and REML are two commonly applied methods in variance components estimation with less biased estimation by the REML method. Simulations show that the ML method yields high precision in parameter estimation but with relatively large bias than the REML method. Power analysis indicates that the ML method is a little more powerful than the REML method but with large type I error when testing imprinting. In terms of computing speed, the ML method is faster than the REML method. Thus, no 57 single method dominates the other. In terms of overall QTL test, we suggest to use the ML method for the genomewide linkage scan and use the REML method for the imprinting test. The effect of sampling design is investigated by extensive simulations. Results indicate that one can always achieve large power with large offspring size when the total sample size is fixed. The LR value differences under different sampling designs are shown in Fig. 1.2. However, the combination of small families each with large offsprings gives poor parameter estimation and large type I error for imprinting test (Table 2.2). As the number of families increase, we observe less biased parameter estimates for both fixed and random effects, but with poor QTL position estimation and small power. This information implies that it is necessary to enlarge the number of families to improve precision of parameter estimation. Meanwhile, a balance of family and offspring size is needed to maintain good QTL detection power and position estimation. Our simulations indicate that for a fixed total sample size (12:400), both 8x50 and 20x20 designs yield comparable results and both designs outperform the other two designs (Table 2.2). Moreover, the 20x20 design produces relatively small type I error in imprinting test. With the 20x20 design, results in Table 1.4 indicate that the imprinting model is better or as good as the regular Mendelian analysis without considering imprinting. In real data analysis, it should be safe to apply the proposed imprinting model for data with any imprinting pattern. In this study, we have extended the single marker-based analysis to an interval- based mapping for genomewide scan and testing of iQTL effects. Considering the 58 interference of QTLS located on the same linkage group, we have extended the single QTL model to multiple QTL analysis following the derivation of Xu and Atchley (1995). Simulation results indicate the relative merit of the multiple QTL analysis with improved QTL position inference, but with possible power loss (Fig. 1.3). This, however, has been a common issue in multiple QTL modelling (Zeng 1994; Xu and Atchley 1995). More investigations are needed in deriving efficient and robust multiple QTL mapping models to improve precision without suffering too much from power loss. The theoretical distribution for the likelihood ratio test has been a challenging problem in QTL mapping. Dupuis and Siegmund (1999) first proposed theoretical properties for LR test statistics in a genomewide linkage scan for QTLs in an inter- val mapping frameworh with a fixed—effect model. Currently, most linkage analysis using the variance components method assume that the LR test statistic follows a mixture of chi-square distribution (Allison et al. 1999). The mixture distribution is derived following Self and Liang (1987). With multiple testings and multiple nuisance parameters in a genomewide scan, the assumptions to get the mixture chi-square dis- tribution may not be satisfied. Moreover, the multivariate normal assumption for the phenotypic data required to get the mixture distribution may not even valid. No the- oretical work has been done to investigate this in a IBD-based variance components linkage mapping. Our simulations indicate that the theoretical threshold calculated from the mixture chi-square distribution is smaller than the simulated cutoff. Thus, the power calculated with the theoretical threshold is slightly inflated. A modified 59 mixture chi-quare distribution may be more appropriate. More theoretical investiga- tions are needed in this regard. 60 Chapter 2 A general statistical framework for dissecting parent-of-origin effects underlying endosperm traits in flowering plants 2.1 INTRODUCTION The life cycle of an angiosperm starts with the process of double fertilization, where the fertilization of the haploid egg with one sperm cell forms the embryo, and the fu- sion of the two polar nuclei with another sperm cell develops into endosperm (Chaud- hury et al. 2001). Thus, endosperm is a tissue unique to angiosperm. The embryo and endosperm are genetically identical, except that the endosperm is triploid com- 61 posed of one set of paternal and two identical sets of maternal chromosomes. In cereals, the endosperm of a grain is the major storage organ providing nutrition for early-stage seed development, and more than that, serves as the major source of food for human beings. The identification of important genes that underlie the variation of quantitative traits of various interests in endosperm, is thus paramountly important. Genomic imprinting refers to the situation where the expression of the same genes is different depending on their parental origin (Pfifer 2000). It has been increasingly recognized that many endosperm traits are controlled by genomic imprinting. For example, endoreduplication is a commonly observed phenomenon which shows a ma- ternally controlled parent-of-origin effect in maize endosperm (Dilkes et al. 2002). Cells undergo endoreduplication are typically larger than other cells, which conse— quently results in larger fruits or seeds beneficial to human beings (Crime and Mow- forth 1982). Other reports of genomic imprinting with paternal imprinting in maize endosperm include, for instance, the 7' gene in the regulation of anthocyanin (Ker- micle 1970), the seed storage protein regulatory gene dsrl (Chaudhuri and Messing 1994), the MEA gene affecting seed development (Kinoshita et al. 1999) and some a-tubulz'n genes (Lund et a1 1995). These studies underscore the value of developing statistical methods that empower geneticists to identify the distribution and effects of imprinted genes controlling endosperm traits. Statistical methods for mapping imprinted genes or imprinted quantitative trait loci (iQTL) have been extensively studied. Focusing on different genetic designs and different segregation populations, methods were developed in mapping iQTL 62 underlying quantitative traits in controlled experimental crosses (e. g. Cui et al. 2006, 2007; Wolf et al. 2008), in outbred population (e.g., de Koning et al. 2002) and in human population (e.g., Hanson et al. 2001; Shete et al. 2003). Broadly speaking, these methods can be categorized into two frameworks: one based on the fixed effect model where the iQTL effect is considered as fixed (e.g., Cui et al. 2006, 2007; de Koning et al. 2002), and the other considering iQTL effect as random and estimating the genetic variances contributed by an iQTL (e.g. Hanson et a1. 2001; Shete et al. 2003; Li and Cui 2009a). The method proposed by Li and Cui (2009a) extended the variance components model to experimental crosses and showed relative merits in mapping iQTLs with inbred lines. However, all these approaches for iQTL mapping were developed based on diploid populations, whereby chromosomes are paired. Their applications are immediately limited when the ploidy level of the study population is more than two for instance, the triploid endosperm. In this study, we propose to extend our previous work in iQTL mapping with vari- ance components approach in experimental crosses (Li and Cui 2009a), and consider the unique genetic make-up of the triploid endosperm genome to map iQTLs un- derlying triploid endosperm traits. Cytoplasmic maternal effects are also considered and adjusted when testing for genomic imprinting. Motivated by a real experiment, we propose a reciprocal backcross design initiated with two inbred lines. Likelihood ratio test (LRT) is applied to test the significance of the variance components and its asymptotic distribution is evaluated under irregular conditions. The article is organized as follows. Section 2 will illustrate the basic genetic design 63 and the statistical mapping framework. We propose a new approach for calculating the parental specific allelic sharing among inbreeding triploid sibs. Statistical hy- pothesis testings are proposed to assess iQTL effects. The limiting distribution of the LRT under the proposed mapping framework is studied. Multiple QTL model is also proposed to separate closely linked QTLs. Section 3 and 4 will be devoted to simulations and real application followed by a general discussion in section 5. 2.2 STATISTICAL METHOD 2.2.1 The genetic design Using experimental crosses for QTL mapping has been the traditional means in tar- geting genetic regions harboring potential genes responsible for quantitative trait variations. Toward the goal of mapping iQTL underlying endosperm traits in line crosses, we propose a reciprocal backcross design. A similar design was proposed by Li and Cui (2009a) for diploid mapping populations. In brief, two inbred parents with genotypes AA and aa are crossed to produce an F1 pepulation (Aa). F1 individuals are then backcrossed with one of the parents to generate backcross populations. We can use both parents as the maternal strain to cross with an F1 individual to generate two backcross segregation populations. Or we can use F1 individuals as the maternal strains to cross with both parents to produce another two sets of segregation popu- lations. The so called reciprocal backcross design generates four different segregation populations with each one being considered as one family. Large number of backcross 64 families can be obtained by simply replicating each one of the above crosses. To distinguish the allelic parental origin, we use subscript letter f and m to denote an allele inherited from the father and mother, respectively. A list of possible offspring genotypes considering the unique genetic make-ups in the triploid endosperm genome is detailed in the second column in Table 2.1. Clearly, the endosperm genome carries one extra maternal copy due to the unique double fertilization step in flowering plants. When a dosage effect is considered, we do expect different expression values triggered by endosperm and embryo genes. 65 0 0 00.. Q0 0) 0) 00. 0 050:0 0 Q0 Q0 0 0} 0h 0 0:. $5080 0:00 \05050 $5050 \05050 0.05959 \05050 \05950 $05050 \05950 0 m 00. Q0 0} o 00. 00. 00.5050 0 20 Q0 0 0 Q0 00. 00. «05.5. 00 x 00 \05050 305050 \05050 \95050 \05050 305050 005050 \95050 Q0 0 0 Q0 0} Q0 00. 0 0020.50 0 m 5 m0. 5 2 o 5 00.50.50 00 x 00 \95050 #05050 \05050 0.05950 09.05050 $05059 $05050 \05950 in m 0 Q0 Q0 0 00. 0:. 0.8080 n m 5 00. o 5 m0. 3 005050 00 x 00 \05059 «@5059 305950 x®5®5® \05959 \05959 005950 $05950 k .QEk xx: 55: 099000w 00000x00m Cm: #0008 w0t000 Om: 0£00Q0iu0000m $05000 .0900 000080000 000000000 0 00 0000 €0.23 08 0000600000 w0t000 Gm: 05000000020 00h. ”0m 0309 66 2.2.2 The model In QTL mapping, different line crosses can be combined together to increase the pa- rameter inference space via a variance components method (Xie et al. 1998). VC method has been shown to be powerful in assessing genomic imprinting in human linkage analysis (Hanson et al. 2001). Recently, Li and Cui (2009a) extended the VC model to experimental crosses and proposed an iQTL mapping framework via combining different line crosses for iQTL detection. We extend our previous work to triploid endosperm tissue considering the unique genetic components in the en- dosperm genome. Suppose total K families are collected which are composed of the four distinct backcross families. Assume "k individuals are sampled in the kth family. The phe- notypic variation of a quantitative trait in family k (denoted as yk) can be explained by the genotype-specific cytoplasmic maternal effect (denoted as pk), additive QTL effect (denoted as ak), polygene effect (denoted as gk), and random residual effect (denoted as ck). To incorporate the parent-of—origin effect, the additive QTL effect (ak) can be further partitioned into two separate effects, an effect due to the expres— sion of the maternal allele (denoted as akm) and an effect due to the expression of the paternal allele (denoted as akf)‘ The model can thus be expressed as yki =Hk+2akmi+akfi+9ki+ekiv k=1,-°- ,K; i: l,-~ ,nk (2.2.1) where akmz” akfz" gkz- and ekz’ are random effects with normal distribution, i.e., 67 akmz’ ~ N(0,072n), akfi ~ N(0,a)2€), gkz' ~ N(0,03), eki ~ N(0,o§); gki and ck,- are uncorrelated to akmz’ and a 1: fi? the coefficient 2 for a kmz’ adjusts for the effects of two identical maternal copies; ”k models the maternal genotype-specific effect. With four distinct segregation populations, we have only three distinct maternal genotypes, AA, Aa and aa. Thus the parameter ”k can be collapsed into three distinct values denoted as ,ul, #2 and #3 corresponding to maternal genotypes AA, Aa and aa, respectively. Let B = (#1412413), then model (3.2.1) can be rewritten in a vector form as ykszB+20km+akf+gk+ek, k=1,...,K (22.2) where X k is an ”k X 3 matrix with one column of ones and two columns of zeros. 2.2.3 Parent-specific allele sharing and the covariance be- tween two inbreeding sibs One of the major tasks in IBD—based iQTL mapping with variance components model is to calculating the IBD sharing probabilities and the phenotypic covariances between sibs. Such a method has been developed in human population (Hanson et al. 2001), which however, can not be applied to a complete inbreeding population in experi- mental crosses, because the allelic sharing relationship among sibpairs does not follow the pattern as the one derived from a natural non-inbreeding population. Instead, the IBD sharing probability can be calculated based on the Malécot’s coefficient of coancestry (1948) for an inbreeding population. Li and Cui (2009a) recently explored different allelic sharing patterns among sibpairs in a reciprocal backcross design with 68 a diploid tissue. We extend the method to the triploid endosperm genome and derive the covariances among sibpairs in a triploid tissue. 69 00.00000 0.0000000 500 00000000 020:0 000 w0€00000000 Om: 00000000 000: 000.000 00% 0.00000 0500 000 50.0 00000000 020:0 000 $00000 QmH 00000000 000: 0:00 00H. 000005 00008602 90000.55 00 .0 000 .0 0000002000 00% Om: 000000 020:0 0100000” ”mm 0.3me 00 s0 50 \0 s0 20 \0 s0 :0 00 50 :0 x 4 I 4 4 . ......4 .................H.< . \0 ..0 ...0 0 ...0 ..0 00 ...0 ...0 00 ...0 ...0 70 Consider two individuals 2' and j randomly selected from one backcross family with phenotype y,- and yj. Figure 2.1 shows all possible allelic sharing patterns between individuals 2' and j. The solid line indicates IBD sharing for alleles derived from the same parent and the dotted line indicates IBD cross-sharing for alleles derived from different parents. The allelic cross-sharing is unique to inbreeding populations, whereby this cross-sharing probability reduces to zero for non-inbreeding populations. Here we propose to calculate the IBD sharing between individuals 2’ and j (denoted as 7r”) for a triploid genome as 3627 if z" aé j 7r,- j = (2.2.3) §(5+3F,-) if 2' =j where 6,- j is the Malécot’s coefficient of coancestry and 172- is the inbreeding coeffi- cient (Harris 1964; Cockerham 1983; Lynch and Walsh 1998). By definition, 6U is calculated as the probability of two randomly selected alleles from individuals 2' and 3' being identical by descent. The calculation of 7r2-j is different from the usual IBD sharing calculation in non-inbreeding populations. It is rather interpreted as triple the Malécot’s coefficient of coancestry (Xie et al. 1998). For easy notation, we still adopt the term “IBD sharing probability” for Ni]- in the rest of the presentation. The calculation of the inbreeding coefficient follows the procedure given in Lynch and Walsh (1998). To illustrate the idea, consider two backcross individuals 23 (with genotype AmAmAf) and j (with genotype BmBme). The coefficient of coancestry 61-]- between these 71 two individuals can be expressed as, I 927‘ = §{Pr(Am1 E Bm1) + Pr(Am1 E Bm2) + PrfAmz E Bmi) +PI‘(Am2 E 3mg) + PI‘(Am1 E Bf) + Pr(Am2 E Bf) + PI'(Af E Bml) +Pr(Af E 87712) + PI'(Af E Bf)} = 3(49 9 + 29,- mm m f + 20pm + Oifjf) where the notation E refers to identical by decent; the subscript numbers 1 and 2 indicate two maternally inherited alleles; 021 j. is defined as the allelic kinship coef- ficient (Lynch and Walsh 1998). Noted that the two terms 6,- and 6,- mi f fjm are indistinguishable, but their sum denoted as 0- ~ = 6- - + 9- - is unique. zm/Jf 2m] f 1’me Thus, we have 0U = $135162.me + zgim/jf + Qifjf). Following equation (2.2.3), we have 4 1 . . 7r--=3l9 im/jf-Iggifjf=7rimjm+7rim/jf+7rifjf forHéJ 2] ij = 39 26 mm + g It can be seen that the IBD sharing between any two individuals can be decomposed as three separate components, one due to the IBD sharing for alleles derived from the maternal parent (Wimjm = g0,- ), one due to the cross-sharing for alleles mjm derived from different parents (7r , and one due to the IBD sharing _ 2 im/jf _ 30im/jf) for alleles derived from the paternal parent ( ). An exhaustive list of ”ifjf = $909} all possible IBD sharing probabilities for the four backcross families is given in Table 2.1. 72 Dropping the family index k, the covariance between any two individuals 2' and j can be expressed as, COVfI/iyyjlflimjmaVim/jfiflifjf) = C0V(2ami+afi+gi+ei’ +¢ijdg + Iijag where mejm = 211'(7rimjm) and ”gm/j}! = %(7rz-m/jf) are the IBD sharing and cross-sharing probabilities by considering one single maternal allele; 07271]. measures the variation of trait distribution due to alleles cross-sharing; 0U is the expected alleles shared IBD; Iij is an indicator variable taking value 1 if i = j and 0 if 2' 75 j. For a natural population without inbreeding, there is no allele cross-sharing for an individual with itself, hence 7’2’ = 0. For a diploid non-inbreeding population, m/ j f the trait covariance can be simplified as the one given in Shete et al. (2003). In matrix form, the phenotypic variance-covariance for individuals in the kth backcross family can then be expressed as __ 2 2 2 2 2 where the elements of II Hfl k and Hm/flk can be found in Table 2.1. m|k’ 73 2.2.4 QTL IBD sharing and genome-wide linkage scan The above described IBD sharing probability is calculated at a known marker position. Unless markers are dense enough, we have to search across the genome for potential (i)QTL positions and their effects. In general, the QTL position can be viewed as a fixed parameter by searching for a putative QTL at every 1 or 2 cM on a map interval bracketed by two markers throughout the entire linkage map. Thus, we need to estimate the QTL IBD sharing at every scan position. Since the conditional probability of an endosperm QTL given upon two flanking markers is the same as the one derived from a diploid genome (Cui and Wu 2005), the same procedure termed as the expected conditional IBD sharing described in Li and Cui (2009a) can be applied to calculate the QTL IBD sharing probability at every scan position. Assuming multivariate normality of the trait distribution for data in each family and assuming independence between families, the joint log-likelihood function when K backcross families are sampled can be formulated as K e = Z loam/00,201 (2.2.5) k=1 where f is the multivariate normal density. Parameters to be estimated include 5 = ( p u ) and Q = (CI2 02 02 02 02) Two commonly used methods in #1, 2) 3 "11 fi ”If, g: e . linkage analysis, the maximum likelihood (ML) method and the restricted maximum likelihood (REML) method, may be applied to estimate parameters. It is commonly recognized that the REML method gives less biased estimation compared to the ML 74 method (Corbeil and Searle 1976). Here we adopted the REML method with the Fisher scoring algorithm to obtain the REML estimates of the parameters (see Li and Cui 2009a for details of the algorithm). The conditional QTL IBD-sharing values vary at different testing positions. The amount of support for a QTL at a particular map position can be displayed graphically through the use of likelihood ratio profiles, which reflect the variation of the testing position of putative QTLs. The significant QTLs are detected by the peaks of the profile plot that pass certain significant threshold (see section 2.5 for more details). 2.2.5 Hypothesis testing In iQTL mapping, we are interested in testing whether there is any significant genetic effect at a test position and would like to further quantify the imprinting effect if any. The hypothesis for testing the existence of a QTL can be expressed as 2 H01072n=03=0mf=0 (2.2.6) H1 : at least one parameter is not zero The LRT is applied for this purpose. Define fl and O to be the estimates of the unknown parameters under H0 and H1, respectively. The LRT statistic can be calculated as LR = —2[log L(f~2|y) — log L(ff|y)] (2.2.7) 75 Let 0 = (#1 a2 a3 61 02 63 04 65)T = (pl [12 #3 072,, a; Ugnf 03 03)T E Q R3 x [0, 00) x [0, 00) x [0, 00) x (0, 00) x (0, 00) be the parameters to be estimated. Noted that the polygene variance is bounded away from zero if we assume there are more than one QTL in the genome. Let the true parameters under the null hypothesis be 90 = (#10 #20 #30 0an 0330 0,2,,f0 030 0301‘ = (#10 #20 #30 0 0 0 030 030)T E (20 = 1R3 x {0} x {0} x {0} x (0,00) x (0,00). The three tested genetic variance components under the null hypothesis lie on the boundaries of the parameter space 9. Thus, the standard conditions for obtaining the asymptotic X2 distribution of the LRT are not satisfied (Self and Liang 1987). Following the results from Chernoff (1954), Shapiro (1985) and Self & Liang (1987), the following theorem shows that the LR statistic follows a mixture chi-square distribution, whereby the mixture proportions depend on the estimated Fisher information matrix. Theorem 2.2.1. Let COO and C9 be closed convex cones with vertex at 00 to ap- proximate 90 and (2, respectively. Let Y be a random variable with a multivariate normal distribution with mean 90, and variance-covariance matrix I —1(90). Under the assumptions given in the Appendix, the LR statistic in (3.2.10) is asymptoti- cally distributed as a mixture chi-square distribution with the form W3X§ : ngg : lef : “0X32 where (.13 = 1%[27r — cos—1pm — cos—1pm — cos—1p23], w2 = 1%[37r—cos_1p12l3—cos_1p13l2 —cos_1p23|1], “’1 = 1%(003—1'012 +cos_1p13+ cos—1p23), and “’0 = %— 2117?[37r—cos—1p1213—cos_1p13|2—cos_1p23l1];pub is the correlation between the variance terms a and b calculated from the Fisher information (pub—panbc) (l—pgalflu—ngfl/Z' matrix, and pablcz 76 Note that the symbol 7r in the above theorem is the irrational number (a mathe— matical constant) not the IBD sharing probability. The proof of the theorem is given in Appendix. Remark: When the random parameter estimators are uncorrelated or the corre- lation is extremely small, i.e., the Fisher information matrix is close to diagonal, the 2 mixture proportions for the X 1: components are reduced to the binomial form with (2)2’3, which is consistent with the results (Case 9) given in Self and Liang (1987). Once a QTL is identified at a genomic position, we can further assess its imprinting property. To evaluate whether a QTL shows imprinting effect, the hypotheses can be formulated as H0 0}=072n (228) H1 02 75 0,2,, Again, the likelihood ratio test can be applied which asymptotically follows a x2 distribution with 1 degree of freedom since the tested parameter under the null is nonnegative and does not lie on the boundary of the parameter space. Rejecting H0 indicates genomic imprinting, and the QTL can be called an iQTL. We denote this imprinting test as LRq-mp. If the null is rejected, one would be interested in testing Whether the detected iQTL is completely maternally or paternally imprinted with the corresponding null hypothesis expressed as H0 : 072,, = O and H0 : a; = 0, respectively. The LRT statistic for the two tests asymptotically follows a mixture X2 distribution with the form 1X2 : 1X2. Rejection of complete imprinting indicates 2 0 9 1 partial imprinting. 77 Maternal effects can be tested by formulating hypothesis: H0 : #1 = #2 = u3. Note that these three parameters do not represent the true maternal effects as they are confounded with the main genetic effects. But a test of pairwise differences can be applied to detect the significance of any maternal contribution. 2.2.6 Multiple iQTL model In practice, there may be several QTLs to reflect the phenotypic variation in the whole genome. When testing QTL effects at one chromosome, the effects from QTLS located at other chromosomes are absorbed by the polygenic effect (9) In some case, two or more QTLs may located at the same chromosome, which are termed background QTLs in comparison to the tested one. When this happens, it is essential to adjust for the background QTLs’ effects. Otherwise, it may lead to biased estimation for the putative QTL caused by the interference of QTLS close to the tested interval (Zeng 1994). In the previous work of Li and Cui (2009a), the authors proposed a multiple iQTL model following the idea of next-to-fianking markers proposed by Xu and Atchley (1995). We adopted a similar strategy in the current study. Briefly, assume there are S (i)QTLs in one chromosome, the multiple iQTL model considering parent-specific allele effect can be expressed as S S yki =#k+ Zzakflti8+ Zakfis+9ki+ekiv k=1v" vK; i=1v“ an]; 3:1 3:1 where each (i)QTL effect is partitioned as two separate terms to reflect the contribu- 78 tion of the maternal and paternal alleles. In reality, the exact number and location of the QTLs in a chromosome is generally unknown before the genome-wide search. This problem can be eased by applying the next-to—flanking markers idea proposed by Xu and Atchley (1995). Denote a test interval with two flanking markers as M l—Mr. The markers next to these two markers are denoted as M L on the left of Ml, and M R on the right of Mr (L = l — 1 and R = r + 1). Conditional on the two markers, ML and MR, we expect the effects of QTLs located outside of the tested interval can be absorbed by the IBD values calculated from the two next-to—flanking markers (Xu and Atchley 1995). Thus, the calculation of (i)QTL covariance conditional on these two markers will avoid the requirement for the position of QTLS outside of the tested interval. The phenotypic covariance between two individuals i and j can be expressed as COW/kt ykjlflle’frimjm’ firm/if firifJ'f’TrRIk) 2 =21K ()QlL’Wle 0l2 +7rimjmam+7rim/Jflk072nf+7rzfjf0 f 1R r=1 2 :WLIkUL +7rimjm0m +7rim/jflk0mf +7rifi f of +7rleoR + Cbijag + Iij where 7r Ll k is the IBD sharing value at marker L, and 0% is a composite variance component which reflects the variation of (i)QTL effects on the left side of the tested interval (see Li and Cui 2009a for details). «RI 1; and 0%: are defined similarly. The 79 calculation of 7r Ll k and 7r Rl k reflect the triploid structure of the endosperm genome. Testing (i)QTL effects can then be focused on a tested interval while adjusting for the background QTLs’ effects located in other place. 2.3 SIMULATION 2.3.1 Single iQTL simulation Six evenly spaced markers are simulated with a total length of 100cM. For simplicity, we assume equal family size (i.e., "k = n). A putative iQTL is simulated at 48cM away from the first marker. The effect of the putative iQTL is simulated by assum- ing different imprinting modes (i.e., no imprinting, completely imprinting and partial imprinting). Phenotypes are simulated by randomly drawing multivariate normal dis- tribution with the covariance structure given in model (2.4) with different parameter combinations. 80 . H \Eb H \b H Eb wEEgnm 000 DEE 0E 30 :0 3 00008500 .2 .20.: . as p 0.00 . o m m . v M . v a M . 2 m L 2 585000.002 ..Mb H 2mm ” om 0:0 o H \E H Mb H :Mb ” or. 2500» .28 2020 H 09$ 02: 3 20802 Naotm 6:0 102m ”20mg 20 2000002 2 480 0:3 03H. "3:2 20 85 96% $003: 02: 80 200208 00.5 0% 80¢ $40 as 0002.320 908 0c: 3 00320000 2 HHG 2t 05 .20 285002 0:8 5030000002 5283 £000 .28 020 wfiamto 98 005800 00 2092:: 0:... 3 20802 as: 98 k: 0.00 Z $8.8 $88 82.8 38.8 38.8 $8 5.8 308 22.28 8.0 8.0 88 ES 2.2 Ed 223 has 23 8.2 8.2 at. $52 $28 63.8 338 A208 828 $08 80.8 308 $2.28 8.0 2.0 was 83 :3 R2 82 33 as 8.: 8.2 8.8. 8x8... :28 32.8 $8.8 $8.8 $28 30: A28 A2: A2028 :8 3 23 8.3 2.2 2.2 83 :3 2:: 8.2 2.2 5.8. ©me 308 A2,: 82.8 3:8 :58 A28 A228 908 22.28 2.0 S 2.0 23 22 a2: 52 23 2s 2.2 3.2 20.2. 02x... Nata 28:0 0300 as 2 2 2.8 m3 2 2 2 202. ”E x a: 0 m KS \ 8 m0 m0 m0 m0 Nb m1 m1 H1 nogmom 0000322038 E :03w 020 00208500 20008308 02: 80 0.820 @0523 5008 05 «o 3002 0:280 038 .mqwmmflo @5928 22020me 2095 000m0 washing: 0: 55 A80 0 .28 000002202 2830386 o2 :0 vows «080223.00 023080.208 000m0 v20 :oEmoa 480 0:» 00 384mm £0300 0:8 ”mm 0509 81 For experimental crosses, the number of families and the offspring size can be easily controlled. We simulate data assuming different family and offspring size com- binations to evaluate the effect of family and offspring size on testing power and parameter estimation. For a fixed total sample size of 400, we vary the family and offspring size with different combinations, i.e., 4x 100, 8x50, 20x20 and 100x4. The first number for each combination indicates the family size and the second number indicates the offspring size. Without loss of generality, we assume equal offspring size for all families in each simulation. Results with 100 Monte Carlo repetitions are recorded for each simulation. Table 2.2 tabulates the results assuming no imprinting (i.e., 0,2,, = 0%). The simulated parameter values are listed underneath each parameter. The REML esti- mates as well as the root mean squared errors (RMSEs) (given in the parenthesis) are recorded for each simulation. n F denotes the number of families and "k denotes the number of offsprings in each family. Overall, the 20 x 20 combination produces the smallest RMSE and bias for QTL position estimation, high QTL detection power and reasonable type I error rate among the four designs. The 100 x 4 design gives the most accurate estimates for the maternal effects, but with small power to detect QTL effect. The 4 x 100 design gives very biased parameter estimates for the main maternal and paternal variance terms. The 20x 20 design also produces the most reasonable imprinting type I error. Thus, a balance of the family and offspring size is necessary in achieving optimal power and estimation precision for the QTL position and genetic effects. In reality, one should always try to avoid designs with extremely 82 large or small family size. Since the 20 x 20 design outperforms the others, we focus this design and con- duct additional simulations under different imprinting mechanisms. The results are summarized in Table 2.3. Four imprinting action modes are assumed: complete pa— ternal imprinting (Ufa = 1.5 and o%=0), complete maternal imprinting (0,2,, = 0 and o%=1.5), partial maternal imprinting (0,2,, = 0.5 and o%=1) and partial paternal imprinting (0%, = 1 and U%=O'5)' Overall, all parameters can be properly estimated with reasonable precision under different scenarios. The complete maternal imprint- ing has the lowest overall QTL testing power (62%) compared to others. Since the majority of the total variance comes from the maternal alleles (two copies), this result is expected. Also noted that the imprinting power is low in the four cases. Since the size of the real data analyzed in section 4 is close to 400, we focus our simulation with a total size of 400. As the total sample size increases, we do observe increased imprinting power (data not shown). The imprinting power is listed in the last column of Table 2.3, which varies a lot under different imprinting cases. Note the imprinting power is calculated only when a simulated QTL is significant. Simulations are not counted when calculating the imprinting power when no QTLs are detected. Thus, we expect low imprinting power under the current simulation design given that the overall QTL detection power is less than 90%. The observed low imprinting power might be due to small sample size. When sample size is increased, we do observe increased imprinting power (data not shown). For more explanations, see the Discussion section. 83 020002220200 20230 00 2203022000020 .200 Na 0308 00m 0080.802 2 8.8 20023000: 020:3 22022000 02: 2.0 00000262200 >200 2 000» 3552082 025. Mb H 22mg 2 cm @2300... 200 20300 wax—E2025 02: 00 020002 203000 220300 2203000000 HBO 202025 023 00 2000.2 0300 602.8 2288 $38 83.8 2220.8 828 32.8 2222228 82.28 22.22 222.22 2.2 22:2 22:2 202; 2.; 222.2 22.22 3.2 2.2.2. 3 2 928 E28 :88 222.8 828 62.8 38.8 22.28 32.28 22.22 22.22 2; 2.22 8.22 2.22.22 23 3.2 22.2 22.2 2.2. 2 0o 2828 2228 £28 222.28 822.8 220.8 22.88 828 2222.28 one 2.22 22.2 8.22 222 202 22.22 2222 8.2 22.2 8.3 2.2 22 222228 22028 22228 22208 22.88 222.8 2222.8 22228 28.28 2.22 2.22 2: 2.2 2.22 2.22 23 2s 82 8.2 2.22 0 m2 203000 20300 N md \md 02. OH m2 m2 Eowv 0 m .E E m0 m0 m0 m0 m0 m1 m1 .3 22030000 00002022220200 5 53m 0220 02000220200 02: 00 0.20220 00.20200 22.00222 02: 00 3002 0.20200 0208 .swmmflo 92:08.00 omxom 023 20022: $00000 9252:2080 2.220.200.0020 22030200 HBO 0 200 002002002 2203002222200 on: 220 0023 0000222300 020002220200 000000 0220 22030000 480 02: 00 252mm ..20300 0:8 ”MN 0300... 84 In summary, the results show that both the 4x100 and the 100X4 designs yield lower QTL detection power and higher RMSE (root mean squared error) for QTL position estimation than the other two designs do. The 20x20 design slightly beats the 8x50 design with smaller imprinting type I error and higher QTL detection power. These results indicate that it is necessary to maintain a balance between the family size and the offspring size, in order to achieve optimal power and good effects estimation precision. For a given budget with a fixed total sample size, one should always try to avoid extreme designs with large (or small) number of families, each with small (or large) number of offsprings. Focusing on the 20 x 20 design, additional simulation shows that the performance of the imprinting model depends on the underlying degree of imprinting. High imprinting power is observed when an iQTL is maternally imprinting compared to the case when an iQTL is paternally imprinting. 2.3.2 Multiple iQTL simulation When multiple (i)QTLs occur in one chromosome, especially when they show linkage effects, the inference of a tested QTL will be biased if other QTLs’ effects are not corrected. In the simulation, the same setup as described in single iQTL simulation is adopted, except that two putative iQTLs are simulated, one located at 28cM and the other one located at 72cM. Data are simulated assuming two iQTLs located at the two genomic positions and are subject to both single iQTL and multiple iQTL analysis. Figure 2.2 plots the LR profiles averaged out of 100 replications for both analyses. 85 The dotted and solid curves represent the LR profiles calculated from the single and multiple iQTL models, respectively. Results indicate that the single iQTL analysis produces three clear LR peaks. The highest peak corresponds to the wrong QTL position, which is often termed as “ghost” QTL (Zeng 1994). On the contrary, the multiple iQTL model can correctly target the two QTL positions with high precision as indicated by two distinct LR peaks. 1° ' ' ' I ------- 'SingleI QTL model 9 - ----- multiple QTL model ‘ 8 L ... . . ..... - 7 _ - 6 _ - es s- ‘ 4 3.; 2 1 ¢ lr 4o 60 72 80 100 Test position 00 N O N (D Figure 2.2: The LR profile plot for the single iQTL and multiple iQTL analyses. The true iQTL positions are simulated at 28cM and 72cM (see the arrow signs). The dotted and the solid curves represent the LR profiles by the single iQTL and multiple iQTL analyses, respectively. In summary, the results indicate a clear benefit of analysis by fitting a multiple iQTL model than fitting a single iQTL model. While the single iQTl analysis detect one “ghost” QTL located between the two simulated QTLs, the multiple iQTL anal- 86 ysis can clearly separate the two QTLs with high precision. Note that the multiple iQTL analysis normally generates low LR values than the single iQTL analysis does. The distribution of the LR value under the multiple iQTL analysis is not clear, and permutation should be used to assess significance of any (i)QTLs in multiple iQTL analysis (Xu and Atchley 1995). 2.4 A CASE STUDY We apply our method to a real data set which have two endosperm traits of interests: mean ploidy level (denoted as Mploidy) and percentage of endoreduplicated nuclei (denoted as Endo). The two traits describe the level of endoreduplication in maize endosperm, which is thought to be genetically controlled by imprinted genes (Dilkes et al. 2002). Four backcross segregation populations, initiated with two inbred lines, Sg18 and M017, were sampled. The four populations were obtained from a reciprocal backcross design as illustrated in Table 2.1. The data show large degree of variation for endoreduplication among the four backcross populations, and ten linkage groups were constructed from the observed marker data (Coelho et al. 2007). For more details about the data, readers are referred to Coelho et al. (2007). The two traits were analyzed with our multiple iQTL model aimed to identify iQTLs across the ten linkage groups. 87 .0003 00 00000008 000 poem ._0 00 05000V 300% 0902:: 05 :0 000x008 00 08002000 00H. 803003 490 0:... 00 $32 000 000 02000080 98080000200 0:... 0000 005 00550 05 00 00000 05 00 w8080000t00 0005600 0m8000w 00H. 5030000000 S0832 0:0 008m 00005 03... 0:» 00m 030008... 0005080008080 000 000 0:: 000000-500 000 000000 0300080. 0005.08000w 05 000 0:: 00300 000 0200 #080800: 000 m0 53m 000 380 00 00000008 00... 98830 000 m030> 00000080. 008 50.500300 00380 00300 080 0:00 03 00000008 000 0005 306333 >063 :008 0:0 A0085 8030029008080 00 090800.000 0:0 00.“ 00503 a: 0005.08080w 00H. .AOHO. . . . LUV 09.0% 0w0xcz 0808 3 000 000000 0003 800000080 03... 0:0 $830008: EEO 00 00800008 003 @5000 000 Add 000.8 0000:0lew3 0:0 00 050.8 00H. ”MN 0.8me coEmon. 0:003. -_. .....- : 0., __ ._ _ .2: _ _ .1 .H ..I ”will .N .v ...................................................... 0 ... 20 00 00 ”B 00 00M ... :_ d j ll_.fi_fifi_ _ _ _-:_ . H .......... .N .v ................................................................................................... 0 >203: ....... 00 00 N0 85 6.0 88 Figures 2.3 plots the LR profiles across the ten linkage groups for the two traits. The solid and dotted curves represent LR profiles for traits Endo and Mploidy, re- spectively. To adjust for the genome-wide error rate across the entire linkage group, permutation tests are applied in which the critical threshold value is empirically calcu- lated on the basis of repeatedly shuffling the relationships between marker genotypes and phenotypes (Churchill and Doerge 1994). The corresponding genome-wide signif- icance thresholds (at 5% level) for the two traits are denoted by the horizontal solid (for Endo) and dotted (for Mploidy) lines. The 5% level chromosome-wide thresholds are denoted by the dashed (for Endo) and dash-dotted (for Mploidy) lines. QTLS that are significance at the chromosome-wide level are called suggestive QTLs. It can be seen that two QTLs (on G7 and G9) associated with Mploidy and one QTL (on G6) associated with Endo are detected at the 5% genome-wide significance level (de- noted by “*” in Table 3.3). Two suggestive QTLS (on G2 and G10) associated with Endo and one suggestive QTL (on G6) associated with Mploidy are also indetified. The detailed QTL location and effect estimates as well as the test results for im- printing are tabulated in Table 3.3. For the trait Mploidy, the identified three QTLs are all imprinted (pimp < 0.05) and all show completely maternal imprinting, i.e., the maternal copies do not express. They are thus termed iQTLs. The cytoplasmic maternal effect does not show any evidence of significance for all the three iQTLs (p M > 0.05). For the trait Endo, only the QTL detected on G6 shows imprinting effect (Pimp < 0.05) and it shows completely paternal imprinting (pi f < 0.05). The other two QTLS does not show evidence of imprinting (pimp > 0.05). For this trait, 89 significant maternal effects are detected (p M < 0.01). In our study, one maternally controlled iQTL was detected for trait Endo, which is consistent with the result given by Dilkes et al. (2002). l\r’leanwhile, according to the genetic conflict theory proposed by Haig and Westoby (1991), in which maternally derived alleles tend to trigger a negative effect on the increase of endosperm growth, whereas paternally derived alleles tend to play an opposite effect to increase seed size. The identified iQTLs shwoing maternal imprinting for trait Mploidy can be well explained by the genetic conflict theory. Both empirical evidence and theoretical hypothesis support the current finding. 90 $638000“: .3 H M0 u 3: 05000: 0003800 0:0 8 H 2&0 H om: w88€08m 00:00:08 000—0800 .AMb H 2mg n omv 00000 w80858w Ami U mi N H1 “ om: 000000 _050008 w80000 :00 0030?: 05 00 x: 0:0 8A 88$ .35 ......s .3 00000008 000 _0>0# 8000830 0005;080:0w 0:0 00 008000830 w83000 0.3.0 503000000: .3 0:0 0 .m 080008080 :0 20wH+SmOEE 0:0 20mv.mm+mvmw80 .3808: 000—008 00 000002 00 008m 00: :00 0490 00.80 00H. 50.500000: .9 0:0 N. .0 080008080 :0 2030+03H08: 0:0 000080 000008 .8308: 000108 00 000003 00 000082 .305 :00 0.5.0 00:: 0:5. - l and 56V omdm wad 5.0 o R mmA mod wad Even mm.m© wwdu *3 3.0 wad mod adv 50m mm; mad Q; EN o R QWN mafim wfimo Sumo o - - nod 3.0V 3.5 00m o R mad SN mwd mvd cwdm ovdo 3.3. .3 008m wvd Hmod mad mad mmd on; Ed D P. 90.0 «mod wvod 3.0 29m boa 2.0 o R 30 mmod mvod vmd mm.m mag mad o R :0 3.0 o a 8.3 0.0.2 0.0.2 0 0.0.0 00.0 20 0:0 2.: 0:: a 00.0 00.0 8.0 03 00.: 2.2 *0 006.02 a2: 0 m .2 q .58. x 8 R: ER .0. 35 m0 m0 m0 m0 m0 m0 m0 mi N: H: 0000.00 0:00:00 0000000 #050002 :0 £009. .A00:mv “03:: 00000200000005 0:0 .00 8000: 0:0 Q0883: >083 :008 ”080: 80000005 025 :00 8:80:88 00:08.5 0:: 0:0 000000 #050008 00:: 05 .80 00000800: 00008300 00H. Jam 0508 ml: 91 In the case study, we also fit a Mendelian model to the data to see if the imprinting model and the Mendelian model produce any different results. The Mendelian model for family It assumes the form yk=Xk5+ak+gk+ek, k=1,--—,K (2.4.1) where a k is a random vector for the main genetic effect without partitioning it into allelic specific effects. See model (2) in the main paper for an explanation of other parameters. Figures 2.4 and 2.5 plot the results for the two traits Mean Ploidy Level (Mploidy) and Percentage of Endoreduplication (Endo), respectively. Figure 2.4 indicates that the imprinting and the Mendelian models agree with two QTLs detected, one on G7 and one on G9. Both QTLs are significant at the genome-wide significant level by fitting the imprinting model. But the Mendelian model only detects the one on G7 at the genome-wide level. Each model detects one QTL at the chromosome-wide level on G6. But the two QTLs do not overlap. Further experimental investigation is needed to confirm which model is more robust for this QTL. The results for fitting the Endo trait is summarized in Figure 2.5. The imprinting model detects three QTLs, on G2, G6 and G10. The one on G6 is significant at the genome-wide level. The other two are only significant at the chromosome-wide level. In contrast, the Mendelian model only detects one QTL on G6 which overlaps with the one identified by the imprinting model. In fact, the two models produce quite similar LR values at QTLs on G2 and G10. Due to high threshold for the Mendelian 92 model, it fails to detect the two QTLs. 93 .AoHU. . . . .HOV 095% $003: 0000:: OH 05 000000 6.64 00:05 :002 £05 05 9002:0050 013.0 00 00:00.008 0:: m:3000 :0.“ Ed 0050: 0000:0007w2 05 .«0 050:: 2:. Jim 0::wE 0:23.000 9500... _ _ :JH __l_ _ L\_/_\ 0/_\ 20 00 00 B 00 00. _ _ _ /_\__ _ _ _ /_\_ i— _ 0 _ m 202L220: _ . «.0 m0 NO EET>2202 ll r0 94 0:0..000 05 00 0:030:00008 0:0:: :00 Va 0:030 00m .300. . . . .000 00:0:w $0000: 0000:: o: 0:: 000.80 :050020000:00:m 00 0m0p:00:0n0 30:0 05 w:0>.0:00:: 0480 00 005008 0:» 00:00.00» :00 0030> mg 00 00,003 0:; ”Wm 0.5.00.0 0:02.000 05000... -_- ___4__ __ :1 .2: __1 Ill . . .N .v 0 ... 9.0 m0 m0 NO m0 000 ___ : l0 :_ _ 0 _ _ ___ _ m II ....... II .N .v .0 Dams—loucw: v0 m0 N0 008.005 ll _.0 95 2.5 DISCUSSION The role of genomic imprinting in endosperm development has been commonly rec- ognized (Dilkes et a1. 2002; Kinkshita et a1 1999; Chaudhuri and Messing 1994). But little is known about the exact location and effect size of imprinted genes in endosperm. As endosperm in cereal provides the most nutrition for human being, it is important to identify imprinted genes that govern seed development, particularly endosperm development. In this article, we develop a variance components linkage analysis method with an experimental cross design, aimed to identify iQTLs for en- dosperm development. Our method is motivated by real applications and is evaluated through Monte Carlo simulations. The proposed method is based on a particular genetic design (reciprocal backcross design) with inbreeding populations. We treat iQTL effects as random, different from a fixed—effect iQTL model (e.g., Cui 2007). Variance components linkage analysis with partial inbreeding human population was previously proposed (see Abney et a1. 2000). However, extending the VC model to a completely inbreeding population is challeng- ing. In our previous work, we proposed a VG—based iQTL mapping framework for an inbreeding diploid mapping population (Li and Cui 2009a). Extending the previous work, we propose a novel IBD partitioning approach to calculate allelic sharing in an inbreeding endosperm population. Extension to mapping multiple iQTLs is provided. Simulations indicate good performance of the multiple iQTL analysis compared to a single iQTL model. Meanwhile, to obtain a good balance of iQTL position and effect estimation and detection power, we have to avoid extreme sample designs. For a fixed 96 total sample size, extremely large or small families should always be avoided. In an application to two endosperm traits, we identified three iQTLs for trait Mploidy. All show paternal expression. We also identified one iQTL for trait Endo, which shows a maternal expression. According to the parental conflict theory pro— posed by Haig and Westoby (1991), maternally derived alleles trigger a negative effect on endosperm cell growth and inhibit endosperm development because the extra ma- ternal copy could slower nuclear division in endosperm. On the contrary, paternally derived alleles tend to increase seed size. Thus, the three iQTL identified for Mploidy can be explained by the genetic conflict theory. The occurrence of parental conflict theory explains parent-of-origin effects as an ubiquitous mechanism for the control of early seed development (Grossniklaus et al. 2001; Kinoshita et al. 1999). In a VC-based linkage analysis, likelihood ratio test (LRT) has been commonly applied in assessing QTL significance. The LRT statistic asymptotically follows a 2 distribution and many investigators often apply the result (Case 9) in mixture x Self and Liang (1987) with binomial mixture coefficients. In a recent investigation, we found that the LRT in a regular VC-based linkage analysis without considering imprinting follows a mixture x2 distribution with mixture proportions depending on the estimated Fisher information matrix (Li and Cui 2009b). The modified calcula- tion of mixture proportion does give more reasonable type I error rate than the one with binomial coefficients. When imprinting is considered, we show that the limiting distribution of the LRT also follows a mixture X2 distribution, and we adopt the new criterion for power evaluation. Simulations show that the new criterion gives 97 type I error more closer to the nominal level than the one using binomial coefficients, and produces power as good as the later one (data not shown). We recommend investigators to adopt the new criterion in their analysis. Increasing evidences have suggested that for correlated traits, multivariate ap- proaches can increase the power and precision to identify genetic effects in genetic linkage analyses (e.g., Boomsma and Dolan 1998; Amos et al. 2001; Evans 2002). Also, the joint analysis of multivariate traits can provide a platform for testing a num- ber of biologically interesting hypotheses, such as testing pleiotropic effects of QTL, testing pleiotropic vs close linkage. Moreover, if the putative QTL has pleiotropic effects on several traits, the joint analysis may perform better than mapping each trait separately (Jiang and Zeng 1995). Multivariate traits appear frequently in ge- netic mapping studies. For example, the two endosperm traits evaluate in this study are highly correlated (Colho et al. 2006). We expect joint analysis may provide high mapping resolution and power for iQTL detection. This will be explored in our future investigation. A computer code written in R is available upon request. APPENDIX In standard human linkage analysis with variance components model, many authors declare that the likelihood ratio statistic follows a mixture X2 distribution with bino- mial coefficient for each mixture component (e.g., Amos and Andrade 2001; Hanson et a1. 2001; Shete et al. 2003). Following Chernoff (1954), Shapiro (1985) and Self & Liang (1987), in the following we show that the mixture proportion actually depends 98 on the estimated Fisher information matrix. For a random variable Y with density function f (y; 0), following Chernoff (1954) and Self & Liang (1987), assume that: ( 1) When any true parameter (60) is on the boundary, the neighborhood centered at 90, i.e., (00 -— 6, 00 + 6), is closed, and the intersection between this closure and Q is also a closed set. (2) The first three derivatives of 2,- log f (312'; 0) with respect to 0 on the intersec- . . . , 63 log f tion of neighborhoods of 90 and Q almost surely ex1st. Moreover, I _ . k I < K (y) z .7 for all 9 on the intersection, and E9 [K (31)] < 00. (3) The information matrix 1(0) is positive definite on neighborhoods of 90. Assuming the above assumptions, the consistency and weak convergence of the estimators can be proven (see Chernoff 1954, Self & Liang 1987, Shapiro 1985). Here we cite the main results from Chernoff (1954), Shapiro (1985) and Self & Liang (1987) to show the asymptotic distribution of the LRT in our case. Defining two closed polyhedral convex cones C90 and C91 to approximate 90 and 521 at 90. The parameter space under the null hypothesis is approximated as C90 = {0 : 0 6 R3 x {0} x {0} x {0} x (0,00) x (0, 00)}, against C91 = {0 : 0 E R3 x [0,00) x [0, 00) x [0,00) x (0,00) x (0, 00)} under the alternative. Following Chernoff (1954, Theorem 1), the asymptotic distribution of the LRT in (3.2.10) is equivalent to the following quadratic approximation LR* = m f (Y — 0)’1(90)(Y — a) — m f (Y — 9)’I(00)(Y — a) (2.5.1) 96090 OECQI 99 ‘11.: vwx where Y ~ N(60, I’1(60)). Subtracting 00 from Y and 0, the expression in (2.5.1) is given by LR*= inf (Y—0)’I(00)(Y—9)— inf (Y—O)’I(00)(Y—0) (2.5.2) 06090 ——90 06091—90 and Y N N(0, I _1(60)) under the linear transformation. Let 01 = (091 — 00) n (090 — 90)C = {a : 91 > 0,92 > 0,93 > 0}, which is a closed polyhedral convex cone with 3 dimensions. By the Pythagoras theorem, the statistic in (2.5.2) can be expressed as LR* = m f (Y — 0)’I(00)(Y — a) (2.5.3) 9601 Let f(C'l-) represent the set of all faces of C1, and let C'it0 = {7 E R3 : 7’0 5 0, V 0 6 Cf} be a polar cone which is also a polyhedral convex cone such that (Cio)0 = CI. Following Shapiro (1985), we can select a face 1/ E f(Ci) correspond- ing to a polar face V0 E .7: (C10) such that the linear spaces generated by V and V0 are orthogonal to each other. For one face V (or V0), a projection Ty (or TVO) (a symmetric idempotent matrix giving projection onto the space generated by V (or V0» can be found and T V = I —T 1,0 since they are orthogonal. Then TVY (or TVOY) is a projection of Y onto Cit (or C10). For a given Y, let g(Y) be the minimizer to achieve the infimum in (2.5.3), such that LR* = (Y — g(Y))’I(90)(Y — g(Y)). Define ¢V|Y = {Y E R3 : g(Y) E V}) so that g(Y) E u if and only if TVY E Cit 100 and TVOY 6 C10. By Shapiro (1985), g(Y)= TVY 6 Cf, V Y E ¢V|Y‘ Note that the set 0 Y is composed of 23 disjoint sets in R3. All these disjoint VI sets can be classified into four categories as 1). $21le = {Y;Y1 > 0.3/2 > 0, Y3 > 0,g(Y) e u} 2). = {Y;Y1 > O,Y2 > 0, Y3 g 0,g(Y) e u}; $le = {Y;Y1 > 0,Y2 l/\ 2 IpulY 0, Y3 > 0,g(Y) e u}; 03'}, = {Y;Y1 g 0,Y2 > 0,Y3 > 0,g(Y) e u} |/\ 3). 1&le = {Y;Y1 S 0,Y2 S 0,Y3 > 0,9(Y) E V}; ”(1)le = {Y;Y1 > 0,Y2 0,Y3 g 0,g(Y) e V};1/1;|Y = {Y;Y1 g 0,Y2 > 0,Y3 g 0,g(Y) e u} 4). 0le = {Y;Y1 g 0,Y2 g 0,1/3 g 0,9(Y) e u} Define C* = {0* : 0* = Al/QP’ON 0 E C1} to be also a polyhedral closed convex cone. Then 2.5.3 can be further expressed as LR* = in f Hz — 9*“? (2.5.4) 0*EC* where z=A1/2P’Y (PAPT = I (90)) has a multivariate normal distribution with mean 0 and identify covariance matrix. Let C*0 be a polar cone of C* and (CW0)0 = C*. Two faces V* and V*0 can be defined with respect to .7: (C *) and f (C *0). The relevant orthogonal projections Tusk *0 and Tu*0 corresponding to 11* and U can be found. Suppose h(z) is the minimizer 101 to achieve the infimum in (2.5.4). Following Shapiro (1985), we can have h(z)= Tuzkz E (1*, V z E (pl/:12. The set Illuaklz is defined similarly as wz/IY’ and it satisfies the conditions of Lemma 3.1 (Shapiro 1985). Then we have LR* = ”2450)“? = ||z—TV*z||2 = z’(I—TV*)z = z’TV*0z v z e wu*lz (2.5.5) Thus the distribution of LR* in 2.5.3 can be expressed as 3 Pr(LR* > c2) =Pr((Y -— g(Y))’1(00)(Y - g(Y)) > 62, Y e E) 0i”) 23 . 2:1 = 2:221 Pr(Y e leY) (2.5.6) Pray — g>’I(0o) CZIY e wily) 23 :2; Pr(Y e ¢£|Y)Pr(lel/*Oz > c2|z 6 $1202) where conditional on 2 E wffllz’ z'TV*0z is a chi-square distribution. By Bayes’ the- orem, the distribution of LR* follows a mixture chi-square distribution with mixing . 3 . proportions Pr(Y E wIZ/IY) (i=1,...,23) and 212:1 P'r(Y E wily) = 1. The calculation of the mixture proportions follows Plackett (1954). Specifically, when Y E wily, LR* ~ X3: and the corresponding mixture proportion 1133 =Pr(Y G 1 2 wll/IY) =ZIIEI2” — cos—1pm — cos—1p13 — cos- p23]. For category (2), LR* ~ X2 for Y E wzi/IY’ 2' = 2,3,4 with the corresponding mixture probability calculated . 1 _ _ _ by 102 = 231:2 Pr(Y E wf/IY)=ZE[3W — cos 1p1213 - cos 1p13|2 - cos 1p23l1]. 102 Correspondingly, LR* ~ x? for Y E wily, i = 5, 6, 7 with the corresponding mixture probability calculated as “’1 = 2;:5 Pr(Y E ¢£IY)=% — 1113 in category (3). For the last category, LR* ~ X3 for Y E 212le with the mixture probability “’0 = Pr(Y E wle)=% — 102. 103 Chapter 3 Bivariate quantitative trait linkage analysis in mapping imprinted quantitative trait loci underlying endosperm traits in flowering plant 3.1 Introduction The availability of multiple phenotypic traits allows researchers to associate genetic effects with the joint information of multivariate traits. Comparing with a univariate phenotypic trait, multivariate traits can provide more information in explaining the variation resulted from few particular genes or QTL, especially when correlations of these traits are observed to measure the related levels of multivariate phenotypes. By 104 accurately modeling the relative correlations of different phenotypes, the multivariate traits analysis significantly improves the power to detect QTL, and the degree of accuracy of position estimation for true QTL (Williams et al. 1999; Almasy et al. 1997) Initiated by the double fertilization, a unique reproductive process in angiosperm plants, endosperm is developed from the fuse of the two polar nuclei and a sperm cell, ended up with a triploid tissue with two identical chromosomes inherited from maternal and one chromosome from paternal parent. Surrounding the embryo, the endosperm supplies main nutrition to the embryo (Brink and Cooper 1947). In ce- real, it serves as the major source of food for human being. But the function of the endosperm is far more complicated and is beyond simple nutrient delivery to the embryo. Meanwhile, it is frequent to observe multivariate endosperm traits in maize, for instance, two highly correlated maize endosperm traits were collected: endoredu- plication and mean ploidy (Cintia et al. 2006). To reveal the association of genetic effects with the variation of correlated endosperm traits, the multivariate traits anal- ysis provides an essential channel to extract the maximum information to identify important genes or QTL. In terms of the association of gene expression with variations of the phenotype, genomic imprinting is defined as the epigenetic phenomena that cause uniparental gene expression (Wolf et al. 2008). Under genomic imprinting, the expression of the same allele A from different heterozygote genotypes Aa and aA depends on the origin of inheritance of this allele. Then the maternally derived allele A (from Aa) functions 105 differently from that of paternally derived allele A (inherited from aA). A number of studies have illustrated that many endosperm traits are controlled by genomic im- printing. In maize, several paternal imprinting genes have been identified: the 1‘ gene in the regulation of anthocyanin (Kermicle 1970), the seed storage protein regulatory gene dsrl (Chaudhuri and Messing 1994), the MEA gene in seed development (Ki- noshita et al. 1999) and some a-tubulin genes (Lund et a1 1995). In the contrary direction, endoreduplication expresses a maternally controlled parent-of-origin effect (Dilkes et al. 2002). Endoreduplication, commonly occurring in angiosperms, is cru- cial for the endosperm development. By amplifying the genome to result in larger cells, endoreduplicaiton plays a critical role in process about the terminal differenti- ation and specialized function of given tissues. However, to our best knowledge, no study has been conducted to map iQTL with multivariate traits underlying potential imprinting process. It is the purpose of this study to develop an efficient multivariate iQTL mapping procedure incorporating the nature of the imprinting characteristic. One important merit of the multivariate traits analysis is to make a number of biologically interesting hypotheses tests, such as testing pleiotropic effects of (i)QTL, testing pleiotropic against close linkage. These tests can not be accomplished under a univariate trait analysis. Generally, one phenotype is affected by one gene, but in a few cases, the same gene may govern several phenotypes simultaneously. This unique phenomenon is termed as pleiotropy. In real experiments, this special event may be confused with close linkage, another exceptional phenomenon during which variations of several phenotypic traits are influenced by multiple closely linked genes. Although 106 several genes are located closely in close linkage event, each gene only controls one trait. The discrepancy between pleiotropy and close linkage is simply distinguished by the number of traits one gene could control. It is practically important in distin- guishing these two phenomena. In maize, some vital genes displaying pleiotropic effects are revealed. For example, maize zfl regulatory genes in genetic backgrounds have pleiotripic effects on structure traits in branching and inflorescence (Bomblies and Doebley 2006); the tbl gene with the intergenic sequences illustrates the pleiotropic effects on maize morphology (Clark et a1. 2006); the early phase change (epc) gene has effects on maize development in several aspects (Vega et a1. 2002); a maize gene GLOSSYl (GL1) expresses its effects on trichome size and cutin structure during epidermis development (Sturaro et al. 2005); encoding with a transcription factor, a maize gene Glossy15 (Gl15) functioning like APETALA2 gene controls the development of ovule and identity of floral organ (Moose and Sisco 1996). It is known that endoreduplication and mean ploidy are two highly correlated endosperm traits in maize (Cintia et al. 2006). The identification of genes with pleiotropic effect based on these two phenotypic traits is practically important. Variance components model is a powerful tool in multi—trait linkage analysis for an outbred or human population (Almasy et al. 1997; Williams et al. 1999). Due to the special inbreeding structure and unique genetic make-up of the endosperm genome, the current multi-trait linkage analysis methods can not be applied directly to en- dosperm phenotypes. In an extension to our previously proposed variance components 107 model in mapping iQTL underlying endosperm trait, in this work we will propose a bivariate iQTL mapping method to track down iQTL with possible pleiotropic effect, and to further distinguish potential close linkage signals. 3.2 Statistical method 3.2.1 The model We will follow the same genetic design as illustrated in Chapter 2. Let ylk = (y11,...,y1nk)T be a vector of the lst phenotypic trait value in the kth family, and y2k = (3,21, ..., y2nk )T be the 2nd phenotype within the same family. We assume multivariate normality for the joint distribution of y1 k and yzk. In the kth family, "k individuals are randomly selected for each quantitative trait. The total K families are collected through four distinct reciprocal backcross populations. In bivariate trait analysis, the genotype-specific cytoplasmic maternal effect (BI, )62), additive genetic effect at the QTL (a1 k’ azk), polygenic additive effect (91k, 92k)’ and random en- vironmental effect (elk’ 62k) are considered. The parent-of-origin effect is further partitioned into effects due to the expression of the maternal allele with respect to each phenotypic trait (denoted as almk’ a2mk)’ and due to the expression of the pa- ternal allele (denoted as al I: f’ ). Hence, the genetic model underlying bivariate endosperm traits is represented in a vector form: 108 (y1k,y2k)= Xz(fi1,fi2)+2(a1mk,02mkl + (aifkfisz) + (91ka92k) +(81kve2k) (3.2.1) where k = 1, - -- ,K. According to the reciprocal backcross design, three mater- nal genotypes AA, Aa and cm are observed, thus ,31 and ,32 denote mean pa- rameters of two phenotypic traits with respect to three maternal genotypes, i.e., :31 = (p1,p2,u3), fig = (p4,p5,p6). The design matrix X; is an "k x 3 ma- trix with one column of ones and two columns of zeros. The random effects cor- responding to the lst trait are almk’ and 51 k’ and each of these ran- alfk’ 91k dom components is distributed as normal distribution i.e., alm k ~ N(O,Hmlk0m1), a1 ~ N(0 II 02) 91 ~ N(0 (1)1302 ) and 51 ~ N(0 [1:02 ) where 0.2 fk ’ flk f1’ k ’ 91 k ’ 61’ ml and 0%1 are the additive genetic variances at the QTL for maternal and paternal sides respectively; Hmlk and Hflk are IBD sharing matrices that are derived from the similarities of maternal and paternal alleles among siblings, respectively; 031 and 031 are the additive polygenic variance and the residual environmental vari- ance, respectively; (Pk is the expected proportion of alleles shared IBD; and 1k is the identity matrix. Correspondingly, a2mk’ and 62k are random effects a2fk ’ 92k with normal distribution for the 2nd phenotypic trait i.e., 02ml; ~ N (0, Hml k072712)’ N 2 N 2 N 2 a2fk N(O’Hf|kaf2)’g2k N(0,kog2)and62k N(0,Ikae2). In addition, when bivariate phenotypic traits are involved in the model, the covariances of two phenotypes are expressed in terms of each random effect, i.e., 109 Cov(a1mk,a2mk) =Hm|k0m12 together With COV(a1fk’a2fk):Hflka12 are the covariances of the additive genetic effects at QTL; the covariance of the polygenic effects is Cov(glk,ggk)=<1>k0912; the covariance of the environmental effects is Cov(e1k,62k)=lk0812. 3.2.2 Parent-specific allele sharing & genomewide linkage scan The variance components model is built upon the basis of IBD sharing at the QTL. In triploid inbreeding population, a unique decomposition of parent-specific allele sharing pattern is illustrated in Figure 2.1. In the kth backcross family, the pheno- typic variance-covariance corresponding to the lst phenotype is denoted as: 21k = Hmlkagnl + Hm/flkarznfl + Hflkafl + lekagl + 1021, where nm/flk is the IBD sharing matrix that the shared alleles are derived from different parents. Sim- ilarly, the phenotypic variance—covariance for the 2nd phenotype is given as 22k = Hmlkagn2 +Hm/flk072nf2 +Hflkafi2 + (Pd/€032 +1032. The covariance of twophe- notypic traits is expressed as 212k = Hm|k0m12 + nm/flkamflz + HflkafIZ + Ql k0912 + 10612. Therefore, the phenotypic variance—covariance of two phenotypic traits within the kth backcross family is expressed in a matrix form: 2 2: 2k: 1k 121‘ (3.2.2) >312k 22k Where _ 2 2 2 2 2 ' 21k ~ nmlkaml + Hm/flkamfi + Hflkafi + ‘I’glkagi + 1061 110 ° 312k = Hm|k0m12 + Hm/flkamflg + 11f|k°f12 + ‘I’glk0912 + I”€12 __ 2 2 2 2 2 0 22k — Hm|k0m2 + Hm/flkame + HflkafZ + ‘I’QIkagz + 1062 The calculation of the IBD sharing probability is based on the marker positions. Unless each marker interval is dense, the QTL may be anywhere in the interval bracketed by two flanking markers. To acquire the accurate position of QTL, we need to search the putative QTL at every 1 or 2 cM throughout the entire genome (see Chapter 2 for more details). 3.2.3 Likelihood function and parameter estimation In the kth family, two phenotype vectors are expressed as 311 k = (y11,...,y1nk)T _ T _ T - and y2k — (3)21, ...,ygnk) . Let yk—(y11,...,y1nk, y21,...,y2nk) . Assuming the multivariate normality of yk and different families are independent, the overall log likelihood function is given by: K e: 2 100003.20] (3.23) k=1 where B is a mean vector of both phenotypes in terms of ,61 (denoted as the mean vector of the lst phenotype) and ,82 (as the mean vector for the 2nd phenotype) i.e., (6(6 x1) = (g; ). With respect to three maternal genotypes, BI and 32 include three different mean values for each trait. The random parameters in 2k are of main interest and defined as E) = (072721, 0m12’ 072,12, 0%, 0 f1 2, 03,2, 012nf1’ am f1 2, 072"].2, 031, 0912, 032, 031, 0612, 032). To estimate these parameters, two commonly used 111 approaches are applied, the maximum likelihood (ML) method and the restricted maximum likelihood (REML) method. 3.2.3.1 The ML estimation Defined the parameters as Q = (3,6) where 0:021, #2, p3, p4, [15, (‘6) and __ 2 2 2 2 2 2 2 2 2 9—(0m1, 0m12,0m2,0f1, 01:12, af2’ amfl, 0mf12, Umfg’ 091, 0912, 092, 061, 0812, 032). The log-likelihood function to be maximized is in the form: NIH K (*(m = 2100mm»: — K Z {10% IEkl + (yk - Xkfi)'2k(yk — Xkfi)} k=1 k=l (324) where yk=(y1k,y2k)T is the phenotypic vector for both phenotypes; 2k is the variance-covariance matrix of yk with dimension 272k x 271k, and the elements of this matrix are explained in section 3.2.3; the mean effect of the kth backcross family “kulnk is denoted as X k6 = , and X k is a design matrix. #kplnk We applied the Fisher scoring algorithm to estimate the parameters given in (2, 52(0) 30 9(t+1) = e“) +1(e(t))—1 left) 112 Define H(1)k= Hmlk 0 matrix 0 with dimension n x n ,., k = 1, ..., K mkl o 0 k k (2) 0 0 . . . . _ Hm|k= ( Hmlk) matrix 0 With dimensmn "k x nk, k —— 1, ..., K (3.2.5) H (3) __ 0 mlk . . . . _ nmlk — Hmlk O ) matrix 0 With dimensmn ”k x nk, k — 1, ...,K E! Replacing the matrix Hmlk in the above equation (3.2.5) by Hflk’ Hmflk’ (bk, ' and [k individually, we will obtain matrices H_(f|39’ H(83‘|k’ (S S) and 11:3) ,==s 1, 2 3, k=1,...,K. The first-derivative of the log-likelihood function 5* with respective to each vari- ; ance component is given by 82* K 1 —1 (1 ) TA —1 (1 )g A 602 = —5 gym, H mIk) — (yk- XkB) 2,, n mIk 2,; (yk — 29.5)), ml k=1 66* K 1 A —1 (2 ) T 1 (2 ) A 802 = ‘5 2057(2), H mlk k”) (yk— XkB) 2k H mlkzk 1(yk -Xk5)), m2 k=1 6€* K 1 A —1 (3) T —1 (3 )2 A : —— 2 H X 2k H _ X 7 30mm 2 [CZ—:16“ k mlk) (yk— k5) "1|ka 1(yk km) 813* K _ 1 1 (1) T 1 (1 ) A 502 “‘5 2“ (2k Hflk)_ (3’19— XkB) anf|k2k1(yk-Xkfi))’ f [:21 113 66* 1 K ..___ ‘—1 (2) TA _1 (2) _ . 60% — 23022,“ ank)— (yI— XIB) 2k HfIkZI:1(yk X113», 66* 1 K . . . . - 30 = ‘5 Z-31-ka7331H1m}II2;1(yk—Xk6)). I mfg k=1 66* 1 K . —-- (WE—111” )-(yk—Xkfi)T271II() 21(yI— XIB» 00mf12 2; k flit It: flk k 66* K 1 1 () A T 1 ()A—1 3031__§E__:1(tr(2k 1k )_(yk"XkB) 2,, I SI (“-2961 66* 1 K A 1 () T 1 () 1 3032_—§k§1(tr(2k (bk )‘(yk’ka 2k (bk 2 (yk—Xkfi», 66* 1 K ___ 1 () 1 ()A—1 _ 66912 2121("(2k (bk 1 (yk Xkfi) ‘1’}, 2k (yk X15», 66* 1 K . ._ 802 =-§Z(t (>3 1( ))-(yk XIB)TZ 11£)Zk1(yk—Xk6)), 61 k=1 66* 1 K 1 ) I () _ a 2 "5211715319 1;. >- 2,, 1,, BI (yr-X13», 082 k=1 66* K 1 1 ( ) T ( )A—1 , 00612——2l§1(tr(zk 1k )—(yk_XkB) 2}, [I 2k (YL—Xkdfl Taking the expectation of the negative 2nd derivative of log—likelihood function with respect to 9“), we obtain the Fisher information matrix (I(9(S ) and 11(5) (s,r=1, 2, 3; The 1st derivative of the log-likelihood function 8* (3.2.7) with respective to each variance component is given by M 1 3 T A (1) A 80%” = TEE“ ((tT PT 110 ‘3’7‘ PTHmIrPTYT), 8” 1 3 (2) II(2 > = —— t (P 11m — P 66* 3 _ _1 3) _ T A (3) A, 50mm _ 2 :szlur P 11 1r) ,. PerITPyyr), 66* 3 1 11(1) yAT 11(1) = —— 1 P11 ,.P11 P 00% 272(7( 1‘ fl?“ ) 7 flT TY7) 117 3 = —';‘ Z (tT(13rH-(f2I2~) — yszH;2I:PTyT)a 00%? r=1 0€* 3 . . . Tofu = ‘312' 2:1(tr(PTH)’3|Z*) “ YTPTH(f3I,2,PTYr)1 r: a” 1 31101110 002.11 ”Em M11‘ mfl Pry” 06* —— 1 3 t PRO) TP 11(2) P aegan‘figm ’" m11r1‘ 1‘ " mflr ”’1’ 03* — 13 t PII(3) TPHH P as "52M ’" mflr) ’" 7' mfl TyT)’ 8€* 3 = ‘1 2(17(P7“1’1(Al))— YTPrq’( )PTYT) 802 2 91 7‘21 06* 3 = —1 Z(tT(Prq)(2))"—yTPr¢1-q)2( )P'ryT), 802 2 92 r=1 (98* 3 1 A (3) ) _‘ —- (tT(Pr(I) ) Prq) Pry'r) 08* 3 1 A 1 002 = -5 :(tr(P7~II(. 1)— yTPrI,(.11P yr) 1‘31 r=1 06* 3 = _1 Z(t:~(P,.I£21)— yTP7 1,1212%), 802 2 ‘32 r=1 (96* 3 1 A ( ) yT (3 1 00612 _ 720413-171 )— Pr] Pm) 118 The Fisher information matrix (Z(G(t))) in the REML procedure is obtained by taking the expectation of the negative 2nd derivative of log-likelihood function with respect to each variance component. The REML estimator of ('3 is the generalized least squares estimator, that is, 6 = (XT2—1X)—1XTf:—1Y 3.2.4 Hypothesis testing In the bivariate traits analysis, the existence of quantitative trait loci (QTL) is tested by the following hypothesis .2_2_2_2_2_2_2_2_2 _ Ho‘aml—a"12_0m12—0f1_0f2—0f12_0mf1_Ome—Umflg—O H1 : H0 is not true (3.2.9) The significance of the above test is assessed through the likelihood ratio test (LRT). Let f2 and 6 be estimates of the unknown parameters with respect to H0 and H1, respectively, then the likelihood ratio statistic is evaluated by, LR = —2[log L(§|y) — log L(§|y)] (3.2.10) which, under the null hypothesis, is distributed with a mixture chi-square distribu- . . (6) 2(6) 2 1(6) 24(6) 2 3(6) 23(6)+1(6) 2 tion wrth the form -2%'X93—2%TX7I Sag—XGZEEg—XS: 5563—X4zL3—7265—‘LX3: (6) 3(‘362 2. (‘3’) 2 L3; X2- 2 X1“ 2. X0- 119 Once a QTL is identified at a genomic position, its imprinting property for each phenotypic trait is assessed by the following two imprinting hypotheses formulated by, H0 : 0&1 = 0,2”1 (3.2.11) H1 0% # 072,11 and H0 0&2 = 072,12 H1 : 0’25 75 072,12 Again, the likelihood ratio test is applied and the test statistic (denoted as LRimp) follows a chi-square distribution with 1 degree of freedom. If the null is rejected at the tested QTL position, imprinting effect is claimed. We further assess whether the imprinted genetic effect is completely derived from the maternal allele or from the paternal allele. Two hypotheses are formulated for this purpose to assess completely maternally imprinting by H0:0,2nt=0 t=1,2 (3.2.12) H1 :azntgo t=1, 2 and to assess completely paternally imprinting by H22=0 t=12 00ft , H: 2 0 t=1,2 laffié 120 The likelihood ratio test statistic (LRcimp) corresponding to the above tests follows a mixture chi-square distribution with %X% : %X8- If the test in (3.2.9) is rejected, we can further test if the QTL controls the lst trait by testing .2 _ 2 _ 2 _ Ho'aml—Ufl—Um'fro (3213) H1 : H0 is not true or the 2nd trait by testing 2 _ .2 _ 2 _ H0.am2—af2—0mf2— H1 : H0 is not true The likelihood ratio statistic corresponding to the above tests is denoted as LRplei and follows a mixture chi—square distribution under the null with the form 21—17?[27r — cos-1pm—cos-1p13—cos-1p23jxg: 31;[37r—cos—1p12l3—cos—1p13l2—cos_1p23l1])(3: 1 21%(c03_1p12 +cos—1p13 +003- p23)x% [% — 3%(37r —cos_1p12l3 —cos_1p13I2 - cos—1p23l1)]xg where the correlation between the variance terms a and b is calcu- (flab—Pacpbc) (1—p3a1/20-pgcfl/2 lated from the Fisher information matrix, and pubic: (see Chapter 2 for details). Rejecting the null for the above two tests indicates pleiotropic effect (i.e., one gene has effect on two traits). But if two genes are closely linked at the detected QTL (i.e., close linkage), we still get the same testing result. To further distinguish close linkage against pleiotropic effect, we develop the following two tests 121 H0 : pm12 = pf12 = pmf12 :1 (3.2.14) H1 : H0 is not true for testing pleiotropic effect and Hotpm12=pf =p =0 12 "If 12 (3.2.15) H1 : H0 is not true for testing close linkage. The null hypothesis in test (3.2.14) indicates that the additive effects for the two traits are perfectly correlated and they are possibly controlled by a single gene. On the contrary, the null hypothesis in test (3.2.15) indicates two closely linkage genes at one QTL location. The likelihood ratio test is denoted by LRp for test (3.2.14) and LRC for test (3.2.14). The null distribution of LRp has a mixture chi-square distribution (since 1 is a boundary point for correlation p) with the form 4%[27r — cos—1pm — cos—1,013 — cos—1p23]xg: 21-177[37r — cos—1p12l3 — cos—1p13l2 — COS—lpggllngi 1%(cos—lp12+cos_1p13+c03_1p23)x%: [%—211;(37r—c03_1p12|3— COS—IP13I2 — COS—IP23|1)JX(2)- The null distribution of LRC follows a classical chi-square distribution with 3 degrees of freedom, i.e., LRC ~ X3- We can also test the maternal main effect on each trait by 122 H0 = #1 = #2 = #3 (3.2.16) H1 : H0 is not true for the lst trait, and H0=M4=H5=I16 H1 : H0 is not true for the 2nd trait. 3.3 Simulation 3.3. 1 Simulation design A simulation study was conducted to evaluate the performance of the proposed method. Five equally-spaced markers (M1 — M5) are simulated for one linkage group assuming a backcross design. This linkage group covers a length of 40cM with lOcM for each marker interval. Haldane map function is used to convert map distance to recombination rate. Assume one QTL is at 22cM away from the first marker, and has effects on two phenotypic traits. Phenotypic values of both traits are generated from a multivariate normal distribution with variance-covariance given in (3.2.2) in terms of different parameter settings. Backcross families are simulated following the structure of the real data described in Chapter 2 (i.e., 4 families with 100 sibs within each family). In each simulation scenario, the IBD value of any two siblings is evalu- ated at every 2cM along the linkage group. The REML method is adopted to estimate 123 unknown parameters of interests, and 100 simulation replicates are recorded. 3.3.2 Results The simulated results without imprinting effect (i.e., 0,2,,”1 = 0% and 0,2712 2 0&2) are tabulated in Table 3.1. Estimations of the QTL position, observed statistical power and type I errors are compared between bivariate traits analysis (T1+T2) and each univariate trait analysis (T1 and T2). Overall, the bivariate traits analysis gives more precise QTL position estimate, larger statistical power and reasonable type I error rate. The results indicate that the joint mapping incorporating bivariate phenotypic information can capture information of QTLs with small or moderate effects that could be easily missed by univariate trait analysis. In addition, the bivariate trait analysis provides less biased parameter estimates for additive QTL variance terms 2 derived from maternal and paternal parents. For example, variance term for Uml is estimated as 0.428(SMSE=0.11) for the joint analysis, while it is 0.25 (SMSE=O.44) for the single trait analysis. 124 .cosflafim £95.39. coHuanSmHv EoEQEo ofi wEmz 9522538 mH “mason ”20mm pd H3882 2 Hub wsb 27H. Xmas: 2o OH; 95% %§:: 2: Ho Sign $5 23 82H A20 :3 802$me 92: on“ \3 Hoofluomow mH APO m5 m0 $8582 251 :0 948 22s H89 so: 6%: m nod wed some HRH mmmd mmmd dem mod m £39. an: 6”,: Gauge :2: S20 ome wand Hmvd ommd 86H Sad H flair 33v 5.: 33 :53 five :33 92.8 3.8 awsv mod owd mob. mmHN mmwd wmnd mmud Exo 53¢ wmvd mem m 338 + H HES. note 530m m. mH md H H mad ad ad Eomm mm was: Hog»? me HMb NRQMb «Mb mimab szmko HMb Hsmro :oBHmom EHHESHSHH «35352550 .wommficmEQ E 53w 2d mmpaEEmm HomeSwa 23 H0 @858 @93de E88 2: Ho 308 298m 2H9 demeo 02 x Ht Eva: shame wEHEEEH on firs ‘HBG .m 8H 89.85%." :oHHESEHm 2: no woman, woumfismm aoEHmom Q80 23 Ho 335m ..826m BEL ”H.m wEfiH Table 3.1 also shows that the results for trait 1 is better than trait 2, due to high heritability (H2 = 0.20) for T1 than that (H2 = 0.05) for T2. When the heritability for both traits are increased, we observe better performance (data not shown). The type I error for the bivariate traits analysis and single trait with T2 is reasonably controlled. The type I error for T1 is a little inflated. But the joint analysis gives much larger power than that for both single trait analysis. To demonstrate the imprinting property of an iQTL in the bivariate traits anal- ysis, additional simulations under different imprinting mechanisms are conducted and the results are listed in Table 3.2. We design four imprinting modes such as: 2 __1 a; :0 )7 ( 1 2 )), complete ma- . . , ”ml— complete paternal imprinting (( 2 0m222 Uf2— 2 _ 2 =1 ”ml—0 ) (afl ternal imprinting (( 2 2 )), partial maternal imprinting 0m :0 U = 2 2 f2 0,2711 =0,25 af1=0.75 ( , ( )), and partial paternal imprinting 2 07,12 =O.5 0f2=1.5 0,2,, =0.75 012, =0.25 1 i ( 1 )). With respect to the imprinting test 2 2 0 =1.5 0 =0.5 m2 f2 (3.2.11), the largest imprinting power is achieved by complete maternal (paternal) imprinting for both phenotypic traits (T1 and T2). No significant difference in power (Power) under different imprinting mechanisms is observed for the joint analysis. The low power for the imprinting test for single trait T2 is due to its low heritability. 126 .Hoouoohmn E 8.33 mmmmfioan 32H? ~85an 9: pa Homeoswsoo 3:0 2 $3 massing: SHE $628?ng “NH. cha H rH. HES .ch Mb H :mb U or. wcswou 8H Egon 9555:: m5 3 ~82 muck/On: .HSBOnHH UmBOQ :oHHoBmHV 483 2.396 2: 8 £33 8.50% $03Seaxaesxaotswag:as: awe Ea sue awesooxmwoxzsxame some one was ems wom.emwwo.a as.” name was 2: 83 gas as; Es ease and ammo 5s mead 8.8 as 3 a2 a; A29amigoxfiavA3238 Sag 828 see see awe39452820233 9.0.3 as :3 as Ememsma $3 ”Se «2 35 ES 53 53 was $3. $3 33 none 23 a2: 2 as a; a2 A33£328.8ng$35.:sag :2: see 33V aweaeoxaaizesae A28 and mad as Hmoeiwaa 83 £3 at?” awed SS 52 :3 Sod as; :25? 33 as $8 o N o a H28swineoxmeéAmmazawe80.: 3.0.3 awe:ae386:3:33:30:83A35 and :s Es 833mg $3 $3. am? 82 was $2 and mono Sod 83 was was $3 $2 a o a o maaaaafieaaaaaaea a. S a m S 3 as no as sea mob «Hob Hob Nab mHsib Hmbmmeb NHKEbmeEbmmHkbmmHEbmmxbmmEbmebNHEbsoSHmOnH m m m m m m m .mommficmama E 23% was mawumfifiwa «35 Ho 288 88:3 538 05 m0 308 Bang 23. .cmemHo wEEEam oonv map .895 maowmo washing: ..anmmzo 9:39? 490 a SH mopaozmm: con—£25m 2: no woman HogaEHHmo mumpoSaBa page Home .8560; HBO 2t Ho 32mm Jonson Bit “Wm 22.3. 127 3.4 Real Data Analysis Empirical study shows that imprinted genes affect variations of maize endosperm traits (Dilkes et al. 2002). Two endosperm traits, mean ploidy level (denoted as ploidy) and percentage of endoreduplicated nuclei (denoted as endo), are studied. Four backcross segregation populations are generated from two inbred line (Sg18 and M017). The details of this genomic data were explained by Coelho et al (2007), and the imprinting effect analyzed with the univariate trait analysis was reported in Chapter 2. To examine the pleiotropic effect of the imprinted genes, we conducted a joint analysis. LR profiles across ten linkage groups in bivariate traits analysis (endo+ploidy) and univariate trait analysis (endo and ploidy) are plotted in figure 3.1. The genome—wide significance threshold (horizontal dotted line, at 5% level) is determined by permu- tations based on repeatedly shuffling the relationship between marker genotypes and phenotypes (Churchill and Doerge 1994). Six QTL are detected at the 5% genome- wide significance level on G2, G4, G6, G9 and G10. In contrast with previous QTL detected in the univariate trait analysis (see Chapter 2), more QT L are detected in the bivariate joint analysis. 128 .370. ....HUV museum "twain: 388 OH on» $085 $6: 5598me 93 on... wfibuovcs 390 we oozofixo one wzsmoa H8 E5 moss coofifiéTwE one no 0505 2.3L ”fin oesmi mcoEmoa mczwoh 1.1.1 ................... oncm «>3 l mm 129 Emersoommmu A0 H 0Q M 0Q H 3. n 0*: @0905 85065.8 98 2 M 0Q H No H S u 0*: Soto 2003063 .m J“; A0 H No u 0»: Regan 329:8 A0 H “two N 0m: 0539?: 853.2: 829:8 Ammo H :mo ” 0.03 woomo 0255380 0:58”. .80 m03¢>d 23 Ed 395 was 83% Ewan EEK .SENR {*3 .3 08865 98 752 0326:5200 93 S 8:800:06 05397. 39.0 .0H 05 0 .0 .v .0 2:89:85 am 0882 8a owcm 05... 02an $8: 95% 80 3.00 50 ”802 30000.000.005.050-30 3.0 00.0- 00.0 mvémodmmdmdd 00.0 00.0 0.6001022 *0H 9&0 000000000 8.0 H04 5.0 00.0- 00 00A 3.000.030 0H0 00.0 000 0 0 R 000.030.0300 3.00 00.000000A0v0000 001v 00.0- 2.0 300 00Amv0 EA 3.0 00.0Eo0w.fim+0_:£ *0 30.0 050.0 0v.00 300003000000 00A 2.0 00.0 00.H00.0 00.050. 004 3.0 $3000+oas *v 00.50vm.000.000.0~.0.0-:.0 00A 30. 3.0 00000.0 3000.0 $00 00.0 Eo00.ToE£ 0 £00 3.0 0.030000004 00.0 3.0- 00.0 00000050000 00.0 00.0 230.7250 m Elooafiowaa 05$ CS NEEHENR ammo 05.0 #030 «mo ago HMb «Remomfibefixzmo meme Mme mfibmfiEbFMb muoowo “5280 coEmom :0 .Aowcmv Eon: 03820502005 on» g5 28qu was en 5 a a 5 0 SEE ”muse: 8898 so Bazwfi 052 .5 mucocoafioo oocwtg o a no 98888.3 0365mm 2 H .. o .m 0.02 1.0 . 0 . n .0 :0 0 . E0038 130 Table 3.3 lists the QTL location, variance components estimation, and test out- comes for imprinting and pleiotropic effects. In the bivariate analysis, the identified QTL on G6 is imprinted for T2 (pimpg < 0.05) but not for T1 pimpl > 0.05). Fur- ther test shows that this QTL shows completely paternal imprinting on T 2. From the parameter estimation, it is clear that correlations of genetic variance for two pheno- typic traits are strong, and further tests to detect pleiotropic effects vs close linkage are meaningful in the bivariate traits analysis. Results show that two iQTLs on G4 and G6 indicate strong pleiotropic effects (pplez' > 0.05, poo—in < 0.05). In our study, multivariate approaches for genetic linkage analysis increase the power and precision to identify genetic effect, especially when a QTL has pleiotropic effect on several traits. In accordance with the finding about the strong correlation between endoreduplication and mean ploidy in maize endosperm (Cintia et al. 2006), the pleiotropic effects of iQTLs on two endosperm traits are detected in our analysis. As shown in the simulation study, the joint analysis provides larger power and res- olution for iQTL detection compared to the single trait analysis, which explains the additional QTLs detected by the joint analysis. 3.5 Discussion A number of studies have shown that for correlated traits, multivariate approaches for genetic linkage analysis can increase the power and precision to identify genetic effects (Evans 2002), especially when a QTL has the pleiotropic effect on several traits (Jiang and Zeng 1995). Considering the importance of imprinted genes in 131 endosperm development and the relative merit of multi-trait analysis, we developed a bivariate variance components model based on a reciprocal backcross design to identify imprinted QTLS while incorporating the special genetic make-up of triploid inbreeding population. Both simulation and real data analysis show the efficiency of the approach. In simulation studies, we compared the outcomes of bivariate traits analysis with those of univariate trait analyses. The bivariate trait analysis can greatly improve the performance in QTL position estimation, testing power, and type I error rate. Simulation study also shows that when a trait has low heritability (i.e., T2), the joint analysis can also identify the gene by borrowing information from other traits with high heritability (i.e., T1) given that they are correlated. Our results are consistent with other multivariate traits studies (e.g., Jiang and Zeng 1995; Almasy et a1. 1997; Williams et a1. 1999). We applied our joint model to a real data set with two highly correlated endosperm traits, i.e., endoreduplication and mean ploidy. Six QTLS are detected on G1, G2, G4, G6, G8, G10 across the genome. One maternally imprinted QTL on G6 for T2 and three paternally imprinted QTLS on G4, G6, G8 for T1 are also identified. The results of the imprinting tests are consistent with that of univariate trait analysis and can be explained by the genetic conflict theory proposed by Haig and Westoby (1991). Comparing with univariate trait analysis, additional QTLS are mapped in the bivariate joint analysis. These additional QTLs are those showing small genetic effect in the single trait analysis. This demonstrates the power of the joint analysis 132 for correlated traits. Another advantage of the joint analysis is the test of pleiotropic effect and close linkage. We proposed a set of hypothesis tests to detect the existence of QTLs and genomic imprinting in bivariate traits analysis, and moreover, to distinguish the pleiotropic and close linkage effect. For the real data, one iQTL on G6 displays a strong pleiotropic effect, which controls both the endoreduplication trait and the mean ploidy trait (Table 3.3). Our method provides a testable framework in iQTL mapping with multivariate traits. 133 Chapter 4 Assessing statistical significance in genetic linkage analysis with the ' variance components model 4. 1 Introduction Variance components (VC) model is a powerful tool for quantitative trait loci (QTL) mapping in human linkage analysis. In a VC analysis, genetic effects are often par- titioned as additive, dominance and polygene effects whereby each one is treated as random. Thus, we are interested in testing whether the variance of a genetic effect is significantly deviated from zero. Likelihood ratio test (LRT) is often applied for the the testing purpose. Due to irregular conditions (i.e., parameter boundary problem), the asymptotic distribution of the LRT does not follow a regular chi—square distribu- 134 tion, rather a mixture X2 distribution, where the mixture proportions are calculated with standard binomial coefficients, a special case in Self and Liang (1987). A number of studies have showed the asymptotic distribution of LRT under irreg- ular conditions, see for example, Self and Liang (1987), Chernoff (1954) and Shapiro (1988). Chernoff (1954) showed that the limiting distribution of the LRT has a mix- ture chi-square distribution when parameters of interest are on one side of a hyper- plane, or in the first quadrant within a R2 space. Self and Liang (1987) extended the Chernoff’s comment to boundary cases. In linkage analysis with variance components model, the result displayed in case 9 in Self and Liang (1987) has been commonly applied for a threshold determination (e.g., Amos 1994; Hanson et al. 2001). This re- sult is based on the assumption of a diagonal variance-covariance matrix of unknown parameters. In reality, this assumption could be easily violated. This consequently leads to conservative hypothesis tests (e.g., Allison et al. 1999). For a bivariate linkage analysis, Amos et a1. (2001) proposed an approach to approximate the null distribution of the LRT (see section 4.2 for more details). But, their derivation assumes a diagonal Fisher information matrix. Moreover, they assume that the genetic correlation between two traits is either positively correlated (p = 1) or negatively correlated (p = —1). This is unrealistic in reality. Corresponding to the VC model, Morris et al. (2009) define the constrained likelihood ratio test (CLRT) with respect to this model. They try to apply Geyer’s regularity (1994) to show the asymptotic distribution of the constrained CLRT, but can not make sure that the global M-maximizer is definitely attained. Because of this limitation, a new simulation 135 method is developed. However, it is quite difficult to express the predominance of this method comparing with others. In this chapter we rigorously show that the LRT statistic in testing variance com- ponents in linkage analysis follows a mixture chi-square distribution and the mixture proportions depend on the estimated Fisher information matrix. The rest of this chapter is organized as follows. Section 4.2 introduce three classical VC models with both uniVariate and multivariate trait analysis. The main result is illustrated in section 4.3. Section 4.4 shows the performance of the new approximation by a few simulation examples. 4.2 Motivating models 4.2.1 Model I Assume K families are collected and the phenotype for the kth family is denoted by yk with ”k offsprings. Under the variance components model mapping framework, the genetic effect is partitioned into several components expressed as yk =u+ak+dk+gk+ek (4.2.1) where u is the overall mean; ak ~ N(0,ag) and dk ~ N(0,o(21) are the additive and dominant effect of a genetic variant; 9}; N N (0, 03) is the polygenic effect (i.e., the effect of QTLs not located on the same chromosome as the tested one); and e k ~ N (0, 0g) is the residual term. When a testing QTL is not on a marker position, 136 I the variance-covariance of the phenotype for a pair of sibpairs ykz’ and ykj in the kth family can be expressed as 2 2 2 2 COV(yki,ykj|7Tz'j,k+ 1 12 (8’11: 2 2 2 ykg 0a12 Caz 0912 092 ”612 ”62 where (8) is the kronecker product; 0am, 0912, and 0612 are the covariances between the additive, polygene and the residual effects for the two traits, respectively. All the others are defined similarly as in Scenario 1. The hypothesis test to detect major gene under a bivariate model is formulated H0 : ”(21.1 = 032 = 0012 =0 (4.2.4) H1 :031 >00rag2 >0 Under the alternative, when either one of the variance terms is zero, the covariance term is restricted to zero. 4.2.3 Model III Now consider the above bivariate trait model, but assuming both additive and dom- inant effect. The variance component model will be changed to 139 (yk1,yk2)=(#1,/12)+(ak1,ak2)+(dk1,dk2)+(gk1,gk2)+(€k1,€k2), k =1? ' ° ' ,K (4.2.5) where (dk1,dk2) is the random dominant effect of major gene at the quantitative trait locus for two phenotypic traits. The variance-covariance matrix between two sibs is changed to 2 2 y a 0’0 0 0d CO’U kl = a1 12 ®Hk+ d1 12 ®Ak 2 2 ykg ”“12 002 0d12 0d2 2 2 0' 0'9 0'8 06 91 12 ®00r051>00r0d2>0 H1 2031 >00r 0,12 Similarly, under the alternative, when either one of the variance terms (additive or dominant) is zero, the corresponding covariance term is restricted to zero. 140 4.3 Main results For a random sample X1, X2, ..., Xn of size n with a common density function fix, 0), let 0 = (01,02, ..., 6m)T E Q be the parameter vector, and 00 be the true population n parameter vector. Let ((0) = 2 log f (3,, 0) be the log-likelihood function. i=1 Assumption 1. Following C'hernofir (1954), the following assumptions are assumed: o I. for every 0 E G where G is a closure neighborhood centered at 60, the first three derivatives of 3(9) with respect to 0 exist for almost all 2:. 2 0 II. for every 0 E G, I8“? I and lgyéégll are bounded by a finite integrable i i j 3 function K(:z:), and! a £6 kI < K(:n) where E[K(:I:)] < 00. i ] ~ 0 III. The information matrix M (Mij =—-71—,E(6€(0. 66(6 )) is nonsingular for 0 3 .7 E G, and “M” < 00. Proposition 1. Under Assumption 1, there is a vector ég in Q, such that 9; ——) 00 . _ 1 in probability, and (Hg — 00) = Op(n 2). Proof: In terms of the arguments of Lehmann and Casella (1998), it is possible to search a sequence of points 6g in the closed set G about 00 to locally maximize 13(0). Following Lemma 1 in Chernoff (1954), the Vii-consistency of 0} can be proved. Denote a local maximum estimator by 0}. Since the regularity conditions of Chernoff (1954) on the parameter set only derive the asymptotic distribution based on a global maximum estimator (Geyer 1994; Shapiro 2000), additional conditions are required to achieve the asymptotic equivalence of local estimators. 141 Assumption 2. Considering more conditions as follows: 0 IV. ég and 6g are \/r_i-consistent Optimizers. 0 V. the parameter set 9 is a nearly convex set at 90. 0 VI. Condition vi in Theorem 3.2 (Shapiro, 2000). . - _ 1 Proposition 2. If the above two assumptions (1 and 2) are satisfied, 0g - 0g =op(n 2). Proof: See the proof of Theorem 3.2 in Shapiro (2000). In brief, two key steps are involved. First, the parameter set is nearly convex at 00. Comparing with convexity, near convexity is a loose condition. In particular, near convexity can be achieved by some smooth constraints in real application. When the fitted function is mono- tonically nondecreasing and twice continuously differentiable on a given interval, the parameter set is nearly convex at 00 under the Mangasarian-Fromovitz constraints. Next, Lipschitz continuous function Fn(ég) and Fn(ég) are defined as minus yli time log-likelihood function in terms of ég and 0}, respectively. When Fn(é§) and Fn(0~§) satisfy conditions 3.8 and 3.9 of Theorem 3.2 in Shapiro (2000), the asymptotic equiv- alence of 0} and 0} is achieved by the property of the near convexity (condition A in Shapiro 2000). It is well known that a cone contains several desirable properties that may simplify the optimization problem. According to the arguments of Chernoff ( 1954) and Self and Liang ( 1987), a cone is defined as 142 Definition 1. The set Q C Rm is approximated by a cone C9 at 00, if inf Hs—tll = 0(llt—00H) for all t E Q; inf ||s—tl| = o(||s—00H) for all s 6 C9 .9ng tea Note that the cone 09 is positively homogeneous if 3 6 C9, C(s — 90) + 00 6 C9 when CZO. Moreover, CQ — 00 with vertex at the origin is acquired by translating the cone C9 with vertex at 00. Thus, 0 can be approximated by a closed convex cone C9 with vertex at 00. Proposition 3. When 0 = 0, F is the distribution of the MLE ég based on one observation Y with the population distribution N(0,A«I—1) where 0 6 C9 — 00. If all previous conditions hold, n?(ég — 00) weakly converges to F, a multivariate normal distribution with mean zero and covariance matrix M _1. Proof: see the proof of Theorem 2 in Self and Liang (1987). Assumption 3. VII. Let C90 and C91 be two closed convex cones with vertex at 00 to approximate (20 and 91. Then C90 — 00 with vertex at origin is also a closed convex cone by translating C90 at 90. Theorem 4.3.1. If above Assumptions 1-3 hold and when 0:00, the large sample distribution of the likelihood ratio (LR) is the same as that of the test 0 6 C90 against 0 E CQI based on one observation Y generated from population distribution N(0,M—1). Moreover, the LR is distributed as a mixture chi-square distributions with the form: 143 1(<1) 2 Ii 2 2 P LR = E P Y ’ P T( > C ) i=1 T( E ul/J-IY) T(XT(TV*O) > C l where Pr(Y E W ) is the mixing proportion corresponding to the chi-square u-LIY . 1(c1) 2. components with i§1Pr(Y E wuily) = 1, and 7(TV*0)=rarih(TV*0). Proof: Two arguments need to be proved. First, the LR can be approximated as the difference of two quadratic forms with respect to 00 and 91 (see Theorem 1 in Chernoff 1954). Follow the fi-consistency of the optimizer and the property of the approximating cone, the large sample distribution of the LR is the same as that of testing 9 6 C90 against 0 6 C91. Next, we prove that LR asymptotically follows a mixture chi-square distribution. Following Chernoff (1954, Theorem 1), the asymptotic distribution of the LR is equivalent to the following quadratic approximation LR = inf (Y — 0)’M(Y — 9) — inf (Y — e)’M(Y — 0) (4.3.1) 06090 96091 Where Y ~ N (0, M _1). Subtracting 00 from Y and 0, we get an equivalent form of 4.3.1 LR = inf (Y — 0)’M(Y — 0) — inf (Y — 0)’M(Y -— 0) (4.3.2) QECQO—BO 96091—90 with Y ~ N (0, .M—l) and [VI is the Fisher information matrix. Let CV 2 (C91 — 00)fl(CQO — 00)‘L, where (CQO — 00)‘L is the orthogonal 144 complement of (Cg0 — 00). Following the Pythagoras theorem, the statistic LR (in 4.3.2) can be expressed as LR = inf (Y — 0)’M(Y — a) (4.3.3) 060V It can be seen that CV is also a closed polyhedral convex cone with q (q S m) dimension because CV is the intersection of convex cones. Thus a polar cone C V0 is defined as CV0 = {7 6 anlg S 0, V 0 6 CV}, and (CV0)0 = CV by the basic property of the polar cone. Let lF(CV) represent the set of all faces of CV. Following Shapiro (1985), we can select a face UV 6 lF(CV) corresponding to a polar face VV0 6 IF(CVO) such that the V0 V linear spaces generated by UV and I/ are orthogonal to each other. For one face 12 (or VVO), we can find a projection Tl/V (or TVVO) (a symmetric idempotent matrix giving projection onto the space generated by UV (or l/VO)) and TVV =I-TVVO since they are orthogonal. Then TVVY (or Tuon) is a projection of a random vector Y onto CV (or CV0). For a given Y, let g(Y)=(gl(Y),gg(Y), ...,gq(Y))T be the minimizer to achieve the infimum in (4.3.3). Define $11le = {Y 6 Big : g(Y) 6 UV} so that g(Y) 6 UV if and only if TVVY 6 CV and TVVOY E CV0. By Shapiro(1985), ¢VV|Y can also be defined by the inequalities as $12le = {Y E 33‘] : e’TVVY S 0,e E CV0,f’TVV0Y S 0,f 6 CV}. Thus, g(Y): TVVY 6 CV, for all Y 6 wz/VlY' Consequently, the likelihood ratio statistic in (4.3.3) is expressed as: LR = (Y — g(Y))’ M (Y — g(Y)) for all Y e “(’21le (4.3.4) 145 Note that the set $11le is composed of several almost disjoint sets wriVIY’i = 1, ..., l (q) The total number of these disjoint subsets (l(q)) are counted by the general form of binomial theorem, i.e., l (q) = 2q—0, q is the number of parameters in CV and o is the number of covariance terms in CV. Moreover, All these subsets are classified into q — o + 1 categories. To display these subsets, we start from the simple case that no covariance term is in 7,111/le (o = 0). The subsets of $11le are given as the following table 4.1. 146 l I l l A A»; 93 w CC: v as..- .o v £6 v Smhniwms 9; w E: w es..- 5 A 9 .o A SST. I . 1 A 9; w Case A as .o v as o A S<$uh>ms 147 l . A CC 3; w case A es.-. 5 A 95 v SSTEXS \~ g 3.: 9s m CC; A as.-. .o A a; A :mfinhfs 93 w can; u s "huhis pofigdz muwmfizm umm .253 823.550 505.9» \w_>\§ m0 @335 oEEmom 20:3 BEL #6 28¢. When a covariance term occurs in CV, we use one simple case to describe the relation among the parameters. For example: CV = {0;01 > 0,02 > 0,63 E R}, where 01 is a variance term for trait one, 02 is the variance term for trait 2, 63 is the covariance between the two traits. Because of the definition of covariance, 63 occurs only when 91 > O and 62 > 0, so 03 is represented by 631(61 > 0,62 > 0). In the corresponding way, the estimator of 03 is denoted as Y3I(Y1 > 0, Y2 > 0), and Y1 and Y2 are estimators of variances for two traits. In accordance with this constrain, the set $12le is denoted as 1,121/le = {Y;Yz- E R,i E qV\oV,YjI(YJ-_2 > O,Yj_1 > 0) E R,j E 0V,g(Y) E VV}, where set qV is defined as qv = {1,2,...,q} and 0V is denoted as a subset of qV, and is shown as oV = {3, 6, ..., q} = {3kv, kv = 1, 2, ..., §}. Considering the property of covariance term, the partition of 1,12”le is not related to the covariance term. Thus the whole subsets under this constrained condition are shown as the table 4.2. 148 I . .I III a I i so. i a .II. a I a i a A Ami? 9; w C235 vH 5s 5 v 5 5s. 5v as 5v es 5v 9 5v Shuwgms i i a i n..p a a a a a a \N fswccsgovH 5» 5AN 5». emf 5A5» 5A9 sons 5A9 5A Shuaflwes . l \N. 9; w 3535 w £5 A Tss5 A are»... 5 A £5 v £55 w 95 A 95 A sthuifs l . . A 9; w C335 w f5 A Te» 5 A «If..- .e w 5s 5 A as 5 A £5 v 9 5 A Sinhfis l , \~ Ash; 93 w C335 w as 5 A T5» 5 A «If.-. .z w 5s 5 A ms 5 A £5 A 95 v Shuxims . l a l a..." a a a a a a A Asm5Z>$§vsgwf5AH foAm 5» swcscAmsoAfzwms5Am>5A£>Th>Hs pofiafids muvwfifim $8.5... 8:23.50 5:5 \n_>\~$ m0 333% mEmmmom 20:3 BC. ”was Bash. 149 The general form of the whole number of subsets is l( q) = 2q—0, and these subsets l((1) . consist of q — o + 1 groups. Consequently, ¢VV|Y= U wi/WY' i=1 Considering a linear transformation on Y and 0, a new closed convex cone C* is defined as C*={0V; E%D’0, 0 E C‘L}, where DED’ = M, and a new random vector Z (Z =E 2 D, Y) is distributed with multivariate normal distribution with mean zero and an identity covariance. In terms of this random vector Z the likelihood ratio LR (in 4.3.3) is evaluated equivalently as: LR = inf uz — 0*”? (4.3.5) 6*eC* In the same way, 0* is a closed convex cone and C *0 is denoted as the polar cone of 0* with (CW)O = 0*. So there is a face u* E IF(C*) (or u*0 E lF(C*0) ) such that a symmetric idempotent matrix TV* (or T V*0) giving projection onto the space 0) generated by 11* (or 11* is defined. The linear transformation of Y to Z guarantees that there also exists a minimizer denoted by d(Z)=(d1(Z),d1(Z), ...,dq(Z))Tfor (4.3.5), in which d(Z) = TV*Z E C*, V Z E ¢V*|Z’ where ¢V*|Z can be defined by a linear transformation from . wl/‘LIY Note that the set wVJicl Z is also a polyhedral convex cone by its definition and satisfies the conditions of Lemma 3.1 (Shapiro 1985), and TV*0 is an symmetric idempotent matrix corresponding to face V*0, then the likelihood ratio statistic (4.3.5) 150 is written: LR = ||Z-d(Z)||2 = uz-cr,,.zn2 = z’(1—r,,.)z = z’rV,.Oz, for all z e 212,,“ 2 (4.3.6) It is clear that the minimum value of LR obtained for Y E wu-LIY (in 4.3.4) is equivalent to the infimum value of LR obtained for Z E tbl/ail Z (in 4.3.6). Note that the set illuakl Z is also made up of several almost disjoint sets, i.e., l(q) . . tbl/*I z = 2.911%)“le Condition on Z E wit/*l Z’ LR follows a chi-square distribution with rank(TV*0)=rank (I-TV*) degrees of freedom. By Bayes’ theorem, the distribu- tion of LR (in 4.3.6) is derived to be a mixture chi-square distribution. To control the significance of hypothesis test in a level, The probability that LR rejects the null hypothesis under the null condition is evaluated. Given a positive number c2 > 0 and a random vector Y, the expression of this probability is written Pr(LR > c2) =Pr((Y — g(Y))’M(Y — g(Y)) > c2,Y e wVJ-IY) l(q) (4.3.7) =Pr((Y — g(Y))’M(Y — g(Y)) > c2,Y e U lbixilY) i=1 Applying the Distributive law of sets and the union rule for these almost disjoint sets, the representation of (4.3.7) is changed to be: 151 l(q) . Pr(LR > c2) =Pr( U {(Y —— g(Y))’M(Y — g(Y)) > c2,Y e wit/HY» i=1 l(q) . = 2 MW —— g(Y))’M(Y — g(Y)) > c2, Y e willy) i=1 (4.3.8) 1(9?) . zizzl Pr(Y E wV-LlY) Pr((Y — g(Y>)’M(Y — g(Y)) > c2lY e (if/HY) According to the resemblance between the result in (4.3.4) and comment in (4.3.6), the representation of the probability is changed to be: l(cz) . . Pr(LR > c2) = :1 Pr(Y e ¢ZL|Y)PT(Z’TV*0Z > c2|Z 6 413,1 2) 2: (4.3.9) I = i3 Pr(Y e 42' )Pr(x2 > c2) i=1 u-LIY “Tl/>1: ) where Pr(Y E W J'IY) is the mixing proportion corresponding to the chi-square V 1(q) . . 2 _ _ . ‘ _ components With 1:2 1Pr(Y E qu-IY) — 1, and r(TV*0)—rank(TV*0). This com pletes the proof. Following Theorem 4.3.1, we now evaluate the distribution of the LR statistic for the three models in linkage analysis we mentioned in Section 4.2. Model I: The parameters of this model are given as 9 = {01,62,03,64,65} = “1,03,05,0303}, and the approximating cone under the null hypothesis is defined 152 as CQO={6;61 E R62 = 0,63 = 0,64 > 0,65 > 0} against CQI={6;61 E R62 > 0,63 > 0,64 > 0,65 > 0} under the alternative. The number of parameters to be tested for q is 2 and for o is 0, that is, there is no covariance term in model I. Thus wVV|Y consists of 2q—0:22=4 almost disjoint sets with q - 0 + 1 = 2 — 0 + 1 = 3 categories: (1)4»;le = {Y;Y1 > 0.3/2 > My) 6 VV}; (ii) $13le = {Y;Y1 > 0,Y2 < 0,g(y) E VV}, 43le = {Yo/1 s 0.1/2 > 0,9(y) e W}; (iii) 243le = WM 3 0.1/2 3 0,9(y) e W}- In the same way, tbl/*I Z can be divided into four almost disjoint subsets by linear transformation. When Y E $1 VVIY’ LR = z’ryoz -_—- 212+ 23 ~ x3 where Z ~ N (0, I), and the corresponding mixture proportion is estimated by Pr(Y E wivly). As Y is in the 2nd category (i.e., Y E wzle, i=2,3), LR ~ x? with the correspond- . . . . 2 3 ing mixmg proportion calculated by Pr(Y E z/JVV IY)+Pr(Y E tbl/V IY). For the last ~ 2 4 . . . . 4 category, LR X0 for Y E ¢VV|Y, and the mixmg proportion 18 Pr(Y E wVVIY). The calculation of the mixing proportion follows Plackett(1954) or Kendall(1941). —1 3 . . 7T—COS SpeCifically, Pr(Y e ‘blVlYl = 27: 912, i: Pr(Y e wzvly) = .33, and 1 .=2 4 __ cos" p . . ‘ . . Pr(Y E ibVVIY) — __27r—12’ and p12 is the correlation between estimator of addi- tive gene effect and that of dominance effect. Finally, the distribution is approximated as —1 — 1 Pr(LR > c2) = " CO2: p12P 02) + 5190,? > 02) (4.3.10) 153 Model II: For model II (in (42.3)), only the random additive effect of QTL (a k 1 , ak2) for each trait is considered. The parameters of additive major gene effect are denoted as: 031,032, 0012 where 0a12 is the covariance term between two traits. Similarly, two covariance terms are denoted for polygene effect and random residual effect. All parameters in this model are defined as: 6 = {61,62,63,64,65, 66, 67, 68’ 69’ 610’ 611}={/11,,U2,0(211, 0'32, 062112, 031’ 032v 0312, 06231, 032’ 0312}: The parameter approximating cone under the null hypothesis is defined as CQO={6; 61 E R62 E R63 = 0,64 = 0,65 = 0,66 > 0,67 > 0,68 E R69 > 0,610 > 0,611 E IR}. Similarly, the cone under the alternative hypothesis is denoted as CQI={6,61 E R62 E R63 > 0,64 > 0,65 E R66 > 0,67 2 0,68 E R69 2 0,610 2 0,611 E R}. Corresponding to the hypothesis test, the number of tested parameters q is 3 and o is 1, then the set ’l/JVVIY has 23—1 = 4 almost disjoint subsets and all these subsets are classified into 3-1+1=3 groups. (i) ‘6le ={Y;Y1> 0,Y2 > 0, Y3 e R,g(y) 6 VV}; (ii) 11’3le = {Y; Y1 > 0,Y2 g 0,g(y) 6 VV}. 43 ={Y;Y1s 0.1/2 > 03(3) 6 VV}; u-LlY (iii) wjvly = {Y;Y1 s we 3 0,9(y) e W}- The estimator of covariance term is only observed in wzl/VIY’ and it will van- .‘ . . . ‘ 7 . ‘ r ,y _ ,‘2' lsh automatically when Y1 S 0 or Y2 S 0. Moreoyer, uVVIY — i9 ill/v 23“1 . tbl/*IZ in terms of 6* is defined in a similar way, that is, tbl/*IZ = 'Ul wi}*|Z' 2: When Y E wVVIY’ the LR is shown in the form: 154 r 212 + 2% + 2% ~ X3, with mixing prop : Pr(Y E wfiVIY) 22 2 'th ' ' - P- Y 2 1 ~ X1 Wi. mixmg prop. 7( E wuVIY) LR = t .Z2 2 'th ' ' - P Y 3 2 ~ X1 w1 mixmg prop . r( E wVVIY) 0 ~ X2 with mixing prop : Pr(Y E 1494 ) k 0 11le As Y E wlv lY’ LR ~ X3, and the corresponding mixture proportion is calculated as Pr(Y e «pk/W) = Pr(Yl > 0,Y2 > 0,Y3 e R) = Pr(Yl > 0,Y2 > 0) = rr—cos_1 p12 . . . ‘2' ._ ~ 2 7r . When Y 18 in the 2nd category (i.e., Y E ¢VV|Y 1—2,3), LR x1 with mixing probability 2:3:2 Pr(Y E wivly) = %. For Y E wfivly, LR ~ X8: 1 the relevant mixing probability is evaluated by Pr(Y E wfivly) = W12. These three mixing proportions is the same as those in model I. Hence, the probability of LR under model II is in the form: —1 — 1 Pr(LR > c2) = 7r 00; p12P(x§ > c2) + 5P“? > 02) (4.3.11) 7r Model III: In model III, random dominant effects (dk1 , dk2) are considered. The parameters under this model is denoted as: 6={61, 62, 63, 64, 65, 66, 67, 68, 69, 610, 911, 2 2 2 2 2 2 2 2 2 2 2 2 612:613i614}={#’13#2i0a130a2300127adli0d2iad12a0913092909121061906290612}- The approximating cone under the null hypothesis is CQO={6; 61 E R, 62 E R, 63 = 0,64 = 0,65 = 0,66 = 0,67 = 0,68 = 0,69 > 0,610 > 0,611 E R,612 > 0,613 > 0,614 E R}, and the cone under the alternative hypothesis is denoted as CQI={0,916 R,92 E R,63 > 0,64 > 0,65 E R,66 > 0,97 > 0,68 E R,99 > 155 0,610 > 0,611 E R,612 > 0,613 > 0,614 E R}. The number of testing parameters in model III for q is 6 and for o is 2. Then the set wVVIY can be partitioned into 26—2 = 16 almost disjoint subsets that comprise 6-2+1=5 categories. (i) 4211/le ={Y;Y1> 0,Y2 > 0, Y3 e R,Y4 > 0,Y5 > 0, Y6 e R,g(y) e W}; (ii)wl2/V|Y -_- {Y;Y1 > 0,Y2 g 0,Y4 > 0, Y5 > 0,Y6 e R,g(y) e VV}, 423V”, ={Y;Y1S 0,Y2 > 0, Y4 > 0,Y5 > O,Y6 e R,g(y) e W}; 143V“, ={Y;Y1> 0,Y2 > 0,Y3 e R,Y4 > 0,Y5 g 0,g(y) 6 UV}, 43V”, ={Y;Y1> 0,Y2 > 0,Y3 e R,Y4 g 0,Y5 > 0,g(y) 6 VV}; (iiiwgle = {Y;Y1 s 0.1/2 3 0,Y4 > 0,Y5 > 0.Y5 E R,g(v) 6 VV}; 1113le = {Y;Y1 > 0,Y2 > 0,Y3 e R,Y4 g 0,Y5 g 0,g(y) e W}; $2?le = {Y;Y1 S 0.3/2 > 0,Y4 S 0.3/5 > 0,9(31) 6 VVl; 423le = {Y;Y1 s 0.3/2 > 0.1/4 > 0.13 s 0.9(y) 6 VV}; 4211/9“, ={Y;Y1> 0.1/2 s an s 0.1/5 > 0.9(y) 6 VV}; will), = {Y;Y1 > 0.3/2 S 0.1/4 > 0.3/5 S 0,9(31) 6 VV}; 6211/le ={Y;Y1S 0,Y2 g 0,Y4 > 0, Y5 s 0,g(y) 6 VV}; wit/W = {Y;Y1 S 012 > 0,Y4 S 0,Y5 S 0,9(31) 6 VV}; 71);le ={Y;Y1> 0.1/2 S 0,Y4 S 0,Y5 S (MM!) 5 UV}; (vwlf‘tly = {Y;Y1 s 0.3/2 s on s 0.1/5 3 0,9(y) e 12V}; 26—2 26—2 ‘ : i 3 ' : (if ‘ ‘ , , ,g , The set wVVIY 1'91 wi/VIY’ iccordingly, tbl/*IZ z'L—Jl LV*|Z' Based on these almost disjoint subsets, the limiting distribution of LR with the mixing proportion is in the form: 156 0,610 > 0,611 E R,612 > 0,613 > 0,614 E R}. The number of testing parameters in model III for q is 6 and for o is 2. Then the set $11le can be partitioned into 26"2 = 16 almost disjoint subsets that comprise 6—2+1=5 categories. (i) 2411/le ={Y;Y1> O,Y2 > 0,Y3 e R,Y4 > 0,Y5 > O,Y6 e R,g(y) e W}; omfiwy=qYfl1>afigm34>afi>tagenmweVW, (@WY={YJ1gaw>o44>am>oageRmmveg 11):le ={Y;Y1> 0,Y2 > 0,Y3 e R,Y4 > 0, Y5 g 0,g(y) e W}, 143le ={Y;Y1> 0,Y2 > 0,13 6 R,Y4 g 0, Y5 > 0,g(y) 6 VV}; (nip/23W], = {Y;Y1 g 0, Y2 g 0,Y4 > 0,Y5 > 0,Y6 e R,g(y) e W}; ‘63ij ={Y;Y1> 0,Y2 > 0,Y3 e R,Y4 g 0,Y5 g 0,g(y) 6 VV}; lDSVjY ={Y;Y1S 0,Y2 > 0, Y4 g 0,Y5 > 0,g(y) e W}; 103le ={Y;Y1£ 0,142 > 03/4 > 0,3’5 S 0,9(y) 6 VV}; wile/W ={Y;Y1> 0,Y2 g 0,Y4 g O,Y5 > 0,g(y) 6 UV}; 411/1 ={Y;Y1> O,Y2 g 0, Y4 > 0,Y5 g 0,g(y) e W}; (1V)1/)i2 = {Y;Y1 g 0, Y2 g 0,Y4 g 0,Y5 > 0,g(y) 6 UV}; if = {Yo/1 s 0.1/2 3 0.1/4 > 0.1/5 3 0.9(y) e W}; wif/IY = {Y;Y1 S (Li/2 > 0,Y4 S 0,Y5 S 0,9(y) 6 UV}; with, ={Y;Y1> 03/2 s 0.33 5 mg 5 My) 6 VV}; (Wt/Jig”, = {Y;Y1 S 0.3/2 S 0.3/4 S 0.3/5 S 0,9(31) 6 UV}; 25—2 26-2 . = i 2 - a, ___ /,i , . ,., The set #111le i—L—Jl wVV|Y’ iccordingly, ¢V*|Z i-L—Jl Lui‘lZ' Based on these almost disjoint. subsets, the limiting distribution of LR with the mixing proportion is in the form: LR K 2 2 2 2 2 2 2 Zf+Z§+Z§+Z§~X§1 Z§+ZE+Z§+Z§~X3 Zf+Z§+Z§+Zg~X§ Z?+Z§+Z§+Z§~X?1 Z§+Z§+Z3~X§ Zf+Z§+Z§~X§ z§+z§~xg 4+4~4 Zf+Z§~x§ 2M 2 2 ZINXI 2 157 with mixing prop : with mixing prop : with mixing prop : with mixing prop : with mixing prop : with mixing prop : with mixing prop : with mixing prop : with mixing prop : with mixing prop : with mixing prop : with mixing prop : with mixing prop : with mixing prop : with mixing prop : with mixing prop : "1 Therefore, the probability of LR under model III is in the form: Pr(LR > c2) =Pr(Y e wivlywag > c2) 5 + Z Pr( Y e w VIY)P(X4 > c2) i=2 7 i=6 11 + Z Pr( Y E w i/VlY)P(X2 > C?) i=8 15 + ZPrflYEw VVIY)P P(X¥>c2) i=12 All mixing proportions can be calculated based on previous results (Kudo p.415 1963, Shapiro p.141 1985): (1) the mixing proportion corresponding to chi-square random variable with 4 (if is :25 _2 Pr( Y E (b2 VVIY)’ and this probability can be estimated as. 5 ZPr(YEw VIY): Z Pr(Ya0,Yc>0, i=2 a=1,2,4,5 a#b,a;éc,a7£d,a#e Yd > 0,Ye 6 R) = Z Pr(Ya < 0,Yb > 0,Yc > 0,Yd > 0) a=1,2,4,5 a¢ha¢ga#d 1 __ :8_rr(87r — 2 cos lpablc) a>b;a#c,b;éc 158 ‘2‘PacP2 where pablc is calculated from the equation, pab|c= (,a b, 0:1,. and 4=q—o). (2) With respect to the chi-square random variable with 3 (if, the mixing propor- tion is given as 23:6 Pr(Y E wizVIY)’ the calculation of this probability is evaluated as: Z Pr(Y e (if/W) =Pr(Y1 < 0,Y2 < O,Y4 > O,Y5 > 0, Y6 e R) +P7‘(Y1 > 0,Y2 > 0,Y3 E R,Y4 < 0,Y5 < 0) =Pr(Y1 < 0,Y2 < 0,Y4 > 0,Y5 > 0) +P7‘(Y1 > 0,Y2 > 0,Y4 < 0,Y5 < 0) 1 _ _ =Z7T—2[cos 1p12(7r—cos 1p45I12) + cos—1p45(7r — cos—1p12I45)] note that ,0ch ab is estimated from _PacP +P P ‘PacP P —P ,P P pab 2 P d| b: H)“ - C a l-pgd-pgc-pgg+2pgcpgdpgd 1—p2 -p2 -p2 +2p2 p2 p2 \ 1—p cd \ 1_pcd 922211—8 Pr(Y E W VVIY) is the mixing probability for the chi-square component with 2 df. This mixing proportion is evaluated as: 159 11 Z Pr(Y e wfijY) = Z Pr(Ya < 0,Yb > O,Yc < O,Yd > 0) i=8 a=1,2;c=4,5 1 _ _ zmkos 1p14(7r—cos 1p25I14) + cos—1p15(rr — COS—162M151) 1 + cos_ p24(7r — cos—1p15l24) + cos—1p25(rr — cos—1p14l25)] where p cdl ab is defined in the same way. (4) Following the comment provided by Shapiro (1985), the mixing proportions are assigned equally on the even and odd places. Thus the mixing probability 2:21:12 Pr(Y E wizVIY) for chi-square component with 1 df is calculated as: 15 Z Pr(Y e wile) = Pr(Ya > O,Yb < 0,Yc < 0,Yd < 0) 1:12 a=1,2,4,5 5 —l — Pr(Y e 141' ) —2 . l/VIY i=2 1 _ =8—7r( 2 cos lpablc — 47r) a>b;a;éc,b7éc where there is a resemblance about calculation of p ab] c between 21.1212 Pr(Y E i 5 1i wVVIY) and 22-22 Pr(Y E wVVIY). (5) An approximated estimation of mixing proportion Pr(Y E wII/VIY) correspond- 160 ing to the chi-square component with 6 (if is derived according to the above comment. This mixing probability is evaluated as: 11 P r(YE1/)1VIY)+Pr(YEw1VIY) =5— 21% YeviVIY) i=6 Suppose two mixing proportions Pr(Y E inIY) for chi-square component with 6 (if and Pr(Y E ibIV IY) for chi-square component with 0 df equally share the probability 2— 2%16 Pr( Y E w VVIY). Therefore, the mixing proportion Pr(Y E inIY) is approximated as: 111 ‘ZP (yeti/WY) i=6 #IIIH [\DI PT (YET/)1 VVIY)= 1 —- ———2-[cos 1 1012(7r — COS 1045(12) ash-a + cos—149450? — cos—1p12l45) + cos—1p14(7r — cos—1p25ll4) + cos—1p15(7r — cos—1p24I15) -1 + cos p24(7r — cos—1p15l24) + cos—1p25(7r — cos—1p14I25)] 161 4.4 Simulation We designed simulations to evaluate the limiting distribution of the LRT. The results of the new approximation are compared with those from Self and Liang (1987) and Amos (2001). We simulated 40 nuclear families each with 5 sibs. Phenotype data are gener- ated assuming there is no main genetic effect at all under the null. For the uni— variate trait analysis (Model I), data are simulated with the variance of polygene 2 2 effect defined as 09:2, environmental error set as 0e=2-51 and 1000 replicates are recorded. For the bivariate model, data are simulated based on the parameters of 2 2 o o polygene effect and random residual effect given by: ( 291 9212)=(128 128), and 0912 ”92 2 2 ”612 ”62 The performance of the approach is illustrated at several critical values in Table 4.3. It is clear that the type I error rates with the new method are much closer to the corresponding nominal level than those of the other methods. A quantile plot of the results are shown in Figure 4.3. The current approximation method shows the best approximation for the three models. 4.5 Conclusion The new threshold determination method provides better approximation to the dis- tribution of the LRT under the three models evaluated. These three models represent the most widely applied models in genetic linkage analysis. we expect the new method 162 Table 4.3: Comparisons of the performance of different approximation methods based on 1000 simulation replicates under different models. Model Method critical value 0:01 0:005 (120.01 a=0.005 Model I SF 0.063 0.032 0.005 0.002 New 0.093 0.051 0.008 0.005 Model II Amos 0.182 0.104 0.027 0.014 New 0.097 0.059 0.016 0.005 Model III Amos 0.0839 0.0462 0.0158 0.0036 New 0.0912 0.0523 0.0158 0.0049 (SF indicates the approximation is done with the result in Self and Liang (1987), i.e., LR ~ 4X2 : %X% : 3x8; New refers to the approximation by the current method; Amos refers to the approximation given in Amos (2001), i.e., LR ~ 4X3 : %X% : 21: x3 for Model II, and LR ~ fax?) : 416,3, : 12606 ; fisxi : fix? : 11633 for Model 111.) can reduce false positives in determining a linkage signal, hence reduce the cost of un- necessary investigations in a lab condition due to false results. This work represents the most comprehensive evaluation of the LRT in linkage analysis with the variance components model. 163 Model I: Univariate Model with additive and dominance effect 0.30 l \ Self&Liang:’-.-.-’ , ’ Proposedz’ 0.25 Empirical 0.15 0.20 0.10 0.05 0.00 l l T l l T l 0.00 0.05 0.10 0.15 0.20 0.25 0.30 Nominal_level Figure 4.1: The quantile plot of the empirical p—values for Model I. For the legend: Self & Liang refers to SF; Proposed refers to the current method. See Table 4.3 for more explanation of the legend. 164 Model II: Bivariate Model with additive effect 8. - IV 0 Self & Liang:-.-.- ,° I . Amos: ---- - ’ o 53. - Proposed: _/ . ° , I ’ O . ° :39- so iii to Q— o o q- or 0.00 0.05 0.10 0.15 Nominal_level Figure 4.2: The quantile plot of the empirical p—values for Model II. For the legend: Self & Liang refers to SF; Proposed refers to the current method. See Table 4.3 for more explanation of the legend. 165 Model Ill: Bivariate Model with additive and dominance effect 8 . o‘ . ’ Self&Liang:'-.-.-’ ’ - I I I o v :2 _I Amos: ---- . , ’ . O . ’ I Proposedz’ ’ , , ’ :g S g 5 m to q .. O O 0 d l r F l 0.00 0.05 0.10 0.15 Nominal_level Figure 4.3: The quantile plot of the empirical p-values for Model III. For the legend: Self & Liang refers to SF; Proposed refers to the current method. See Table 4.3 for more explanation of the legend. 166 Chapter 5 Concluding remarks Genomic imprinting, a unique phenomenon in multicellular organisms, is carried out in a regulated way that generally confers advantages during an organism’s life cycle. Its role in controlling embryonic development and growth is not only restricted in humans and animals, but also in flowering plants. The information about how genes controlling or affecting this process is crucial for unravelling the genetic basis of many quantitative traits, which can not be explained by the traditional Mendelian inheritance theory. The identification of imprinted genes has been one of the most important and difficult tasks for genomic imprinting study. While many scientists are trying to experimentally unravel the molecular mechanism of genomic imprinting, identifying imprinting genes with statistical QTL mapping techniques is still in its infancy and therefore is in much demanding. With the abundant molecular marker information, it is now possible to detect potential imprinted genes underlying the quantitative variation of an imprinting trait. We, for the first time, developed a 167 series of statistical models and algorithms for detecting and characterizing specific iQTLs that are responsible for genomic imprinting under various problem settings. The developed models can make a systematic scan of iQTLs across the entire genome with a well-covered genetic linkage map. Focusing on flowering plants, in this dissertation, I developed a series of statistical methods based on the variance components model in linkage analysis. Specifically, in chapter 2, I developed an efficient mapping approach focusing on a diploid mapping population (e.g., embryo in plants). We focused our genetic design on a reciprocal backcross design with experimental crosses. Different line crosses were combined to infer the random allelic effects under the variance components model. We partitioned the additive genetic effect into different components based on the nature of the allelic- sharing mechanism in experimental crosses. In chapter 3, we extended the idea to a triploid endosperm mapping population. The unique triploid structure in an en- dosperm tissue was considered. The utility of the method was demonstrated with a real data set. Important iQTLs were identified to control the endosperm develop- ment. Genomic imprinting can be explained by the genetic conflict theory proposed by Haig and Westoby (1991). Our real data analysis results are in consistent with and supported by this theory. In both chapters, we extended the single iQT L model to consider multiple iQTLs (i.e., multiple iQTL model). The multiple iQTL model can efficiently handle the problem due to the interfering of linked iQTLs on the same linkage group. Moreover, it also shows increased mapping precision as shown in the simulations studies. 168 "1. When strong genetic correlations among multivariate traits occur in the QTL mapping, the multivariate analysis can largely improve the statistical power and ac- curate position of the genetic effect (Boomsma and Dolan 1998; Jiang and Zeng, 1995; Amos et al. 2001; Evans 2002). This motivates us to develop a multivariate iQTL mapping model, which is studied in chapter 3. Extensive simulation studies show the relative merit of multivariate analysis, especially when traits are correlated. In a multivariate linkage analysis, we also gain additional benefit by statistically quan- tifying pleiotropic vs close linkage effect. The real data analysis indicates that two QTLS express strong pleiotropic effect to control the two endosperm traits used in this study. In a variance components-based linkage analysis, the likelihood ratio test has been the standard means in assessing the statistical significance of a linkage signal. How- ever, due to irregular conditions (e. g., the restriction of the variance component terms under the hypothesis), the regular asymptotic chi-square distribution theory does not apply directly. In chapter 4, we conducted a statistical investigation of the LRT, and found that the currently applied cutoff determination method is inappropriate. This finding is in consistent with an empirical study (Allison et al., 1999). We evaluated the limiting distribution of the LRT under three model settings which are the mostly used models in a linkage analysis. Simulation study shows the superiority of the new approximation method over the currently applied ones. Other statistical issues such as deriving the optimization algorithms for parameter estimation, and proof of the theorems have been given. Coupling with the emergence 169 of abundant marker information, large collection of well-phenotyped samples and high-throughput genotyping, our models provide a quantitative testable framework to assess genome—wide significance of imprinted genes. The developed models also pro- vide a testable platform for scientists who can design their experiment accordingly and significant discoveries would be expected in the future. This dissertation con- tributes to the statistical methodology development in QTL mapping, to the general statistical theory in variance components model, and to the general genetic mapping community by providing statistically sound approaches and tested programs. 170 BIBLIOGRAPHY Abney, M., Sara McPeek, M. and Ober, C. (2000). Estimation of variance components of quantitative traits in inbred populations. Am. J. Hum. Genet. 66(2): 629-650. Allison, D.B., Neale, M.C., Zannolli, R., Schork, N.J., Amos, CI. and Blangero, J. (1999). Testing the robustness of the likelihood-ratio test in a variance-component quantitative—trait loci-mapping procedure. Am. J. Hum. Genet. 65, 531—544. Almasy, L and Blangero, J . (1998). Multipoint quantitative-trait linkage analysis in general pedigrees. Am. J. Hum. Genet. 62, 1198-1211. Almasy, L., Dyer, T. D., and Blangero, J. (1997). Bivariate Quantitative Trait Linkage Analysis: Pleiotropy Versus Co—incident Linkages. Genet Epidemiol. 14(6):953—958. Amos, CL (1994). Robust variance-components approach for assessing genetic link- age in pedigrees. Am. J. Hum. Genet. 54, 535-543. Amos, C. and Andrade, M. (2001). Genetic linkage methods for quantitative traits. Stat. Methods. Med. Res. 10: 325. Amos, C.I. de Andrade, M and Zhu, K.D. (2001). Comparison of Multivariate Tests for Genetic Linkage. Hum. Hered. 51: 133-144. Bartholomem, D.J. (1959a). A test of homogeneity for ordered alternatives. Biometrika. 46, 35-48. Bartholomem, D.J. (1959b). A test of homogeneity for ordered alternatives II. Biometrika. 46, 328-35. Bartholomem, D.J. (1961a). A test of homogeneity of means under restricted alter- natives. J.R.Statist.Soc. B,23,239-81. 171 Bohrer, R. and Chow, W. (1978). Weights for one-sided multivariate inference. Appl. Statist. 27, 100—104. Bomblies, K. and Doebley, J .F. (2006). Pleiotropic effects of the Duplicate Maize FLORICAULA/LEAFY Genes zfll and zfl2 on Traits Under Selection During Maize Domestication. Genetics. 172: 519531. Boomsma, DI. and Dolan, CV. (1998). A comparison of power to detect a QTL in sib-pair data using multivariate phenotypes, mean phenotypes, and factor scores. Behav Genet 28: 329-340. Brink, R. A. and Cooper, D. C. (1947). The endosperm in seed development. Bot. Rev. 13, 423-541. Burt, A. and Trivers, R.L. (2006). Genes in Conflict, Harvard University Press, Cambridge, MA, USA. Chant. D. (1974). On Asymptotic Tests of Composite Hypotheses in Nonstandard Conditions. Biometrika. 61, 291-298. Chaudhury, A.M., Koltunow, A., Payne, T., Luo, M., Tucker, M.R., Dennis, ES. and Peacock, W.J. (2001). Control of early seed development. Ann. Rew. Cell Dev. Biol. 17: 677-699. Chaudhuri, S. and Messing, J. (1994). Allele-specific parental imprinting of dzrl, a post transcriptional regulator of zein accumulation. Proc. Natl. Acad. Sci. 91: 4867-4871. Chernoff, H. (1954). On the distribution of the likelihood ratio. Ann. Math. Stat. 25: 573-578. Churchill, GA. and Doerge, R.W. (1994). Empirical threshold values for quantitative trait mapping. Genetics. 138: 963-971. Clark, R.M, Wagler, T.N., Quijada, P. and Doebley, J. (2006). A distant upstream enhancer at the maize domestication gene tbl has pleiotropic effects on plant and inflorescent architecture. Nat. Genet. doi:10.1038/ng1784. Cockerham, CC. (1983) Covariances of relatives from self-fertilization. Crop. Sci. 23: 11771180. Corbeil, RR. and Searle, SR. (1976). A comparison of variance component estima- tors Biometrics. 32: 779-791. 172 Cui, Y.H., Cheverud, J .M. and Wu, R.L. (2007). A statistical model for dissecting genomic imprinting through genetic mapping, Genetica. 130, 227-239. Cui, Y.H. (2007). A statistical framework for genome-wide scanning and testing imprinted quantitative trait loci. J. Theo. Biol. 244: 115-126 . Cui, Y.H., Lu, Q., Cheverud, J.M., Littel, R.L. and Wu, R.L. (2006). Model for mapping imprinted quantitative trait loci in an inbred F2 design. Genomics. 87: 543-551. Cui, Y.H. and Wu, R.L. (2005). A statistical model for characterizing epistatic control of triploid endosperm triggered by maternal and offspring QTL. Genet. Res. 86: 65-76. David,.F.N. (1953). A note on the evaluation of the multivariate normal integral. Biometrika. 40, 458-9. DeChiara,T.M., Robertson, E.J. and Efstratiadis, A. (1991). Parental imprinting of the mouse insulin-like growth factor II gene. Cell. 64, 849-859. Dilkes, B.P., Dante, R.A., Coelho, C. and Larkins, BA. (2002). Genetic analysis of endoreduplication in Zea mays endosperm: evidence of sporophytic and zygotic maternal control. Genetics. 160: 1163-1177. Dupuis, J. and Siegmund, D. (1999). Statistical methods for mapping quantitative trait loci from a dense set of markers. Genetics. 151, 373-386. Evans, D.M. (2002). The power of multivariate quantitative-trait loci linkage analysis is influenced by the correlation between the variables. Am. J. Hum. Genet. 70: 1599-1602. Feil, R and Berger, F. (2007). Convergent evolution of genomic imprinting in plants and mammals. Trends. Genet. 23, 192-199. Grime, JP. and Mowforth, M.A. (1982). Variation in genome size: an ecological interpretation. Nature. 299: 151-153. Grossniklaus, U., Spillane, C., Page, DR, and Koehler, C. (2001). Genomic imprint- ing and seed development: Endosperm formation with and without sex. Curr. Opin. Plant Biol. 4: 2127. Hanson, R.L., Kobes, S., Lindsay, RS. and Kmowler, WC. (2001). Assessment of parent-of-origin effects in linkage analysis of quantitative traits. Am. J. Hum. Genet. 68(4): 951-962. 173 it Haig, D. and Westoby, M. (1991). Genomic Imprinting in endosperm: Its effect on seed development in crosses between species, and between different ploidies of the same species, and its implications for the evolution of apomixis. Philos. Trans. R. Soc. Land. 333: 1-13. Harris, D.L. (1964). Genotypic covariances between inbred relatives. Genetics. 50: 1319-1348. Jansen, RC. (1993). Interval mapping of multiple quantitative trait loci. Genetics. 135, 205—211. Jansen, RC. (1994). Controlling the Type I and Type 11 errors in mapping quanti- tative trait loci. Genetics. 138, 871-881. Jiang, C. and Zeng, Z-B. (1995). Multiple trait analysis of genetic mapping for quantitative trait loci. Genetics. 140: 1111-1127. Kao, C.H., Zeng, Z-B., and Teasdale, RD. (1999). Multiple interval mapping for quantitative trait loci. Genetics. 152, 1203-1216. Kendall, MG. (1941). Proof of Relations connected with the Tetrachoric Series and its Generalization. Biometrika. 32, 196-8. Kermicle, J .L. (1970). Dependence of the R-mottled aleurone phenotype in maize on the modes of sexual transmission. Genetics. 66: 69-85. Kinoshita, K., Yadegari, M., Harada, J .J ., Goldberg, RB. and Fishcher, R.L. (1999). Imprinting of the MEDEA polycomb gene in the Arabidopsis endosperm. Pla. Cell. 11: 1945-1952. Knott, S.A., Marklund, L., Haley, C.S., Andersson, K., Davies, W., Ellegren, H., Fredholm, M., Hansson, I., Hoyheim, B., Lundstrm, K., Moller, M. and Andersson, L. (1998). Multiple marker mapping of quantitative trait loci in a cross between outbred wild boar and large white pigs. Genetics. 149, 1069-1080. de Koning, D-J., Rattink, A.P., Harlizius, B., van Arendonk, J .A.M., Brascamp, R.W. et al. (2000). Genome-wide scan for body composition in pigs reveals important role of imprinting. Proc. Natl. Acad. Sci. USA 97: 7947-7950. de Koning, D-J., Bovenhuis, H. and van Arendonk, J .A.M. (2002). On the detection of imprinted quantitative trait loci in experimental crosses of outbred species. Genetics. 161(2): 931—938. Kudo, A. (1963). A multivariate analogue of the one-sided test. Biometrika. 50, 174 403-18. Kudo, A and Choi, J .R. (1975). A generalized multivariate analogue of the one sided test. Mem.Fac.Sci.,Kyushu Univ. 29, 303-28. Lander, ES. and Botstein, D. (1989). Mapping Mendelian factors underlying quan- titative traits using RFLP linkage maps. Genetics. 121, 185-199. Li, Y.C, Coelho, C.M, Wu, S, Zeng, Y.R, Li, Y, Hunter, B, Dante, R.A, Larkins, BA and Wu, R. (2008). A statistical model for estimating maternal-zygotic interac- tions and parent-of-origin effects of QTLs for seed development. PLoS ONE. 3, e3131. Li, G.X. and Cui, Y.H. (2009). A statistical variance components framework for mapping imprinted quantitative trait loci in experimental crosses. J. Prob. Stat. Article ID 689489, doi:10.1155/2009/ 689489. Li, GK and Cui, Y.H. (2010). A general statistical framwork for dissecting parent- of-origin effects underlying endosperm traits in flowering plants. Ann. App. Stat. Lin, M., Lou, X.Y., Chang, M. and Wu, R. (2003). A general statistical framework for mapping quantitative trait loci in nonmodel systems: issue for characterizing linkage phases. Genetics. 165, 901-913. Liu, T., Todhunter, R.J., Wu, 8., Hon, W., Mateescu, R., Zhang, Z., Burton-Wurster, N.I., Acland, G.M., Lust,G. and Wu,R . (2007). A random model for mapping im- printed quantitative trait loci in a structured pedigree: an implication for mapping canine hip dysplasia. Genomics. 90, 276-284. Lund, G., Messing, J. and Viotti, A. (1995). Endosperm-specific demethylation and activation of specific alleles of alpha—tubulin genes of Zea mays L. Mol. Gen. Genet. 246: 716-722. Lynch, M. and Walsh, B. (1998). Genetics and Analysis of Quantitative Traits. Sinauer, Sunderland, MA, USA. Malécot, G. (1948) Les mathe’matiques del’he’rédite’. Masson et Cie, Paris, France. Martinez, O. and Curnow, RN. (1992). Estimating the locations and the sizes of the effects of quantitative trait loci using flanking markers. Theor Appl Genet. 85, 480-488. Moran, P.A.P. ( 1971). The uniform consistency of maximum-likelihood estimators. Pmc. Cambridge Philos.Soc. 70, 435—439. 175 l Moran, P.A.P. (1971). Maximum Likelihood Estimators in Non-Standard Conditions. Proc. Comb. Phil. Soc. 70, 441-450. Moran, P.A.P. (1948). Rank correlation and product-moment correlation. Biometrika. 35, 203-6. Moose, SP. and Sisco, RH. (1996). Glossy15, an APETALA2-like gene from maize that regulates leaf epidermal cell identity. Genes. Develop. 10:3018-3027. N uesch, P. E. (1966). On the problem of testing location in multivariate populations for restricted alternatives. Ann.Math.Statist. 37, 113-9. Patterson, H.D. and Thompson, R. ( 1971). Recovery of interblock information when block sizes are unequal. Biometrika. 58, 545-554. Pfeifer, K. (2000). Mechanisms of genomic imprinting. Am. J. Hum. Genet. 67: 777-787. Plackett,R.L. (1954). A reduction formula for normal multivariate integrals. Biometrika. 41, 351—60. Self, SC. and Liang, KY. (1987). Asymptotic properties of maximum likelihood estimators and likelihood ratio tests under nonstandard conditions. J. Amer. Statist. Assoc. 82, 605-610. Shapiro, A. (1985a). Asymptotic distribution of test statistics in the analysis of moment structures under inequality constraints. Biometrika. 72, 133—144. Shapiro, A. (1988). Towards a Unified Theory of Inequality Constrained Testing in Multivariate Analysis. Internat. Statist. Rev. 56, 49-62. Shete, 8., Zhou, X. and Amos, CL (2003). Genomic imprinting and linkage test for quantitative trait loci in extended pedigrees. Am. J. Hum. Genet. 73: 933-938. Sturaro, M., Hartings, H., Schmelzer, E., Velasco, R., Salamini, F. and Motto, M. (2005). Cloning and Characterization of GLOSSYl, a Maize Gene Involved in Cuticle Membrane and Wax Production. Plant Physiol. 138, 478489. Vega, S. H., Sauer, M., Orkwiszewski, J.A.J. and Poeth, RS. (2002). The early phase change Gene in Maize. Plant Cell. 14, 133147. Xie, C, Gessler, D.D.G and Xu, S. (1998). Combining different line crosses for map- ping quantitative trait loci using the identical by descent-based variance compo- nent method. Genetics. 149, 1139-1146. 176 l —mww Xu, S. and Atchley, W.R. (1995). A random model approach to interval mapping of quantitative trait loci, Genetics. 141, 1189-1197. Wilks, SS. (1938). The large distribution of the likelihood ratio for testing composite hypotheses. Ann.Math.Stat. 9, 60. Williams, J .T., Eerdewegh, P.V. Almasy, L. and Blangero, J. (1999). Joint multipoint linkage analysis of multivariate qualitative and quantitative traits. I. likelihood formulation and simulation results. Am. J. Hum. Genet. 65, 1134-1147. Wolf, J., Cheverud, J., Roseman, C. and Hager, R. (2008). Genome-wide analysis reveals a complex pattern of genomic imprinting in mice. PLoS Genetics. 4, doi: 10. 1371/journal.pgen. 1000091. Wu, R.L, Casella, G, Ma, C-X. (2007). Statistical Genetics of Quantitative Traits: Linkage, Maps and Qtl. Springer, New York, USA. Zeng, Z-B. (1994). Precision mapping of quantitative trait loci. Genetics. 136, 1457-1468. Zeng, Z—B. (1993). Theoretical basis for separation of multiple linked gene effects in mapping quantitative trait loci. Proc. Nat. Acad. Sci. USA, 90, 10972-10976. 177 llllllll 63 7551 lmuluulllltmlw