SEMIPARAMETRIC MODELS FOR MOUTH-LEVEL INDICES IN CARIES RESEARCH

By

Yifan Yang

A THESIS

Submitted to
Michigan State University
in partial fulfillment of the requirements
for the degree of

Biostatistics - Master of Science

2016

ABSTRACT

For nonnegative count responses in health services research, a large proportion of zero counts is frequently encountered. For such data, the frequency of zero counts is typically larger than its expected counterpart under classical parametric models, such as the Poisson or negative binomial model. In this thesis, a semiparametric regression model is proposed for count data that directly relates covariates to the marginal mean response, which represents the desired target of inference. The model specifically assumes two semiparametric forms: a log-linear form for the marginal mean and a logistic-linear form for the susceptible probability, in which the fully linear models are replaced with partially linear link functions. A spline-based estimation procedure is proposed for the nonparametric components of the model. Asymptotic properties are discussed for the estimators of the parametric and nonparametric components of the model. Specifically, the estimators are shown to be strongly consistent and asymptotically efficient under mild regularity conditions. A bootstrap hypothesis test is performed to evaluate differences involving the nonparametric components. Simulation studies are conducted to evaluate the finite-sample performance of the model. Finally, the model is applied to dental caries indices in low-income African-American children to evaluate the nonlinear effect of sugar intake on caries development. The results show that the effect of sugar intake on caries indices is nonlinear, especially among young children under the age of 2, and that children whose caregivers are unemployed and have poor oral health exhibit higher dental caries rates.

ACKNOWLEDGMENTS

First and foremost, I greatly appreciate Dr. David Todem's tutelage, encouragement and help throughout the course of this research work. He was available for academic advice whenever I needed it and gently guided me toward a good understanding of biostatistics. I am extremely grateful to Dr. Todem for his unwavering support, as well as his patience and careful planning for my graduate study at Michigan State University. I would also like to take this opportunity to thank my thesis committee members, Dr. Qing Lu and Dr. Chenxi Li, who accommodated every request despite their full schedules and provided me with precious feedback. I would like to thank Dr. Joseph Gardiner, Dr. Hao Wang, Dr. Nigel Paneth, Dr. David Barondess, Dr. Zhehui Luo and Dr. Wenjiang Fu for their help and advice during my graduate studies in the Department of Epidemiology and Biostatistics. Last but not least, I would like to express my deepest gratitude to my beloved parents for their love and support over the years. I also have had the pleasure of living in East Lansing with many good friends.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
KEY TO SYMBOLS
Chapter 1 Introduction
  1.1 Oral Health Research
  1.2 Zero-Inflated Models
  1.3 Method of Sieves
  1.4 Organization of This Thesis
Chapter 2 Models and Sieve Estimator
  2.1 ZINB Distribution
  2.2 Semiparametric ZINB Marginal Mean Regression
  2.3 Maximum Likelihood Estimation by Method of Sieves
Chapter 3 Asymptotic Properties and Bootstrap Hypothesis Test
  3.1 Asymptotic Properties of Sieve Estimator
  3.2 Bootstrap Hypothesis Test
Chapter 4 Numerical Results
  4.1 Simulation Studies
  4.2 Real Data: Mouth-Level Indices in Caries Research
Chapter 5 Discussion
APPENDIX
REFERENCES

LIST OF TABLES

Table 4.1: Estimates of finite-dimensional parameters in working Model 1
Table 4.2: Estimates of infinite-dimensional parameters in working Model 1
Table 4.3: The estimates and standard deviations of finite-dimensional parameters
Table 4.4: The result of bootstrap sampling

LIST OF FIGURES

Figure 4.1: The estimates of nonparametric components in simulation study
Figure 4.2: The histogram of DMFS indices and fitted NB model and Poisson model
Figure 4.3: The nonparametric effect of DASI under g0(S) and g1(S) in real data study
Figure 4.4: The histogram of both estimated statistics by bootstrap method

KEY TO SYMBOLS

All English and Greek letter-based symbols used in mathematical expressions in this thesis are listed here in alphabetical order. In general, a capital letter refers to a random variable/vector, and the corresponding small letter refers to the value/sample of that random variable/vector.

General Rule 1: English letter-based symbols refer to known data or covariates, such as $X$, $Y$ and $Z$, unless otherwise stated;
General Rule 2: Greek letter-based symbols refer to unknown parameters, such as $\alpha$, $\beta$ and $\pi$, unless otherwise stated;
General Rule 3: Ralph Smith's (script) symbols refer to specific classes of functions, such as $\mathscr{C}$ and $\mathscr{F}$;
General Rule 4: Blackboard bold symbols refer to special operations, such as the density function $\mathbb{P}$, the expectation $\mathbb{E}$ and the variance $\mathbb{V}$, or to special number sets, such as the set of real numbers $\mathbb{R}$ and the set of nonnegative natural numbers $\mathbb{N}$;
General Rule 5: a star symbol refers to another point in the same set as the original one; for example, both $\theta$ and $\theta^*$ belong to $\Theta$.

$\mathbb{E}$: the expectation of random variables/vectors or functions of random variables/vectors with respect to a specific $\mathbb{P}$;
$k_0$ and $k_1$: the growth rates of the numbers of knots (excluding the two ends);
$l$: the degree of the spline basis functions;
$l(\cdot)$: the log-likelihood function;
$L$: the likelihood function;
$L_2$ and $L_\infty$: the $L_2$ norm and the $L_\infty$ norm, respectively;
$m_0$ and $m_1$: the numbers of knots (excluding the two ends);
$n$, $n_0$ and $n_1$: the sample size of the data, the sample size of the data with $\delta = 0$, and the sample size of the data with $\delta = 1$;
$N_0$ and $N_1$: the numbers of parameters for the spline basis functions;
$\mathbb{P}_X$: the probability measure with respect to the random vector $X$;
$r_1$, $r_2$, $r_3$ and $r_4$: the smoothing parameters corresponding to $B_1$, $B_2$, $B_3$ and $B_4$;
$s_i$ and $t_i$: the knots on the interval $[0,1]$;
$\Pi_n$: a mapping from the parameter space $\Theta$ to the product space $\Theta_n$;
$\Gamma$: the gamma function.

Chapter 1 Introduction

1.1 Oral Health Research

Tooth decay, more formally known as dental caries or cavities, is considered one of the most prevalent oral diseases, in particular among children as young as five (Todem, 2012a). It also remains the most common chronic disease among children aged five to eleven and adolescents aged twelve to seventeen (Dye et al., 2007). Among older adults, tooth decay does not significantly reduce school or work performance or social relationships as it does among children, but it dramatically impairs their chewing ability, forces them to limit their diet, and eventually contributes to other health problems (Sullivan et al., 1993; Blaum et al., 1995; Ritchie et al., 2000).

To evaluate the severity of dental caries, many dental epidemiologists use the decayed, missing and filled (DMF) indices (Klein and Palmer, 1938), called DMFT indices when applied to whole teeth and DMFS indices when applied to tooth surfaces (five per posterior tooth and four per anterior tooth). The integer score per subject ranges from 0 to 28/32 in the DMFT system and from 0 to 128/148 in the DMFS system (Cappelli and Mobley, 2007).

1.2 Zero-Inflated Models

Investigators frequently encounter integer count data with a high frequency of zero values, for example in dental caries studies based on DMFS indices. Zero-inflated models, which view the data as generated from a mixture of a point mass at zero and a non-degenerate distribution, have become a popular tool within the parametric framework for analyzing count data with excess zeros (Mullahy, 1986; Farewell and Sprott, 1988; Lambert, 1992; Ridout et al., 2001; Gilthorpe et al., 2009; Wang et al., 2015). The non-degenerate distribution can be the Poisson model (Lambert, 1992; Lam et al., 2006; He et al., 2010), the negative binomial model (Yau et al., 2003; Minami et al., 2007; Wang et al., 2015), or another discrete probability distribution such as the Conway-Maxwell-Poisson distribution (Shmueli et al., 2005; Loeys et al., 2012). These models have been applied in many fields, such as studies of length of hospital stay (Atienza et al., 2008; Singh and Ladusingh, 2010), health care outcomes research (Hur et al., 2002), and studies of pediatric length of stay (Lee et al., 2005). We believe the zero-inflated negative binomial (ZINB) model is more appropriate for dental caries data, because the Poisson distribution assumes that the variance equals the mean, an assumption that real data may violate. The procedure for testing the zero-inflated Poisson (ZIP) model against the ZINB model is discussed in detail by Ridout et al. (2001).

1.3 Method of Sieves

We consider the zero-inflated model within the semiparametric framework (Xue et al., 2004; Lam et al., 2006; He et al., 2010; Zhang et al., 2010), also called a partially linear model, which is particularly appropriate when a covariate is nonlinearly related to the response. For example, I am interested in evaluating the nonlinear effect of the daily amount of sugar intake (DASI) on caries indices in the primary dentition, adjusting for important confounders.

Handling the nonparametric component is the difficult part of semiparametric statistical studies. Many tools are available to approximate it, such as piecewise polynomials (Chen, 1988), kernel estimators (Speckman, 1988), M-estimators (Härdle and Liang, 2007), efficient estimators (Ghosh, 2001), and sieve estimators (Geman and Hwang, 1982; Shen and Wong, 1994; Shen, 1997; Huang and Rossini, 1997). In real data analysis with sieve estimators, the unknown nonparametric component can be approximated by piecewise linear functions (Xue et al., 2004; Lam et al., 2006; He et al., 2010), trigonometric series (Song and Xue, 2000), wavelet functions (Shen and Shi, 2004), or B-spline basis functions (Zhang et al., 2010). The main idea is to perform the minimization or maximization within a subset of an infinite-dimensional parameter space and to let the dimension of this subset grow with the sample size. More details about the method of sieves are given in Section 2.3.

1.4 Organization of This Thesis

In the rest of this thesis, I propose a semiparametric ZINB model with three goals: (1) evaluating the effect of covariates on the marginal mean response, (2) investigating the nonlinear effect of DASI on caries indices in the primary dentition, adjusting for important confounders, and (3) examining whether this nonlinear effect varies across age groups. We propose the specific semiparametric ZINB model and the sieve maximum likelihood (ML) estimator for both the parametric and nonparametric components in Chapter 2. Under the necessary assumptions, I derive asymptotic properties of the sieve estimator, namely strong consistency, rate of convergence and asymptotic normality, in Chapter 3. All proofs of theorems and additional lemmas are presented in the Appendix. I also construct a bootstrap hypothesis test for the nonparametric components of interest at the end of Chapter 3. In Chapter 4, I evaluate the advantages of the semiparametric ZINB model in a simulation study and then apply it to real dental caries data. Chapter 5 summarizes the work and discusses its extension to other settings, such as penalized sieve methods.

Chapter 2 Models and Sieve Estimator

2.1 ZINB Distribution

Let $Y$ denote a count variable with a zero-inflated negative binomial (ZINB) distribution. Specifically, assume $Y$ has the probability mass function

$$
\mathbb{P}(Y=y)=
\begin{cases}
\pi + (1-\pi)(1+\alpha\mu)^{-1/\alpha}, & y = 0,\\[4pt]
(1-\pi)\dfrac{\Gamma(y+1/\alpha)}{\Gamma(1/\alpha)\,\Gamma(y+1)}\dfrac{(\alpha\mu)^{y}}{(1+\alpha\mu)^{y+1/\alpha}}, & y > 0,
\end{cases}
\tag{2.1}
$$

where $\alpha \ge 0$ is the dispersion parameter, assumed not to depend on covariates (Ridout et al., 2001; Wang et al., 2015), $\mu \ge 0$ is the mean of the underlying negative binomial distribution (Wang et al., 2015), and $0 \le \pi \le 1$ is the mixing probability of the point mass at zero. This distribution reduces to the zero-inflated Poisson (ZIP) distribution in the limit $\alpha \to 0$ (Ridout et al., 2001; Minami et al., 2007). The mean and variance of the ZINB distribution in (2.1) are

$$
\begin{aligned}
\mathbb{E}(Y) &= (1-\pi)\mu,\\
\mathbb{V}(Y) &= (1-\pi)\mu(1+\alpha\mu+\pi\mu),
\end{aligned}
\tag{2.2}
$$

where $\mathbb{E}$ and $\mathbb{V}$ are the expectation and variance operators, respectively.

Some authors use an alternative parameterization of the ZINB model (Yau et al., 2003), replacing the dispersion parameter $\alpha$ by its reciprocal $\alpha^{-1}$, called the overdispersion parameter (Yau et al., 2003; Minami et al., 2007; Wang et al., 2015). Unlike a nonnegative $\alpha$, the overdispersion parameter $\alpha^{-1}$ requires $\alpha \ne 0$, so that $\alpha$ belongs to the open set $(0,\infty)$. Similarly, the logarithm $\log(\mu)$ and the logistic function $\operatorname{logit}(\pi) = \log\{\pi/(1-\pi)\}$ used in the next section require $\mu \ne 0$ and $\pi \ne 0, 1$. Without loss of generality, in the overdispersion case the domain of $\alpha$ is restricted to a subset of the open set, such as $\alpha \in [\epsilon, \infty)$; likewise $\mu \in [\epsilon, \infty)$ and $\pi \in [\epsilon, 1-\epsilon]$ for any given $\epsilon > 0$.

The mean $\mu$ of the negative binomial component in (2.1) is the conditional mean of $Y$ given that $Y$ is generated by the negative binomial component rather than by the point mass at zero; averaging over the mixture gives $\mathbb{E}(Y) = (1-\pi)\mu$ in (2.2). Thus, I call $\mathbb{E}(Y)$ in (2.2) the marginal mean of the ZINB distribution, as opposed to the conditional mean $\mu$.

2.2 Semiparametric ZINB Marginal Mean Regression

In the regression setting, $\mu$ and $\pi$ are called latent variables, and in almost all applications both $\operatorname{logit}(\pi)$ and $\log(\mu)$ are assumed to depend on linear functions of covariates (Lambert, 1992; Ridout et al., 2001; Yau et al., 2003; Minami et al., 2007; Wang et al., 2015). Although this latent-variable formulation provides a versatile and useful representation of the data in some settings, the implied regression parameterization may fail to answer clearly how covariates affect the marginal mean response. Therefore, in this thesis, the latent variable $\pi$ and the marginal mean $\mathbb{E}(Y)$ are assumed to depend on linear functions of covariates, given by

$$\log(\mathbb{E}(Y)) = \beta^\top X \tag{2.3}$$
$$\operatorname{logit}(\pi) = \gamma^\top Z \tag{2.4}$$

where $X = (1, X_1, \ldots, X_{d_1})^\top$ and $Z = (1, Z_1, \ldots, Z_{d_2})^\top$ are $(d_1+1)\times 1$ and $(d_2+1)\times 1$ covariate vectors, $\beta$ and $\gamma$ are vectors of unknown regression coefficients, $\operatorname{logit}(\pi) = \log\{\pi/(1-\pi)\}$, and the superscript $\top$ denotes the transpose of a vector or matrix. The covariates that affect the marginal mean of the outcome may or may not be the same as those that affect the probability of zero counts.

Lam et al. (2006) considered a semiparametric link function for the ZIP model, and He et al. (2010) extended it to a doubly semiparametric ZIP model. Motivated by these works, I extend the parametric ZINB model to a semiparametric one with partially linear link functions for both the marginal mean $\mathbb{E}(Y)$ and the logit of the zero-inflation probability $\operatorname{logit}(\pi)$, expressed as the joint model

$$\log(\mathbb{E}(Y)) = \log[(1-\pi)\mu] = \beta^\top X + (1-\delta)g_0(S) + \delta g_1(S) \tag{2.5}$$
$$\operatorname{logit}(\pi) = \gamma^\top Z + (1-\delta)h_0(S) + \delta h_1(S) \tag{2.6}$$

where $Y \in A_Y \subseteq \mathbb{N}$; $X = (X_1, \ldots, X_{d_1})^\top \in A_X \subseteq \mathbb{R}^{d_1}$ and $Z = (Z_1, \ldots, Z_{d_2})^\top \in A_Z \subseteq \mathbb{R}^{d_2}$ are $d_1 \times 1$ and $d_2 \times 1$ covariate vectors without intercepts; $\beta$ and $\gamma$ are again vectors of unknown regression coefficients; $g_0(S)$, $g_1(S)$, $h_0(S)$ and $h_1(S)$ are unknown smooth functions of the continuous covariate $S \in [0,1]$, allowed to differ between the two levels of the binary variable $\delta \in \{0,1\}$; and the latent variables satisfy $\mu \in A_\mu \subseteq [0,\infty)$ and $\pi \in A_\pi \subseteq [0,1]$. We call the model (2.1), (2.5) and (2.6) the semiparametric ZINB marginal mean model. In real data analysis, $\delta$ indicates the group to which a subject belongs.

REMARK 2.2.1. The covariate vectors $X$ in (2.3) and (2.5) have different dimensions, as do the vectors $Z$ in (2.4) and (2.6). As in previous work (Xue et al., 2004; Lam et al., 2006; He et al., 2010), the intercept terms $X_0 = 1$ and $Z_0 = 1$ are absorbed into the nonparametric components.

2.3 Maximum Likelihood Estimation by Method of Sieves

Let $W = (Y, X^\top, Z^\top, \delta, S)^\top \in \Omega$ be the data vector, where the sample space is

$$\Omega = \{W : W \in A_Y \times A_X \times A_Z \times \{0,1\} \times [0,1]\} = A_Y \times A_X \times A_Z \times \{0,1\} \times [0,1] \subseteq \mathbb{N} \times \mathbb{R}^{d_1} \times \mathbb{R}^{d_2} \times \{0,1\} \times [0,1].$$

Let $\theta = (\beta^\top, \gamma^\top, \alpha, g_0, g_1, h_0, h_1)^\top$ be the vector of all unknown quantities of interest, with $\theta_T = (\beta_T^\top, \gamma_T^\top, \alpha_T, g_{T,0}, g_{T,1}, h_{T,0}, h_{T,1})^\top$ the unique true value of $\theta$. Assume the parameter space is

$$\Theta = \{\theta : \beta \in A_1, \gamma \in A_2, \alpha \in A_3, g_0 \in B_1, g_1 \in B_2, h_0 \in B_3, h_1 \in B_4\} = A_1 \times A_2 \times A_3 \times B_1 \times B_2 \times B_3 \times B_4$$

where $A_1$ and $A_2$ are compact sets in $\mathbb{R}^{d_1}$ and $\mathbb{R}^{d_2}$, $A_3$ is a compact set in the nonnegative reals $\mathbb{R}^+_0$, and $B_1$, $B_2$, $B_3$ and $B_4$ are sets of functions that have bounded continuous derivatives on $[0,1]$ and whose $r_i$-th derivative is Hölder continuous on $[0,1]$ for $0 <$ […]

$$\mathbb{P}_W(W=w;\theta) = \left[\pi + (1-\pi)(1+\alpha\mu)^{-1/\alpha}\right]^{I_{(y=0)}(y)} \left[\frac{(1-\pi)\,\Gamma(y+1/\alpha)\,(\alpha\mu)^{y}}{\Gamma(1/\alpha)\,\Gamma(y+1)\,(1+\alpha\mu)^{y+1/\alpha}}\right]^{I_{(y>0)}(y)} \mathbb{P}_{X,Z,\delta,S}(x,z,\delta,s)$$

where

$$\log[(1-\pi)\mu] = \beta^\top x + (1-\delta)g_0(s) + \delta g_1(s) \tag{2.8}$$
$$\operatorname{logit}(\pi) = \gamma^\top z + (1-\delta)h_0(s) + \delta h_1(s) \tag{2.9}$$

the indicator function $I_{\mathcal{X}}(x)$ takes value 1 if $x \in \mathcal{X}$ and 0 otherwise, and $\mathbb{P}_{X,Z,\delta,S}(x,z,\delta,s)$ is the joint density function of $(X, Z, \delta, S)$. Setting $\mathbb{P}_{X,Z,\delta,S}(x,z,\delta,s)$ aside in the estimation of $\theta$, the log-likelihood function is

$$
\begin{aligned}
l(\theta; w) &= \log \mathbb{P}_W(W=w;\theta)\\
&= \log\left\{\left[\pi + (1-\pi)(1+\alpha\mu)^{-1/\alpha}\right]^{I_{(y=0)}(y)} \left[\frac{(1-\pi)\,\Gamma(y+1/\alpha)\,(\alpha\mu)^{y}}{\Gamma(1/\alpha)\,\Gamma(y+1)\,(1+\alpha\mu)^{y+1/\alpha}}\right]^{I_{(y>0)}(y)}\right\}\\
&= I_{(y=0)}(y)\log\left[\pi + (1-\pi)(1+\alpha\mu)^{-1/\alpha}\right] + I_{(y>0)}(y)\log\left[\frac{(1-\pi)\,\Gamma(y+1/\alpha)\,(\alpha\mu)^{y}}{\Gamma(1/\alpha)\,\Gamma(y+1)\,(1+\alpha\mu)^{y+1/\alpha}}\right]
\end{aligned}
\tag{2.10}
$$

REMARK 2.3.1. By the definition of $B_i$ in (2.7), any $f \in B_i$ is continuous on $[0,1]$ and therefore bounded. In addition, the bounded $\pi$ and $\mu$ always belong to the closed sets $A_\pi$ and $A_\mu$. By (2.10), given $W = w$, the mapping $l: A_\pi \times A_3 \times A_\mu \to \mathbb{R}$ is an elementary function of $(\pi, \alpha, \mu)$, so $l$ is continuous in $(\pi, \alpha, \mu)$ on $A_\pi \times A_3 \times A_\mu$. On the other hand, by (2.8) and (2.9), $\pi$ and $\mu$ are continuous in $(\beta, \gamma, g_0, g_1, h_0, h_1)$ on $A_1 \times A_2 \times B_1 \times B_2 \times B_3 \times B_4$. Because a composition of continuous functions is continuous, $l$ is continuous in $\theta$ on $\Theta$.

Suppose $\widetilde W = (w_1, w_2, \ldots, w_n) \in \Omega^n$ are $n$ independent random samples, and let the sample sizes of $\{\widetilde W \mid \delta = 0\}$ and $\{\widetilde W \mid \delta = 1\}$ be $n_0$ and $n_1$, respectively, so that $n = n_0 + n_1$. Let $0 = s_0$ […], $\tau_{n,2} = (\tau_{1,n,2}, \tau_{2,n,2}, \ldots, \tau_{N_1,n,2})^\top$, $\tau_{n,3} = (\tau_{1,n,3}, \tau_{2,n,3}, \ldots, \tau_{N_0,n,3})^\top$ and $\tau_{n,4} = (\tau_{1,n,4}, \tau_{2,n,4}, \ldots, \tau_{N_1,n,4})^\top$. The spaces $B_{n,1}, \ldots, B_{n,4}$ are given by

$$
\begin{aligned}
B_{n,1} &= \Big\{g_{n,0} : g_{n,0} = \sum_{i=1}^{N_0} \tau_{i,n,1}\phi_i,\ \max_{1\le i\le N_0} |\tau_{i,n,1}| \le M_{1,2}\Big\}\\
B_{n,2} &= \Big\{g_{n,1} : g_{n,1} = \sum_{i=1}^{N_1} \tau_{i,n,2}\varphi_i,\ \max_{1\le i\le N_1} |\tau_{i,n,2}| \le M_{2,2}\Big\}\\
B_{n,3} &= \Big\{h_{n,0} : h_{n,0} = \sum_{i=1}^{N_0} \tau_{i,n,3}\phi_i,\ \max_{1\le i\le N_0} |\tau_{i,n,3}| \le M_{3,2}\Big\}\\
B_{n,4} &= \Big\{h_{n,1} : h_{n,1} = \sum_{i=1}^{N_1} \tau_{i,n,4}\varphi_i,\ \max_{1\le i\le N_1} |\tau_{i,n,4}| \le M_{4,2}\Big\}
\end{aligned}
$$

where $M_{1,2}$, $M_{2,2}$, $M_{3,2}$ and $M_{4,2}$ are known constants. We denote the product space $\Theta_n \triangleq A_1 \times A_2 \times A_3 \times B_{n,1} \times B_{n,2} \times B_{n,3} \times B_{n,4}$, which depends on $n$.

REMARK 2.3.3. A specific partition of $[0,1]$ determines the number of knots, excluding 0 and 1. Given the partition of $[0,1]$ and $l$, the spline basis functions $\phi_1, \phi_2, \ldots, \phi_{N_0}$ and $\varphi_1, \varphi_2, \ldots, \varphi_{N_1}$ on $[0,1]$ are determined by a recursion formula such as de Boor's algorithm (de Boor, 1972). Specifically, de Boor's algorithm states that $\phi_{i,1}(u)$ takes value 1 if $u \in [u_i, u_{i+1})$ and value 0 otherwise; then, for any $1$ […]

$$\cdots + \mathbb{E}_T\{(\beta-\beta^*)^\top X + (1-\delta)(g_0-g_0^*) + \delta(g_1-g_1^*)\}^2 + \mathbb{E}_T\{(\gamma-\gamma^*)^\top Z + (1-\delta)(h_0-h_0^*) + \delta(h_1-h_1^*)\}^2 + |\alpha-\alpha^*|^2 \tag{2.11}$$

Corollary 1. Given the definition of $B_i$, the specific $B_{n,i}$ can be chosen such that $\Theta_n \subseteq \Theta$ by adjusting the specific $M_{i,2}$.

Corollary 2. For any $\theta = (\beta^\top, \gamma^\top, \alpha, g_0, g_1, h_0, h_1)^\top \in \Theta$, there exists a mapping $\Pi_n: \Theta \to \Theta_n$ such that

$$\Pi_n\theta = (\beta^\top, \gamma^\top, \alpha, g_{n,0}, g_{n,1}, h_{n,0}, h_{n,1})^\top \in \Theta_n \tag{2.12}$$

and $\rho(\theta, \Pi_n\theta) \to 0$ as $n \to \infty$.

REMARK 2.3.4. We could write $\theta_n \triangleq (\beta_n^\top, \gamma_n^\top, \alpha_n, g_{n,0}, g_{n,1}, h_{n,0}, h_{n,1})^\top$ with $\beta_n = \beta$, $\gamma_n = \gamma$ and $\alpha_n = \alpha$, because the mapping $\Pi_n$ acts only on the nonparametric components and leaves the parametric components unchanged. For simplicity, I keep the notation in (2.12). Whereas $\Theta$ is infinite-dimensional, $\Theta_n$ is a finite-dimensional space that can be identified with a subset of $\mathbb{R}^{d_1+d_2+1+2N_0+2N_1}$. The sieve method approximates $\Theta$ by a sequence of spaces $\{\Theta_n\}_{n=1}^\infty$, called the sieve spaces of $\Theta$. In general, $\Theta_n$ is not required to be a subset of $\Theta$, but in most cases $\Theta_n \subseteq \Theta$ (Song and Xue, 2000). In this thesis, $\Theta_n \subseteq \Theta$ follows from the properties of B-spline basis functions in Corollary 1. Corollary 2 motivates the choice of $\Theta_n$ as a sieve space of $\Theta$.

Furthermore, let $\mathbb{P}_n$ denote the empirical measure on the sample space $\Omega$, and let $L_n(\theta;\widetilde W) \triangleq \mathbb{P}_n(l(\theta;\widetilde W)) = \frac{1}{n}\sum_{i=1}^n l(\theta; w_i)$ be the empirical objective function. Then

$$\hat\theta_n \triangleq (\hat\beta^\top, \hat\gamma^\top, \hat\alpha, \hat g_{n,0}, \hat g_{n,1}, \hat h_{n,0}, \hat h_{n,1})^\top = \arg\sup_{\theta\in\Theta_n} L_n(\theta;\widetilde W)$$

is called the sieve estimator of $\theta_T$. By an argument similar to REMARK 2.3.1, $L_n$ is continuous in $\theta$ on $\Theta_n$. Moreover, $\Theta_n$ is a bounded closed subset of a finite-dimensional space, hence compact; therefore $\hat\theta_n$ exists.

Chapter 3 Asymptotic Properties and Bootstrap Hypothesis Test

This chapter presents the general assumptions used throughout this thesis, the asymptotic properties of the sieve estimator, and the test for the nonparametric components. In addition to the five common Assumptions C1-C5 used in the last chapter, five further mild Assumptions A1-A5 are introduced to study the asymptotic properties of the sieve MLE. Specifically, the sieve estimator $\hat\theta_n$ is strongly consistent and converges to the true parameter at the optimal rate $O_p(n^{-r/(1+2r)})$. The asymptotic variance of the nonparametric components can be obtained numerically from estimates of the Hessian matrix, and the asymptotic variance of the parametric components can be obtained either from the estimated Fisher information matrix in closed form or from the estimated Hessian matrix. At the end of this chapter, I construct two statistics and conduct a bootstrap hypothesis test for the nonparametric components of interest.

3.1 Asymptotic Properties of Sieve Estimator

Before deriving the asymptotic results, I summarize the necessary assumptions. Similar assumptions are made for the proportional odds regression model with interval censoring (Huang and Rossini, 1997), the semiparametric regression model with censored data (Xue et al., 2004), the semiparametric ZIP model (Lam et al., 2006), the Cox model with interval-censored data (Zhang et al., 2010), and the doubly semiparametric ZIP model (He et al., 2010).

ASSUMPTION C1: The unique true value $\theta_T \in \Theta$, that is, $\beta_T \in A_1$, $\gamma_T \in A_2$, $\alpha_T \in A_3$, $g_{T,0} \in B_1$, $g_{T,1} \in B_2$, $h_{T,0} \in B_3$ and $h_{T,1} \in B_4$.

ASSUMPTION C2: $A_1$ and $A_2$ are compact sets in $\mathbb{R}^{d_1}$ and $\mathbb{R}^{d_2}$, respectively, and $A_3$ is a compact subset of the nonnegative reals $\mathbb{R}^+_0$.

ASSUMPTION C3: $A_X$ and $A_Z$ are bounded; that is, there exist constants $M_{1,1}$ and $M_{2,1}$ such that $\mathbb{P}(\|X\|_2 \le M_{1,1}) = 1$ and $\mathbb{P}(\|Z\|_2 \le M_{2,1}) = 1$, and the diameters $M_{d_1,1}$ and $M_{d_2,1}$ satisfy $\|X - X^*\|_2 \le M_{d_1,1}$ and $\|Z - Z^*\|_2 \le M_{d_2,1}$ for any $X, X^* \in A_X$ and $Z, Z^* \in A_Z$.

ASSUMPTION C4: The joint density function $\mathbb{P}_{X,Z,\delta,S}(x,z,\delta,s)$ does not depend on the unknown parameter $\theta$.

ASSUMPTION C5: Given the partitions $\{s_j\}_{j=0}^{m_0+1}$ and $\{t_j\}_{j=0}^{m_1+1}$ of $[0,1]$, $\max_{1\le j\le m_0+1}\{s_j - s_{j-1}\} \le Cn^{-k_0}$ and $\max_{1\le j\le m_1+1}\{t_j - t_{j-1}\} \le Cn^{-k_1}$ for some constant $C$ and $0 <$ […] $> 0$, and $\mathbb{E}_T\big[(Z - \mathbb{E}_T(Z\mid S))(Z - \mathbb{E}_T(Z\mid S))^\top\big] > 0$.

ASSUMPTION A4: The joint density function $\mathbb{P}_{X,Z,\delta,S}(x,z,\delta,s)$ is twice continuously differentiable with respect to $S$, with a bounded derivative.

ASSUMPTION A5: The partitions are restricted such that $\min_{1\le j\le m_0+1}\{s_j - s_{j-1}\} = O(n^{-k^*})$ and $\min_{1\le j\le m_1+1}\{t_j - t_{j-1}\} = O(n^{-k^*})$, where $k \le k^* \le 2k$ when […]

$$\cdots = I^{-1}(\theta_T)\sqrt{n}\,\mathbb{P}_n\tilde l(\theta_T; W) + o_p(1) \xrightarrow{d} N\big(0, I^{-1}(\theta_T)\big)$$

where $I(\theta_T) = \mathbb{E}_T(\tilde l\,\tilde l^\top) > 0$ is the efficient Fisher information matrix and $\tilde l$ is the efficient score function of $(\beta, \gamma, \alpha)$.

REMARK 3.1.4. $I(\theta_T) > 0$ guarantees that the likelihood function achieves its maximum at $\theta_T$. Deriving the efficient score function and the Fisher information matrix in closed form is very difficult here because of the non-covariate parameter $\alpha$ and the presence of more than two nonparametric components, so I omit these derivations. For the specific formulas of the efficient score function and the Fisher information matrix in related models, see Xue et al. (2004), He et al. (2010), Huang (1999), Ma (2009) and Sasieni (1992). In real data analysis, an alternative is to compute the asymptotic variance numerically from the Hessian matrix.

In summary, the sieve estimator is constructed and justified through the following steps:

(1) Given the data $\widetilde W \in \Omega^n$, first choose appropriate compact sets $A_1$, $A_2$, $A_3$ and classes of functions $B_1$, $B_2$, $B_3$ and $B_4$ with specific $r_i$ for each $B_i$ (e.g., $r_i = 1, 2, 3, \ldots$ for $i = 1, 2, 3, 4$) such that their product space $\Theta$ contains the unique (interior) true value $\theta_T$ (by Assumptions C1 and A1), and denote $r = \min\{r_1, r_2, r_3, r_4\}$;
(2) Given $r$, to achieve the optimal convergence rate, let $k = k_0 = k_1 = 1/(1+2r)$ (by Corollary 1, Theorem 2 and Assumption C5); $k^*$ is then determined by Assumption A5. In particular, if $r = 2$, then $k = 1/5$ and any $k^*$ with $1/5 \le k^* \le 2/5$ is admissible; if $r = 3$, then $k = 1/7$ and any $k^*$ with $1/7 \le k^* \le 2/7$ is admissible. In this thesis, I choose $k^* = k$ for $r = 2, 3$;
(3) Given $k$ and the sample sizes $n_0$ and $n_1$, the partitions are determined by the numbers of knots excluding 0 and 1, $m_0 = \lfloor n_0^{k_0} \rfloor$ and $m_1 = \lfloor n_1^{k_1} \rfloor$, where $\lfloor\cdot\rfloor$ is the floor function (Zhang et al., 2010). Let $\{s_j\}_{j=0}^{m_0+1}$ and $\{t_j\}_{j=0}^{m_1+1}$ denote the corresponding partitions of $[0,1]$ including 0 and 1. For B-spline basis functions of degree $l$ (for example, cubic spline basis functions have degree 3), $N_0 = m_0 + 1 + l$ and $N_1 = m_1 + 1 + l$, and the B-spline basis functions $\{\phi_1, \ldots, \phi_{N_0}\}$ and $\{\varphi_1, \ldots, \varphi_{N_1}\}$ are determined (by REMARK 2.3.3);
(4) Given these B-spline basis functions, choose $M_{i,2}$ according to $\min_{0\le j\le r_i}\{\|f^{(j)}\|_\infty\}$ for $B_i$ in (2.7); then the $B_{n,i}$ and hence $\Theta_n$ are determined (by Corollary 1);
(5) Given the distance $\rho$ on $\Theta$ in (2.11), for any point $\theta \in \Theta$ there is a point $\Pi_n\theta \in \Theta_n$ such that $\rho(\theta, \Pi_n\theta) \to 0$ as $n \to \infty$ (by Corollary 2). In other words, for the true value $\theta_T \in \Theta$ there is a sequence $\{\Pi_n\theta_T\}_{n=1}^\infty$ such that $\rho(\theta_T, \Pi_n\theta_T) \to 0$ as $n \to \infty$;
(6) To estimate $\theta_T$, search for the maximum of $L_n(\theta;\widetilde W)$ on $\Theta_n$; the estimator $\hat\theta_n = \arg\sup_{\theta\in\Theta_n} L_n(\theta;\widetilde W)$ exists. On the one hand, $\theta_T$ maximizes the expected log-likelihood $\mathbb{E}_T\{l(\theta; W)\}$, the almost-sure limit of $L_n(\theta;\widetilde W)$ under $\mathbb{P}_T$ (by REMARK 3.1.4); on the other hand, $\rho(\hat\theta_n, \theta_T) \to 0$ almost surely under $\mathbb{P}_T$ (by Theorem 1);
(7) Finally, the parametric and nonparametric components are separable under the distance $\rho$ (by Assumption A3 and Theorem 1).

3.2 Bootstrap Hypothesis Test

In real data analysis, one may want to test whether two particular nonparametric components differ significantly. A general null hypothesis can be formulated as

$$H_0: u(s) \equiv v(s), \quad s \in [\underline{S}, \overline{S}]$$

where $u(s)$ and $v(s)$ are any two of $g_0(s)$, $g_1(s)$, $h_0(s)$ and $h_1(s)$ specified by the investigators, and $[\underline{S}, \overline{S}]$ is the common domain of $u(s)$ and $v(s)$. Let $\hat u(s)$ and $\hat v(s)$ denote the sieve estimates of $u(s)$ and $v(s)$, respectively. Hu et al. (2012) proposed a test statistic $T_2$ based on the $L_2$ norm,

$$T_2^2 = \int_{s\in[\underline{S},\overline{S}]} \|\hat u(s) - \hat v(s)\|_2^2\, ds$$

This integral can be computed by Monte Carlo integration. Because this is computationally intensive, the $L_\infty$ norm can be used instead of the $L_2$ norm to speed up computation, giving the statistic

$$T_\infty = \sup_{s\in[\underline{S},\overline{S}]} |\hat u(s) - \hat v(s)|$$

Computing a p-value generally requires the sampling distribution of $T_2$ or $T_\infty$, but its closed form is hard to derive because the joint sampling distribution of the estimates is complicated. An alternative is therefore to approximate the null distribution of $T_2$ or $T_\infty$ by the bootstrap (Hu et al., 2012). Establishing the convergence and asymptotic normality of the bootstrap statistic requires the existence of an Edgeworth expansion for its distribution (Hall, 2013), which is beyond the scope of this thesis; I therefore skip the theoretical development, and the implementation details are given in the next chapter.

Chapter 4 Numerical Results

We illustrate the use of the sieve estimator for the nonparametric components of the semiparametric ZINB marginal mean model. Before turning to real data, simulation studies are conducted to demonstrate the importance of nonparametric modeling. Then, results from parametric ZIP and ZINB models motivate the focus on the semiparametric ZINB marginal mean model. Finally, I apply the sieve estimator to dental caries data in a semiparametric ZINB model and conduct a bootstrap hypothesis test comparing the nonparametric components $g_0$ and $g_1$ of interest.

4.1 Simulation Studies

To show the advantage of using nonparametric components in a semiparametric ZINB model, I conduct Monte Carlo simulations. Data are generated from the following semiparametric model

$$
\begin{aligned}
\log[(1-\pi)\mu] &= \beta_1X_1 + \beta_2X_2 + \beta_3X_3 + (1-\delta)g_0(S) + \delta g_1(S)\\
\log\frac{\pi}{1-\pi} &= \gamma_1Z_1 + \gamma_2Z_2 + (1-\delta)h_0(S) + \delta h_1(S)
\end{aligned}
$$

where $X_1$, $X_2$, $X_3$, $Z_1$, $Z_2$, $\delta$ and $S$ are independently drawn from the binomial distribution $B(1, 0.5)$, the uniform distribution on $[0,2]$, the normal distribution $N(1,2)$, the uniform distribution on $[0,1]$, the normal distribution $N(0,1)$, the binomial distribution $B(1, 0.5)$ and the uniform distribution on $[0,1]$, respectively. The regression parameters are $\beta_1 = 0.5$, $\beta_2 = 1$, $\beta_3 = 0.5$, $\gamma_1 = 1.5$ and $\gamma_2 = 0.5$, and the nonparametric components are $g_0(S) = \sin(\pi S)$, $g_1(S) = 2S^2$, $h_0(S) = \sqrt{S}$ and $h_1(S) = \exp(2S+1)$. In the main model (2.1), $\alpha = 1$ and $Y$ is generated from $\mathrm{ZINB}(\pi, \mu)$. Sample sizes $n = 500$, $1000$ and $2000$ are used. To investigate whether the nonparametric approach is appropriate for this semiparametric model, the following four working models are fitted to the data.

Model 1: $g_0(S)$, $g_1(S)$, $h_0(S)$ and $h_1(S)$ are all modeled nonparametrically and estimated by the sieve estimator.
Model 2: $g_0(S)$, $g_1(S)$, $h_0(S)$ and $h_1(S)$ are all modeled linearly and estimated by the classical approach.
Model 3: $g_0(S)$ and $g_1(S)$ are modeled linearly and estimated by the classical approach, and $h_0(S)$ and $h_1(S)$ are modeled nonparametrically and estimated by the sieve estimator.
Model 4: $g_0(S)$ and $g_1(S)$ are modeled nonparametrically and estimated by the sieve estimator, and $h_0(S)$ and $h_1(S)$ are modeled linearly and estimated by the classical approach.

When cubic (degree 3) spline basis functions are used to approximate a nonparametric component in these models, I use uniform knots on $[0,1]$. The smoothing parameter $r$ is assumed to be 2, and $n$ and $r$ together determine the optimal convergence rate. The numbers of knots, $m_0$ and $m_1$, can be chosen from the optimal convergence rate (Zhang et al., 2010) or by AIC (Lam et al., 2006; He et al., 2010). The number of Monte Carlo replications is 1000.

Table 4.1 presents the relative bias (RB), mean squared error (MSE) and standard deviation (SE) of all parametric components in Model 1. Table 4.2 presents the integrated MSE of all nonparametric components in Model 1.

Table 4.1: Estimates of finite-dimensional parameters in working Model 1

              n = 500                  n = 1000                 n = 2000
Parameter     RB      MSE    SE        RB      MSE    SE        RB      MSE    SE
β1            0.003   0.024  0.005     -0.004  0.012  0.004     0.003   0.006  0.002
β2            0.001   0.019  0.004     -0.001  0.009  0.003     0.005   0.005  0.002
β3            0.002   0.002  0.001     -0.005  0.001  0.001     -0.001  0.001  0.001
γ1            0.016   0.147  0.012     0.014   0.065  0.008     0.013   0.032  0.006
γ2            0.034   0.015  0.004     0.011   0.006  0.002     0.012   0.003  0.002
α             -0.117  0.032  0.004     -0.059  0.014  0.003     -0.027  0.006  0.002

Table 4.2: Estimates of infinite-dimensional parameters in working Model 1 (integrated MSE)

Sample size   g0(s)   g1(s)   h0(s)   h1(s)
500           0.048   0.087   0.046   0.046
1000          0.024   0.033   0.019   0.024
2000          0.019   0.019   0.013   0.014

Figure 4.1 shows the estimates of the nonparametric components under all four models when the sample size is 2000. The sieve estimators in Model 1 capture the shapes of the true functions reasonably well, while the other models do not. Both results show that the model using the sieve approach for all nonparametric components performs better than the more restrictive models.

Figure 4.1: The estimates of nonparametric components in simulation study. Panels (a)-(d): the nonparametric effect of DASI under g0(S), g1(S), h0(S) and h1(S) in the simulation study.

4.2 Real Data: Mouth-Level Indices in Caries Research

To evaluate dental caries severity in low-income African-American families, a multilevel study was designed and conducted in Detroit, Michigan (Tellez et al., 2006). The dental caries dataset contains oral health information on 874 children. The covariates of interest are the caregiver's standardized oral hygiene index, denoted $X_1$ (and $Z_1$), ranging from -2.47 to 4.45; the child's age group, denoted by the binary indicator $\delta$, which takes value 1 (392, 45%) if the child is younger than 2 and value 0 (482, 55%) otherwise; the caregiver's employment status, denoted by the binary indicator $X_2$, which takes value 0 (344, 39%) if the caregiver has no job and value 1 (530, 61%) otherwise; and the child's standardized sugar intake, denoted $S$, ranging from -1.19 to 5.42. Let $Y$ denote the response variable, the DMFS index (number of decayed, missing and filled tooth surfaces), which represents the cumulative severity of tooth decay for each surveyed child.

The histogram of $Y$, together with the negative binomial and Poisson distributions fitted to the data, is plotted in Figure 4.2. The data contain a large proportion of zero counts, which motivates a zero-inflated model, such as the zero-inflated negative binomial model, in which zero counts arise from two sources (Todem et al., 2012b; Cao et al., 2014).

Figure 4.2: The histogram of DMFS indices and fitted NB model and Poisson model.
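The sketch below gives a numerical illustration of why excess zeros point toward a zero-inflated model. It makes three assumptions, none of which comes from the thesis:

- It uses the Chapter 2 parameterization of (2.1), in which π is the mixing probability of the point mass at zero.
- The function names are hypothetical.
- The values π = 0.3, μ = 4 and α = 1 are illustrative only. They are not estimates from the Detroit data.

The script evaluates the ZINB probability mass function (2.1). It then checks the mean and variance formulas (2.2) by direct summation. Finally, it compares P(Y = 0) under Poisson, negative binomial and ZINB distributions that share the same marginal mean. It is not the thesis's own estimation code.

```python
import math


def zinb_pmf(y: int, pi: float, mu: float, alpha: float) -> float:
    """P(Y = y) for the ZINB distribution in (2.1).

    pi    : mixing probability of the point mass at zero, 0 <= pi <= 1
    mu    : mean of the negative binomial component (mu > 0 assumed)
    alpha : dispersion parameter (alpha > 0 assumed; alpha -> 0 is the ZIP limit)
    """
    size = 1.0 / alpha
    # log of Gamma(y + 1/alpha) / (Gamma(1/alpha) Gamma(y + 1))
    #        * (alpha*mu)^y / (1 + alpha*mu)^(y + 1/alpha), computed on the log scale
    log_nb = (math.lgamma(y + size) - math.lgamma(size) - math.lgamma(y + 1)
              + y * math.log(alpha * mu) - (y + size) * math.log1p(alpha * mu))
    nb = math.exp(log_nb)
    if y == 0:
        return pi + (1.0 - pi) * nb   # structural zero or sampling zero
    return (1.0 - pi) * nb


def zinb_moments(pi: float, mu: float, alpha: float) -> tuple:
    """Closed-form marginal mean and variance from (2.2)."""
    mean = (1.0 - pi) * mu
    var = (1.0 - pi) * mu * (1.0 + alpha * mu + pi * mu)
    return mean, var


def numeric_moments(pi: float, mu: float, alpha: float, y_max: int = 400) -> tuple:
    """Total mass, mean and variance by summing the pmf over 0..y_max.

    The truncation at y_max is assumed to be negligible for these parameter values.
    """
    probs = [zinb_pmf(y, pi, mu, alpha) for y in range(y_max + 1)]
    total = sum(probs)
    mean = sum(y * p for y, p in enumerate(probs))
    second = sum(y * y * p for y, p in enumerate(probs))
    return total, mean, second - mean ** 2


# Hypothetical illustrative values, not estimates from the caries data.
pi, mu, alpha = 0.3, 4.0, 1.0
marginal_mean, _ = zinb_moments(pi, mu, alpha)

p0_poisson = math.exp(-marginal_mean)                    # Poisson with the same mean
p0_nb = (1.0 + alpha * marginal_mean) ** (-1.0 / alpha)  # NB with the same mean and alpha
p0_zinb = zinb_pmf(0, pi, mu, alpha)

print(f"marginal mean E(Y) = {marginal_mean:.2f}")
print(f"P(Y=0): Poisson {p0_poisson:.3f}, NB {p0_nb:.3f}, ZINB {p0_zinb:.3f}")
print("closed form (mean, var):", zinb_moments(pi, mu, alpha))
print("numerical (mass, mean, var):", numeric_moments(pi, mu, alpha))
```

With these assumed values the marginal mean is 2.8. P(Y = 0) is about 0.061 under the Poisson model and 0.263 under a negative binomial model with the same mean and α, compared with 0.440 under the ZINB model. A gap of this kind, between observed and model-implied zeros, is what a histogram such as Figure 4.2 reveals. The pmf is evaluated on the log scale with `math.lgamma` so that the gamma functions do not overflow for large counts.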
Therefore, I postulate that $Y$ follows a zero-inflated negative binomial model, with the probability of non-zero counts $\pi$ and the mean $\mu$ of the underlying negative binomial distribution related to covariates as follows:

$$
\begin{aligned}
\log(\mu\pi) &= \beta_1X_1 + \beta_2X_2 + (1-\delta)g_0(S) + \delta g_1(S)\\
\operatorname{logit}(\pi) &= \gamma_1Z_1 + (1-\delta)h_0(S) + \delta h_1(S)
\end{aligned}
\tag{4.1}
$$

where $\mathbb{E}(Y) = \mu\pi$ is the marginal mean of the ZINB model. The marginal mean is allowed to depend on each child's age group, and the sugar intake $S$ enters the ZINB model nonparametrically. We use a uniform partition of $[-1.19, 5.42]$ and assume that the unknown functions have 1st or 2nd derivatives ($r = 1, 2$) on $[-1.19, 5.42]$. We use normalized uniform B-spline basis functions of degree 2 ($l = 2$) and let the convergence rate be the optimal rate. The number of knots is chosen by AIC (2 or more knots are preferred). The estimates and standard errors are summarized in Table 4.3, and the estimates of $g_0$ and $g_1$ in the model with 3 knots are plotted in Figure 4.3.

Table 4.3: The estimates and standard deviations (in parentheses) of finite-dimensional parameters

Parameter   2 knots          3 knots          4 knots          6 knots          9 knots
β1          0.145 (0.050)    0.147 (0.050)    0.147 (0.050)    0.151 (0.050)    0.155 (0.050)
β2          0.307 (0.109)    0.310 (0.110)    0.311 (0.109)    0.310 (0.109)    0.311 (0.111)
γ1          0.578 (0.215)    0.580 (0.213)    0.509 (0.213)    0.516 (0.213)    0.559 (0.217)
log(α)      -0.051 (0.112)   -0.061 (0.111)   -0.072 (0.111)   -0.086 (0.110)   -0.088 (0.111)
AIC         13726.8          13737.1          13749            13770.4          13803.4

Figure 4.3: The nonparametric effect of DASI under g0(S) and g1(S) in real data study.

We are interested in the nonparametric effect of sugar intake on the marginal mean response. I therefore focus on the nonparametric components $g_0(S)$ and $g_1(S)$ for $S \in [\underline{S}, \overline{S}]$, where $\underline{S}$ and $\overline{S}$ are their common boundaries. By the definition of $\delta$, $g_0(S)$ is the nonparametric effect for children aged 2 or older ($\delta = 0$), and $g_1(S)$ is the nonparametric effect for children younger than 2 ($\delta = 1$). Figure 4.3 shows that the daily amount of sugar intake has hardly any effect on the marginal mean response through $g_0(S)$, but a strong effect through $g_1(S)$. In other words, for children aged 2 or older, the daily amount of sugar intake has almost no impact on the marginal mean DMFS index, whereas for children younger than 2 it does. Only the marginal mean DMFS index of children younger than 2 appears to depend on the daily amount of sugar intake. Factors other than DASI may drive the marginal mean DMFS index of older children.

We are also interested in whether $g_0(S)$ and $g_1(S)$ differ significantly, so I test the null hypothesis

$$H_0: g_0(s) \equiv g_1(s), \quad s \in [\underline{S}, \overline{S}].$$

Following the discussion in Section 3.2, I use two test statistics, $T_2$ and $T_\infty$, based on the $L_2$ norm and the $L_\infty$ norm, respectively:

$$T_2^2 = \int_{s\in[\underline{S},\overline{S}]} \|\hat g_0(s) - \hat g_1(s)\|_2^2\, ds, \qquad T_\infty = \sup_{s\in[\underline{S},\overline{S}]} |\hat g_0(s) - \hat g_1(s)|$$

To compute the p-values of both statistics, I approximate their null distributions by the bootstrap, which requires re-sampling the data with replacement under the null hypothesis. Specifically, if the null hypothesis is true, the first equation in (4.1) becomes

$$
\begin{aligned}
\log(\mu\pi) &= \beta_1X_1 + \beta_2X_2 + (1-\delta)g_0(S) + \delta g_1(S)\\
&= \beta_1X_1 + \beta_2X_2 + (1-\delta)g_0(S) + \delta g_0(S)\\
&= \beta_1X_1 + \beta_2X_2 + g_0(S)
\end{aligned}
$$

which implies that $\log(\mu\pi)$ does not depend on $\delta$.

The bootstrap sample size is set equal to the original sample size $n$. Each bootstrap sample is drawn by re-sampling $n$ observations from the data $\widetilde W$ with replacement. In each bootstrap sample, the group indicator $\delta_{\mathrm{boot}}$ used for $g_0$ and $g_1$ (but not for $h_0$ and $h_1$) is re-assigned according to the original proportions, $\delta_{\mathrm{boot}} = 1$ (45%) and $\delta_{\mathrm{boot}} = 0$ (55%), instead of keeping its original value. For example, $\delta_{\mathrm{boot}}$ is re-assigned to 1 for $\lfloor 0.45n \rfloor$ observations in the bootstrap sample and to 0 for the remaining observations, where $\lfloor\cdot\rfloor$ is the floor function. All other variables in the bootstrap sample keep their original values. I perform 1000 bootstrap re-samplings. In each one, I sample $n$ observations from the original data, re-assign $\delta_{\mathrm{boot}}$ as above, and estimate $\hat g_0(S)$ and $\hat g_1(S)$ with the sieve estimator. I then compute $\hat T_{2,\mathrm{boot}}$ (by Monte Carlo integration) and $\hat T_{\infty,\mathrm{boot}}$. Although the sampling distributions of $T_2$ and $T_\infty$ are unknown, the bootstrap samples give estimated p-values

$$\text{p-value}_2 = \frac{\#\{\hat T_{2,\mathrm{boot}} > \hat T_{2,\mathrm{original}}\}}{1000}, \qquad \text{p-value}_\infty = \frac{\#\{\hat T_{\infty,\mathrm{boot}} > \hat T_{\infty,\mathrm{original}}\}}{1000}$$

where $\#\{\cdot\}$ is the counting function and $\hat T_{2,\mathrm{original}}$ and $\hat T_{\infty,\mathrm{original}}$ are the statistics computed from the original sample. Table 4.4 summarizes the results, and Figure 4.4 shows the sampling distributions estimated by the bootstrap.

Table 4.4: The result of bootstrap sampling

Statistic   Observed value   #{bootstrap > observed}   p-value
T2          10.73183         24                        0.024
T∞          11.48298         72                        0.072

Figure 4.4: The histogram of both estimated statistics by bootstrap method. Panels: (a) the histogram of estimated T2; (b) the histogram of estimated T∞.

Chapter 5 Discussion

In this thesis, a semiparametric zero-inflated marginal mean model is proposed in Section 2.2, and a B-spline-based sieve estimator is used to estimate both its parametric and nonparametric components. I also showed that the sieve estimator of the vector of finite-dimensional parametric components is strongly consistent, asymptotically normal and efficient. Its asymptotic variance is given by the efficient score function and the Fisher information matrix. Furthermore, a bootstrap hypothesis test is introduced to compare any two nonparametric components. The simulation studies demonstrated that the proposed model performs very well. Given appropriate tuning parameters, estimation of the nonparametric components proved highly satisfactory: the overall shapes of $g_0(S)$, $g_1(S)$, $h_0(S)$ and $h_1(S)$ are captured reasonably well. These tuning parameters are the smoothing parameter $r$, the growth rate $k$, and the numbers of knots $m_0$ and $m_1$ (excluding the two ends). The model can be extended with a penalized likelihood for the smoothing parameters.

In caries research, the semiparametric marginal mean model is applied to the Detroit Dental Caries Study. The model provides a reasonable representation of data from a homogeneous population. However, it remains unknown why some children from low-income families would be considered immune to dental caries, that is, why their counts follow a Dirac (pure zero) distribution rather than a negative binomial one (Todem et al., 2016). Young children may have different oral health outcomes, such as DMFS, depending on the socio-economic level of their families (Nanayakkara et al., 2013). Some factors are positively associated with children's oral health, such as having a dental home, having caregivers with higher education, and living in a fluoridated community (Chi et al., 2013). In this caries study, the estimates of the parametric components indicate that children raised by unemployed parents experience more caries, which is consistent with the conclusion of a systematic review (Kumar et al., 2015). According to the nonparametric influence of the daily amount of sugar intake, children in different age groups are affected by dental caries differently. The model shows that younger children have more pronounced effects than their older counterparts, a
ndthereexistsavalue(52.32grams)ofsugarintake,abovewhichsugarintakeisdetrimentalforyoungerchildren.Itisalsounclearwhythevalueofsugarintakeexistsandwhyitexistsonlyforyoungerchildren.Perhapschildrenwilldevelopgraduallytobe"immune"todentalcariesastheygrowup.Basedonthesenewinterventionstrategiesshouldconsiderationofagegroup,andinterventionstargetingchildrenagedbelow2arerequiredtoassociatewiththevalueofsugarintake.33APPENDIX341(Envelope).SupposeFisaclassoffunctionsinLp(P),thatisF=f:RjfjpdP<1.CalleachconstantCsuchthatkfkpCforeveryfinF,anenvelopeforF.2(CoveringNumber).SupposeFisaclassoffunctionsinLp(P).Foreach>0thecoveringnumberNF;Lp(P)asthesmallestnumbermforwhichthereexistfunctionsg1;g2;:::;gmsuchthatmin1jmkfgjkpforeachfinF.ProofofTheoremsLemma1.Foranyf2Bi,thereexistsafunctionfn2Bn;iandaconstantCsuchthatsup0s1jfn(s)f(s)jCnrkwheretheconstantrandtheconstantkareknown,andkdependingonthepartition.Proof.PleaserefertoTheorem12.7(P491)in(Schumaker,1981).Corollary1.GiventheofBi,IcanchoosethespeBn;isuchthatnbymodifyingthespeMi;2.Proof.PleaserefertosummaryinSection3.1.Corollary2.UndertheAssumptionC4,forany=(;;g0;g1;h0;h1)2,thereexistsn=(;;gn;0;gn;1;hn;0;hn;1)2A1A2A3Bn;1Bn;2Bn;3Bn;435suchthatˆ(;n)!0asn!1.Proof.Forany2thenonparametriccomponentsg0,g1,h0andh1belongtoB1,B2,B3andB4,respectively.UsingLemma1,thereexistgn;02Bn;1;gn;12Bn;2;hn;02Bn;3;hn;12Bn;4andconstantsC1;C2;C3;C4suchthatsup0s1gn;0(s)g0(s)C1nr1k0;sup0s1gn;1(s)g1(s)C2nr2k1sup0s1hn;0(s)h0(s)C3nr3k0;sup0s1hn;1(s)h1(s)C4nr4k1wherek0andk1areundertheAssumptionC4.LetC2=maxfC1;C2;C3;C4g,r=36minfr1;r2;r3;r4g1,andk=minfk0;k1g,andgivenW,thenˆ2(;n)=ETn()>X+(1gn;0g0)+gn;1g1)o2+ETn()>Z+(1hn;0h0)+hn;1h1)o2+jj2=ET(1gn;0g0)+gn;1g1)2+ET(1hn;0h0)+hn;1h1)2=ETn[(1gn;0g0)]2+gn;1g1)]2o+ETn[(1hn;0h0)]2+hn;1h1)]2o=Zn[(1gn;0g0)]2+gn;1g1)]2oP;S(;s)d(;s)+Zn[(1hn;0h0)]2+hn;1h1)]2oP;S(;s)d(;s)=Zgn;0g02dPS+Zgn;1g12dPS+Zhn;0h02dPS+Zhn;1h12dPSZgn;0g021dPS+Zgn;1g121dPS+Zhn;0h021dPS+Zhn;1h121dPSgn;0g021+gn;1g121+hn;0h021+hn;1h121C1nr1k02+C2nr2k12+C3nr3k02+C4nr
4k124C2nrk2Therefore,ˆ(;n)Cnrk!0asn!1.Lemma2.AssumeA1beheld,ETh(XET(XjS))(XET(XjS))>ihasaminimum37eigenvaluex;min,andETh(ZET(ZjS))(ZET(ZjS))>ihasaminimumeigenvaluez;min.Then,forany;2,Ihavekk1px;minˆ(;);kk1pz;minˆ(;);jjˆ(;)kg0g0k2vuut2+2M21;1x;minˆ(;);kg1g1k2vuut2+2M21;1x;minˆ(;)kh0h0k2vuut2+2M21;1z;minˆ(;);kh1h1k2vuut2+2M21;1z;minˆ(;)Proof.Forany;2ˆ2(;)=ETn()>X+(1g0g0)+g1g1)o2+ETn()>Z+(1h0h0)+h1h1)o2+jj2ETn()>X+(1g0g0)+g1g1)o2=ETˆ()>(XET(XjS))+()>ET(XjS)+(1g0g0)+g1g1)˙2LetJ(S)denote()>ET(XjS)+(1g0g0)+g1g1),IhaveET(XET(XjS))J(S)=ETXJ(S)ETET(XjS)J(S)=ETXJ(S)ETET(XJ(S)jS)=ETXJ(S)ETXJ(S)=038Sotheinteractiontermis0,Ihaveˆ2(;)ETn()>(XET(XjS))+J(S)o2=ETn()>(XET(XjS))o2+J2(S)=ETn()>(XET(XjS))o2+ETn()>ET(XjS)+(1g0g0)+g1g1)o2ETn()>(XET(XjS))o2=()>ETh(XET(XjS))(XET(XjS))>i()x;minkk2Therefore,Ihave,kk1qx;minˆ(;),kk1qz;minˆ(;),andbasedontheofˆ,jjˆ(;).Thenfocusonthenonparametriccomponent,ET[(1g0g0)+g1g1)]2=ETh()>X+(1g0g0)+g1g1)()>Xi22ETh()>x+(1g0g0)+g1g1)i2+2ETh()>Xi22ˆ2(;)+2M21;1kk2 2+2M21;1x;min!ˆ2(;)39Ontheotherhand,ET[(1g0g0)+g1g1)]2=Z[(1g0g0)+g1g1)]2P;S(;s)d(;s)=Z(g0g0)2PS(s)d(s)+Z(g1g1)2PS(s)d(s)=kg0g0k22+kg1g1k22Followingthepreviousdiscussion,kg0g0k2s2+2M21;1x;minˆ(;).Similarly,kg1g1k2s2+2M21;1x;minˆ(;),kh0h0k2s2+2M22;1z;minˆ(;),andkh1h1k2s2+2M22;1z;minˆ(;).Lemma3.Assumegiven;2n,holdtheAssumptionsXXX,thenthereexistaconstantM3suchthatjl(;w)l(;w)jM3(kk2+kk2+kgn;0gn;0k1+kgn;1gn;1k1+khn;0hn;0k1+khn;1hn;1k1)Proof.Let˘anddenotethejointfunctionswithrespectto(;w),givenby˘(;w)=>X+(1g0(S)+g1(S)(;w)=>Z+(1h0(S)+h1(S)40andletQ1andQ2denotethejointfunctionswithrespectto(˘;;),givenbyQ1(˘;;)=I(y=0)logˇ+(1ˇ)(1+)1Q2(˘;;)=I(y>0)log264(1ˇ)y+1()y1y+1)(1+)y+1375Computethedl(;w)l(;w)=Q1(˘;;)Q1(˘;;)+Q2(˘;;)Q2(˘;;)UsingTaylor'sseriesexpansion,thereexists(˘;;)satisfyingthatQ1(˘;;)Q1(˘;;)=@@˘Q1(˘;;)(˘˘)+@@Q1(˘;;)()+@@kQ1(˘;;)()where@@˘Q1(˘;;),@@Q1(˘;;)and@@Q1(˘;;)areboundedundertheAssumptionC2,andwithoutlossgenerality,letM1;3denotetheircommonsupremum.Therefore,jQ1(˘;;)Q1(˘;;)j=j@@˘Q1(˘;;)(˘˘)+@@Q1(˘;
;)()+@@kQ1(˘;;)()jk˘˘k@@˘Q1(˘;;)+kk@@Q1(˘;;)+jj@@Q1(˘;;)M1;3(k˘˘k+kk+jj)41Therefore,thereexistsM1;3suchthatjQ1(˘;;)Q1(˘;;)jM1;3kk2+kk2+jj+kgn;0gn;0k1+kgn;1gn;1k1+khn;0hn;0k1+khn;1hn;1k1ThesimilaroperationisappliedtoQ2(˘;;)Q2(˘;;)andIhaveM2;3,letM3=maxnM1;3;M2;3o,then,jl(;w)l(;w)j=jQ1(˘;;)Q1(˘;;)+Q2(˘;;)Q2(˘;;)jjQ1(˘;;)Q1(˘;;)j+jQ2(˘;;)Q2(˘;;)jM1;3(kk2+kk2+jj+kgn;0gn;0k1+kgn;1gn;1k1+khn;0hn;0k1+khn;1hn;1k1)+M2;3(kk2+kk2+jj+kgn;0gn;0k1+kgn;1gn;1k1+khn;0hn;0k1+khn;1hn;1k1)M3(kk2+kk2+jj+kgn;0gn;0k1+kgn;1gn;1k1+khn;0hn;0k1+khn;1hn;1k1)Lemma4.Denotetwofunctionalclasses,G=fg(:)gandF=ff(:)g,wherefunctionftheLipschitzcondtionthatforanyf2F,thereexistsaconstantCsuchthatjf(g)f(g)jCjggjforanygandg2G.Then,foranyprobabilitymeasureP,the42coveringnumbersN(F;L2(P))andN(G;L2(P))ofFandGhavethepropertyN(F;L2(P))NC;G;L2(P)Proof.AccordingtothethereexistsaconstantCsuchthatjf(s)f(s)jCjssjforanyf(s);f(s)2F.Foreach>0,m=NC;G;L2(P)andthenthereexistg1;g2;:::;gmsuchthatminjggjL2(P)CforeachginG.Giveng2fg1;g2;:::;gmg,thereexistfgsatisfyingkfgfgkL2(P)=Z(fgfg)2dP12ZC2(gg)2dP12=CkggkL2(P)CC=foranygivenfg2F,whichimpliesfg1;fg2;:::;fgmcancoverF.Then,N(F;L2(P))m=NC;G;L2(P)Lemma5.Thecoveringnumberoftheclassn=ˆln;:)n2n˙N(n;L1)K1d1+d2+2N0+2N1+1whereKisaconstant.43Proof.Letgn;0=PN0i=1˝i;n;1˚i;gn;0=PN0i=1˝i;n;1˚i2Bn;1,thencomputesupsgn;0(s)gn;0(s)=supsN0Xi=1˝i;n;1˚i(s)N0Xi=1˝i;n;1˚i(s)=max1jN01supsN0Xi=1˝i;n;1˚i(s)N0Xi=1˝i;n;1˚i(s)[sj;sj+1]=max1jN01supsj+l+1Xi=j˝i;n;1˚i(s)j+l+1Xi=j˝i;n;1˚i(s)[sj;sj+1](l+1)max1iN0n˝i;n;1˝i;n;1o(l+1)˝n;1˝n;12therefore,usingLemma4,IhaveNBn;1;L1N l+1;(˝n;1:max1iN0˝i;n;1M1;2);k:k2!AccordingtoLemma4.1of(Pollard,1990),NBn;1;L1N l+1;(˝n;1:max1iN0˝i;n;1)M1;2;k:k2!D l+1;(˝n;1:max1iN0˝i;n;1)M1;2;k:k2! 
32M1;2l+1!N0=6(l+1)M1;2N044wherethepackingnumberD(Pollard,1990)suchthatN(F)D(F).Similarly,IhaveNBn;1;L16(l+1)M1;2N0;NBn;2;L16(l+1)M2;2N1NBn;3;L16(l+1)M3;2N0;NBn;4;L16(l+1)M4;2N1Givenadistanceedonnbyed(;)=kk2+kk2+jj+kgn;0gn;0k1+kgn;1gn;1k1+khn;0hn;0k1+khn;1hn;1k1Then,applyingLemma3toPolland'ssection5,N(n;L1)NM3;n;edN7M3;A1;L2N7M3;A2;L2N7M3;A3;L1N7M3;Bn;1;L1N7M3;Bn;2;L1N7M3;Bn;3;L1N7M3;Bn;4;L1 21M3Md1;1!d1 21M3Md2;1!d2 21M3Md3;1!42M3(l+1)M1;2N042M3(l+1)M2;2N142M3(l+1)M3;2N042M3(l+1)M4;2N1=K1d1+d2+2N0+2N1+1whereKisaconstant.45Theorem1(StrongConsistency).SupposetheAssumptionsC1-C5hold,thenˆ^n;T!0almostsurelyunderPT.Moreover,iftheconditionA3ared,thenk^nTk!0;k^nTk!0;j^nTj!0;k^gn;0gT;0k2!0;k^gn;1gT;1k2!0;k^hn;0hT;0k2!0;k^hn;1hT;1k2!0almostsurelyunderPT.Proof.UsingLemma5andfollowingtheargumentsin(Xueetal.,2004).Theorem2(RateofConvergence).SupposetheAssumptionsC1-C5hold,thenˆ^n;T=Opmaxˆn1k2;nrkIfselectk=11+2rforr=1;2,thenˆ^n;Tachievestheoptimalnonparametricconver-gencerateOpnr1+2r.Proof.BecauseLn(;fW)isbounded,ittoverifythethreeconditionsintheorem1in(ShenandWong,1994)holdtrue.Followingtheauthors'notation,Ichecktheconditionsonebyone:(1)Similartotheargumentsin(Xueetal.,2004),theKullback-LeiblerinformationisgreaterthanthesquareoftheHellingerdistance(ShenandWong,1994),thenthereexists46aconstantCsatisfyinginfˆ(T)2nET[l(T;W)l(;W)]infˆ(T)2nCˆ2(;T)C2whereholdingtheconditionwith=1ofShen's.(2)CombiningLemma2andLemma3,thereexistsaconstantCsatisfyingsupˆ(T)2nV[l(T;W)l(;W)]supˆ(T)2nET[l(T;W)l(;W)]2Csupˆ(T)2nˆ2(;T)C2whereholdingthesecondconditionwith=1ofShen's.(3)ConsidertheentropyH(n;k:k1)=logN(n;k:k1),usingLemma5,thereexistsconstantsCandsatisfyingH(n;k:k1)logK+(d1+d2+2N0+2N1+1)log1Cnklog1whereholdingthethirdconditionwith2r0=k;r=0+ofShen's.47REFERENCES48REFERENCESNAtienza,JGarcJMMu~noz-Pichardo,andRVilla.Anapplicationofmixturedistributionsinmodelizationoflengthofhospitalstay.StatisticsinMedicine,27(9):1403{1420,2008.CSBlaum,BEFries,andMAFiatarone.Factorsassociatedwithlowbodymassindexandw
eightlossinnursinghomeresidents.TheJournalsofGerontologySeriesA:BiologicalSciencesandMedicalSciences,50(3):162{168,1995.GCao,WWHsu,andDTodem.Ascore-typetestforheterogeneityinmodelsinapopulation.Statisticsinmedicine,33(12):2103{2114,2014.DPCappelliandCCMobley.Preventioninclinicaloralhealthcare.ElsevierHealthSciences,2007.HChen.Convergenceratesforparametriccomponentsinapartlylinearmodel.TheAnnalsofStatistics,16(1):136{146,1988.DonaldLChi,KatharineCRossitch,andElizabethMBeeles.Developmentaldelaysanddentalcariesinlow-incomepreschoolersintheusa:apilotcross-sectionalstudyandpreliminaryexplanatorymodel.BMCoralhealth,13(1):1,2013.CDeBoor.Oncalculatingwithb-splines.JournalofApproximationTheory,6(1):50{62,1972.BADye,STan,VSmith,BGLewis,LKBarker,GThornton-Evans,PIEke,EDan-Aguilar,AMHorowitz,andCHLi.Trendsinoralhealthstatus:Unitedstates,1988-1994and1999-2004.Vitalandhealthstatistics.Series11,Datafromthenationalhealthsurvey,(248):1{92,2007.VTFarewellandDASprott.Theuseofamixturemodelintheanalysisofcountdata.Biometrics,pages1191{1194,1988.StuartGemanandChii-RueyHwang.Nonparametricmaximumlikelihoodestimationbythemethodofsieves.TheAnnalsofStatistics,pages401{414,1982.DGhosh.considerationsintheadditivehazardsmodelwithcurrentstatusdata.StatisticaNeerlandica,55(3):367{376,2001.MSGilthorpe,MFrydenberg,YCheng,andVBaelum.Modellingcountdatawithexcessivezeros:Theneedforclasspredictioninmodelsandtheissueofdatagenera-tioninchoosingbetweenandgenericmixturemodelsfordentalcariesdata.StatisticsinMedicine,28(28):3539{3553,2009.PHall.ThebootstrapandEdgeworthexpansion.SpringerScience&BusinessMedia,2013.WardleandHLiang.Partiallylinearmodels.Springer,2007.49XHe,HXue,andNZShi.Sievemaximumlikelihoodestimationfordoublysemiparametricpoissonmodels.JournalofMutivariateAnalysis,101:2026{2038,2010.BHu,LLi,XWang,andTGreene.Nonparametricmultistaterepresentationsofsurvivalandlongitudinaldatawithmeasurementerror.Statisticsinmedicine,31(21):2303{2317,2012.JHuang.testimationofthepartlylinearadditivecoxmodel.Theanna
lsofStatistics,27(5):1536{1563,1999.JHuangandAJRossini.Sieveestimationfortheproportional-oddsfailure-timeregressionmodelwithintervalcensoring.JournaloftheAmericanStatisticalAssociation,92(439):960{967,1997.KHur,DHedeker,WHenderson,SKhuri,andJDaley.Modelingclusteredcountdatawithexcesszerosinhealthcareoutcomesresearch.HealthServicesandOutcomesResearchMethodology,3(1):5{20,2002.HKleinandCEPalmer.Studiesondentalcaries:V.familialresemblanceinthecariesexperienceofsiblings.PublicHealthReports(1896-1970),pages1353{1364,1938.SanthoshKumar,JyothiTadakamadla,JeroenKroon,andNewellWJohnson.Impactofparent-relatedfactorsondentalcariesinthepermanentdentitionof6-to12-year-oldchildren:asystematicreview.Journalofdentistry,2015.KFLam,HXue,andYBCheung.Semiparametricanalysisofcountdata.Biometrics,62:996{1003,2006.DLambert.poissonregression,withanapplicationtodefectsinmanufacturing.Technometrics,34(1):1{14,1992.AHLee,MGracey,KWang,andKKWYau.Amodelingapproachtoanalyzepediatriclengthofstay.AnnalsofEpidemiology,15(9):673{677,2005.TLoeys,BMoerkerke,ODeSmet,andABuysse.Theanalysisofcountdata:Beyondzero-inpoissonregression.BritishJournalofMathematicalandStatisticalPsychology,65(1):163{180,2012.SMa.Curemodelwithcurrentstatusdata.StatisticaSinica,19(1):233{233,2009.MMinami,CELennert-Cody,WGao,andMRomn-Verdesoto.Modelingsharkbycatch:Thenegativebinomialregressionmodelwithsmoothing.FisheriesResearch,84(2):210{221,2007.JMullahy.Spandtestingofsomemocountdatamodels.JournalofEcono-metrics,33(3):341{365,1986.50VajiraNanayakkara,AndreRenzaho,BrianOldenburg,andLilaniEkanayake.Ethnicandsocio-economicdisparitiesinoralhealthoutcomesandqualityoflifeamongsrilankanpreschoolers:across-sectionalstudy.Internationaljournalforequityinhealth,12(1):1{9,2013.DPollard.Empiricalprocesses:theoryandapplications.InNSF-CBMSregionalconferenceseriesinprobabilityandstatistics,pages1{86.JSTOR,1990.MRidout,JHinde,andCDemetrio.Ascoretestfortestingapoissonregressionmodelagainstzero-negativebinomialalternatives.Biometrics,57:219{223,2
001.CSRitchie,KJoshipura,RASilliman,BMiller,andCWDouglas.Oralhealthproblemsandtweightlossamongcommunity-dwellingolderadults.TheJournalsofGeron-tologySeriesA:BiologicalSciencesandMedicalSciences,55(7):366{371,2000.PSasieni.Non-orthogonalprojectionsandtheirapplicationtocalculatingtheinformationinapartlylinearcoxmodel.ScandinavianJournalofStatistics,pages215{233,1992.LSchumaker.Splinefunctions:basictheory.AWileyIntersciencePublication,1981.XShen.Onmethodsofsievesandpenalization.TheAnnalsofStatistics,pages2555{2591,1997.XShenandJShi.Likelihoodratioinferenceonparametricspace.ScienceinChinaSeriesAMathematics,34(5):584{594,2004.XShenandWWong.Convergencerateofsieveestimates.TheAnnalsofStatistics,pages580{615,1994.GShmueli,TPMinka,JBKadane,SBorle,andPBoatwright.Ausefuldistributionfordiscretedata:revivaloftheconway{maxwell{poissondistribution.JournaloftheRoyalStatisticalSociety:SeriesC(AppliedStatistics),54(1):127{142,2005.CHSinghandLLadusingh.Inpatientlengthofstay:amixturemodelinganalysis.TheEuropeanJournalofHealthEconomics,11(2):119{126,2010.LSongandHXue.Asymptoticpropertiesofakindofsievemle.TheAnnalsofStatistics,20(3):370{377,2000.PSpeckman.Kernelsmoothinginpartiallinearmodels.JournaloftheRoyalStatisticalSociety.SeriesB(Methodological),pages413{436,1988.DHSullivan,WMartin,NFlaxman,andJEHagen.Oralhealthproblemsandinvoluntaryweightlossinapopulationoffrailelderly.JournaloftheAmericangeriatricsSociety,41(7):725{731,1993.MTellez,WSohn,BABurt,andAIIsmail.Assessmentoftherelationshipbetweenneigh-borhoodcharacteristicsanddentalcariesseverityamonglow-incomeafrican-americans:Amultilevelapproach.Journalofpublichealthdentistry,66(1):30{36,2006.51DTodem.Statisticalmodelsfordentalcariesdata.INTECHOpenAccessPublisher,2012a.DTodem,WWH,andKMKim.Ontheofscoretestsforhomogeneityintwo-componentparametricmodelsfordiscretedata.Biometrics,68(3):975{982,2012b.DavidTodem,KyungMannKim,andWei-WenHsu.Marginalmeanmodelsforcountdata.Biometrics,2016.ZWang,SMa,andCYWang.Variableselectionforandoverdispersed
datawithapplicationtohealthcaredemandingermany.BiometricalJournal,57(5):867{884,2015.HXue,KFLam,andGLi.Sievemaximumlikelihoodestimatorforsemiparametricregres-sionmodelswithcurrentstatusdata.JournaloftheAmericanStatisticalAssociation,99(466):346{356,2004.KKWYau,KWang,andAHLee.negativebinomialmixedregressionmodelingofover-dispersedcountdatawithextrazeros.BiometricalJournal,45(4):437{452,2003.YZhang,LHua,andJHuang.Aspline-basedsemiparametricmaximumlikelihoodesti-mationmethodforthecoxmodelwithinterval-censoreddata.ScandinavianJournalofStatistics,37(2):338{354,2010.52
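The bootstrap scheme behind Table 4.4 has three steps: resample $n$ rows with replacement, re-assign $\delta_{boot}$ with $\lfloor 0.45n \rceil$ ones, and report the fraction of bootstrap statistics exceeding the observed one. The sketch below shows these steps under stated assumptions. The data are a dict of equal-length arrays whose key `"delta"` is the indicator entering $g_0$ and $g_1$; a separate copy for $h_0$ and $h_1$, if present, is left untouched. The ones are placed at random positions, which the thesis does not state explicitly. `statistic` is a hypothetical user-supplied callable standing in for "fit the sieve estimator, then compute $T_2$ or $T_\infty$".

```python
import numpy as np

def reassign_delta(n, prop_one=0.45, rng=None):
    """0/1 vector with round(prop_one * n) ones at random positions (round half up)."""
    rng = np.random.default_rng(rng)
    delta = np.zeros(n, dtype=int)
    n_one = int(np.floor(prop_one * n + 0.5))
    delta[rng.choice(n, size=n_one, replace=False)] = 1
    return delta

def bootstrap_pvalue(t_boot, t_obs):
    """Proportion of bootstrap statistics strictly exceeding the observed statistic."""
    t_boot = np.asarray(t_boot)
    return np.count_nonzero(t_boot > t_obs) / t_boot.size

def bootstrap_null(data, statistic, n_boot=1000, prop_one=0.45, rng=None):
    """Approximate the null distribution of a statistic (hypothetical sketch).

    data: dict of equal-length arrays; "delta" is the indicator entering g0 and g1.
    statistic: hypothetical callable(dict) -> float, e.g. fit g0, g1 and return T_2.
    """
    rng = np.random.default_rng(rng)
    n = len(data["delta"])
    out = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)              # n draws with replacement
        boot = {key: np.asarray(val)[idx] for key, val in data.items()}
        boot["delta"] = reassign_delta(n, prop_one, rng)   # only the g0/g1 indicator
        out[b] = statistic(boot)
    return out
```

With 1000 replicates, 24 of which exceed the observed $T_2$, `bootstrap_pvalue` returns $24/1000 = 0.024$, which matches the count-to-p-value arithmetic in Table 4.4.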
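Lemma 1 (via Schumaker, 1981) bounds the sup-norm spline approximation error by $C n^{-rk}$. The sketch below illustrates that kind of rate with a deliberately simple swapped-in case: the piecewise-linear (degree-1 spline) interpolant of a smooth test function on $[0, 1]$. This is not the thesis's B-spline sieve fit. The function `sin(2*pi*s)` and the helper name are arbitrary illustrative choices. For degree-1 splines the error should shrink roughly like $h^2$ in the knot spacing $h$, so doubling the number of intervals should divide the error by about 4. If the number of knots grows like $n^k$, this corresponds to $r = 2$.

```python
import numpy as np

def linear_spline_sup_error(f, n_knots, n_eval=20_001):
    """Sup-norm error on [0, 1] of the piecewise-linear interpolant of f
    with n_knots equally spaced knots (both ends included)."""
    knots = np.linspace(0.0, 1.0, n_knots)
    s = np.linspace(0.0, 1.0, n_eval)            # fine evaluation grid
    approx = np.interp(s, knots, f(knots))       # degree-1 spline interpolant
    return np.max(np.abs(approx - f(s)))

f = lambda s: np.sin(2 * np.pi * s)
for m in (11, 21, 41, 81):
    print(m, linear_spline_sup_error(f, m))      # error drops by roughly 4x per row
```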
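The key step in the proof of Lemma 2 is that the cross term $E_T[(X - E_T(X|S))J(S)]$ vanishes for any function $J$ of $S$. The sketch below checks the empirical analogue on simulated data, assuming a discrete $S$ so that $E(X|S)$ can be estimated by group means. The data-generating choices (five levels of $S$, Gaussian noise, the particular $J$) are arbitrary illustrative assumptions. With group means, the identity holds exactly in the sample, up to floating-point error, because the residuals sum to zero within each level of $S$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
S = rng.integers(0, 5, size=n)               # discrete S, so E(X | S) is a group mean
X = S + rng.normal(size=n)                   # covariate correlated with S
J = np.cos(S) + S ** 2                       # an arbitrary function J(S)

# Empirical conditional mean E_n(X | S), evaluated at each observation.
group_means = np.array([X[S == s].mean() for s in range(5)])
resid = X - group_means[S]

cross = np.mean(resid * J)                   # empirical E[(X - E(X|S)) J(S)]
print(cross)                                 # zero up to floating-point error
```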
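Lemma 3 works with the log-likelihood contribution $l(\theta;w) = Q_1 + Q_2$. Below is a minimal Python sketch of that contribution for a single count $y$. It assumes a marginal-mean zero-inflated negative binomial parameterization in the spirit of Todem et al. (2016). Specifically, it assumes $\pi = \mathrm{expit}(\zeta)$ is the zero-inflation probability, $\nu = \exp(\xi)$ is the marginal mean, and $\mu = \nu/(1-\pi)$ is the negative binomial mean. That mapping from $(\xi, \zeta)$ to $(\pi, \mu)$ is an assumption for illustration; the thesis's exact link may differ. Under this parameterization the probabilities sum to one and the mean of $Y$ equals $\exp(\xi)$.

```python
import math

def zinb_marginal_loglik(y, xi, zeta, kappa):
    """Log-likelihood contribution Q1 + Q2 of one count y (hypothetical sketch).

    Assumed parameterization (illustrative only):
      pi = expit(zeta)        zero-inflation probability
      nu = exp(xi)            marginal mean E(Y) = (1 - pi) * mu
      mu = nu / (1 - pi)      negative binomial mean of the non-degenerate part
      kappa > 0               negative binomial dispersion
    """
    pi = 1.0 / (1.0 + math.exp(-zeta))
    mu = math.exp(xi) / (1.0 - pi)
    inv_k = 1.0 / kappa
    log_nb_zero = -inv_k * math.log1p(kappa * mu)   # log of (1 + kappa*mu)^(-1/kappa)
    if y == 0:
        # Q1: point mass at zero mixed with the negative binomial zero probability
        return math.log(pi + (1.0 - pi) * math.exp(log_nb_zero))
    # Q2: (1 - pi) times the negative binomial probability of y > 0
    return (math.log1p(-pi)
            + math.lgamma(y + inv_k) - math.lgamma(inv_k) - math.lgamma(y + 1)
            + y * math.log(kappa * mu)
            - (y + inv_k) * math.log1p(kappa * mu))
```

For instance, with $\xi = \log 2$, $\zeta = 0$ and $\kappa = 0.5$, summing `exp(zinb_marginal_loglik(y, ...))` over $y = 0, 1, \ldots, 999$ should give 1, and the implied mean should be 2.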
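The choice $k = 1/(1+2r)$ in Theorem 2 works because it equalizes the two terms inside the maximum. The term $n^{-rk}$ is the spline approximation error from Corollary 2. The term $n^{-(1-k)/2}$ grows with the sieve dimension, which is of order $n^k$. The short derivation below is a worked check of that arithmetic.

```latex
% With k = 1/(1+2r):
\frac{1-k}{2} = \frac{1}{2}\Bigl(1 - \frac{1}{1+2r}\Bigr) = \frac{1}{2}\cdot\frac{2r}{1+2r} = \frac{r}{1+2r},
\qquad rk = \frac{r}{1+2r},
% so both terms coincide and
\max\{n^{-(1-k)/2},\, n^{-rk}\} = n^{-r/(1+2r)};
% e.g. r = 1 gives n^{-1/3} and r = 2 gives n^{-2/5}.
```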