APPLIED ECONOMETRIC STUDIES IN AIR QUALITY AND EDUCATION

By

Christopher Khawand

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of Economics - Doctor of Philosophy

2016

ABSTRACT

This dissertation comprises three standalone chapters loosely organized around a causal inference theme. It addresses empirical questions in environmental quality and education, while also offering some methodological insight in each chapter.

Chapter 1 is an empirical paper exploring the health effects of air pollution using the large-scale natural experiment generated by U.S. wildfires. In it, I apply a combination of forest fire and atmospheric dispersion models to provide predictions of where pollution goes after a wildfire. I use a panel instrumental variables framework to estimate the impact of increasing particulate matter pollution (PM2.5) on mortality and perinatal health. Increased short-term exposure to PM2.5 is associated with both increased mortality among the elderly and poorer birth outcomes, and toxic metals appear to explain most of the effect.

Chapter 2, co-authored with colleague Wei Lin, is an econometric theory paper focused on the two-sample two-stage least squares estimator (TS2SLS). Here, we aim to fill a gap in the literature on how two-sample instrumental variables estimators behave in finite samples. We show closed-form approximations, simulation evidence, and empirical examples of how the estimator behaves. We offer recommendations for econometric practitioners using two-sample estimation.

Finally, Chapter 3 attempts to answer the question of whether there are ability peer effects in high school classrooms and how strong they are across the ability distribution. I leverage a series of placebo tests to validate whether the estimated peer effects represent causal effects. I provide a closed-form expression for the placebo estimates, showing that they are directly proportional to the amount of bias in their corresponding peer effect estimates. I find largely plausible peer effects nonlinear in ability, with high-performing students having the greatest positive impact on classrooms' overall performance.

To Lydia "Pikachu-Yoshi-Kitty-Puppy-Raccoon" Khawand

ACKNOWLEDGEMENTS

This dissertation represents five years of hard work and a slew of mistakes overcome. Here's a list of people I'd like to thank for, in one way or another, keeping those mistakes from sinking the ship:

Michael Bates, Julie Harris, and Dan Litwok, with whom I shared comradery, foosball, and MSU Dairy Store ice cream. We have all gazed into the ancient chicken bone in the 1st-year grad student office and it has gazed into us.

My committee, for their kindness and patience through wild goose chases and long spells of no progress. Gary Solon taught me the fundamental causal inference skills that have sparked an exciting start to my career; Soren Anderson was truly a second advisor, giving me an incredible amount of detailed feedback and morale boosts when things were not going well; and Jeff Wooldridge bestowed upon me nearly all of the useful econometric knowledge that I have and an unwavering commitment to econometric truths (even if they tend to mostly be in asymptotia).

Lastly, my wife Jennifer and daughter Lydia: they are the Chapter 0 of this dissertation, my life's work.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
Chapter 1. Air Quality, Perinatal Health, and Mortality: Causal Evidence from Wildfires
1 Introduction
2 Data
2.1 Modeled Wildfire Air Pollution
2.1.1 Wildfire Data
2.1.2 Description of Wildfire Emissions and Air Pollution Modeling
2.1.3 Modeling Tools
2.2 Birth and Mortality Data
2.3 Ambient Air Pollution and Weather Data
3 Econometric Approach
3.1 Statistical Model
3.2 Identification
3.3 Testing and Controlling for Effects from Multiple Pollutants
3.3.1 Controlling for Multiple Wildfire Pollutants
3.3.2 Pre-testing for Omitted Pollutants
4 Results
4.1 Wildfire Effect on Ambient Air Quality
4.1.1 First Stage: Wildfire Effect on Ambient Concentrations of Pollutants
4.1.2 PM2.5 Chemical Composition by Wildfire Instrument
4.2 Short-term Effects on Mortality
4.2.1 Short-term Effects of PM2.5 on All-Cause Mortality
4.2.2 Heterogeneous Effects of PM2.5 by Chemical Composition
4.2.3 Nonlinear Effects of PM2.5
4.2.4 Short-Term Effects of PM2.5 by Cause of Death
4.2.5 Lagged and Lead Short-Term Associations with PM2.5
4.3 Effects on Infant Health
5 Wildfire Externalities and Current Management Policy
6 Conclusion
APPENDIX
REFERENCES
Chapter 2. Finite Sample Properties and Empirical Applicability of Two-Sample Two-Stage Least Squares
1 Introduction
2 Properties of the TS2SLS Estimator
2.1 Model
2.2 Definition of Estimators
2.3 First-Order Bias Approximation
2.4 Asymptotic Variance of TS2SLS
2.5 TS2SLS Under Data Availability Constraints
3 Simulation Evidence
4 Application
4.1 TS2SLS in Practice: Synthetic Example from Angrist and Evans (1998)
4.2 Other Considerations for Applications
4.3 The Practical Impact of Sample Overlap r
4.4 Computation of Standard Errors
5 Conclusion
APPENDIX
REFERENCES
Chapter 3. Estimating and Validating Nonlinear and Heterogeneous Classroom Peer Effects
1 Introduction
1.1 The Peer Effects Literature
1.2 Peer Effects Model
1.3 Biases from Sorting and Identifying Nonlinear vs. Linear Effects
1.4 Data Description
1.4.1 North Carolina Administrative Data
1.4.2 Determining Course Membership
1.4.3 Test Scores and Ability
2 Results
2.1 Linear vs. Non-linear Peer Effects Estimates
2.2 Placebo Tests: Alternate Classrooms
2.3 Note on the Interpretation of the Placebo Test
2.4 Policy Implications: Optimal Ability Tracking
3 Conclusion
APPENDIX
REFERENCES

LIST OF TABLES

Table 1: AQS PM2.5 and NLDAS Weather Descriptive Statistics
Table 2: Monthly, County-Level Mortality Rate (per 100,000) by Subgroup from U.S. Death Certificates, 2004-2010
Table 3: Monthly, County-Level Mean Birth Outcomes and Rates for Birth Cohorts from U.S. Birth Certificates, 2004-2010
Table 4: First Stage Regression of PM2.5 and Regressions of Criteria Pollutants on Wildfire Instrument
Table 5: Regressions of Highly Toxic PM2.5 Subspecies on Wildfire PM2.5
Table 6: Regressions of Non-Metallic PM2.5 Subspecies on Wildfire PM2.5
Table 7: Percentage of Wildfire PM2.5 Exposure Outside of the State of Origin
Table 8: IV Estimates: PM2.5 Effects on All-Cause Mortality (by Fixed-Effects Specification)
Table 9: IV Estimates: PM2.5 Effects on Mortality (by Cause)
Table 10: IV Estimates: PM2.5 (Non-)Effects on Mortality from External Causes
Table 11: IV Estimates: PM2.5 Effect on All-Cause Mortality by Age Group
Table 12: Reduced Form Lead and Lagged Wildfire PM2.5 Effect on All-Cause Mortality
Table 13: IV Estimates: Effect of PM2.5 Exposure for Full Gestation and 16 Weeks Before Birth on Birth Outcomes
Table 14: Henry's Law Constants and Dry Deposition Velocities for Gaseous Pollutants
Table 15: Regression of Organic Gases on Wildfire PM2.5
Table 16: Regression of PM2.5 Metal Subspecies on Wildfire PM2.5, Set I
Table 17: Regression of PM2.5 Metal Subspecies on Wildfire PM2.5, Set II
Table 18: Regression of PM2.5 Metal Subspecies on Wildfire PM2.5, Set III
Table 19: Regression of PM2.5 Metal Subspecies on Wildfire PM2.5, Set IV
Table 20: Regression of PM2.5 Metal Subspecies on Wildfire PM2.5, Set VI
Table 21: Hypothetical Two-Sample Estimates for Angrist and Evans (1998), Effects of 2 or More Children on Labor Supply for Married Women, 21-35
Table 22: Linear Peer Effect Component Estimates (Math and Science)
Table 23: Linear Peer Effect Component Estimates (Social Studies and English)
Table 24: Linear Peer Effect Component Placebo Test (Algebra II and Biology)

LIST OF FIGURES

Figure 1: Number of Acres Burned (Thousands) for All Fires Greater than 1,000 Acres, 2000-2010
Figure 2: Wildfire Air Pollution Modeling - BlueSky Framework Workflow
Figure 3: Average Raw Wildfire PM2.5 Output by County, CONUS, 2004-2010
Figure 4: Quantile-Quantile Plots of PM2.5 versus Counterfactuals
Figure 5: Spline Control Function Regression of All-Cause Mortality on PM2.5, by Decile
Figure 6: Piecewise Regression Coefficient Estimates of Daily Station PM2.5 on Raw and Log-transformed Wildfire PM2.5 Model Output, by Vigintile
Figure 7: Mean Simulated TS2SLS Point Estimate by First Stage Sample Size N2
Figure 8: Simulated TS2SLS Standard Error by First Stage Sample Size N2
Figure 9: Simulated TS2SLS Standard Error by Second-Stage Sample Size N1
Figure 10: Mean Simulated TS2SLS Point Estimate by Proportion of Overlap Between Samples
Figure 11: TS2SLS Simulated Standard Error by Proportion of Overlap Between Samples
Figure 12: Algebra II: Effects of Peer Ability Shares by Own Relative Ability in Classroom
Figure 13: Algebra II: Inverted Plot of Effects of Peer Ability Shares by Own Relative Ability in Classroom
Figure 14: Algebra II: English Class Composition Placebo Test
Figure 15: Algebra II: English Class Composition Placebo Test (Inverted)
Figure 16: Algebra II: Science Class Composition Placebo Test
Figure 17: Algebra II: Social Studies Class Composition Placebo Test
Figure 18: Peer Effects: Geometry with English Class Placebo
Figure 19: Peer Effects: Science with English Class Placebo
Figure 20: Peer Effects: English with Social Studies Class Placebo
Figure 21: Peer Effects: U.S. History with English Class Placebo
Figure 22: Peer Effects: Civics with English Class Placebo
Figure 23: Peer Effects: Biology with English Class Placebo

Chapter 1. Air Quality, Perinatal Health, and Mortality: Causal Evidence from Wildfires

1 Introduction

In the past 40 years, ambient air quality regulation has grown in response to the burgeoning evidence of the public health costs of air pollution. The Clean Air Act Amendments of 1970, the establishment of the Environmental Protection Agency (EPA), and subsequent refinements of air quality standards have all contributed to general downward trends in pollution levels. While the health benefits of air quality improvement are uncontroversial at the highest margins of pollutant levels, the important question remains whether additional reductions will also yield health benefits and whether those health benefits exceed the marginal costs of abatement. Because pollutants are not randomly assigned and may be correlated with other determinants of health outcomes, an important challenge has been to develop research designs that provide precise, unbiased, and population-representative estimates of air pollution's effects.

In this paper, I exploit quasi-random shocks to ambient particulate matter (PM2.5) concentrations generated by large wildfires across the United States to estimate effects on mortality and infant health outcomes. Wildfires are uncontrolled fires primarily occurring in remote wilderness areas, but cause variation in urban particulate levels through mechanisms that are plausibly unrelated to non-pollution determinants of health. First, I quantify the effect that wildfires have on air quality by applying a sequence of specialized wildfire emissions and dispersion models to historical wildfire data to generate measures
of wildfire pollution for the continental U.S. over time. Then, I estimate the effects of short-term and in utero exposures on adult mortality and infant health outcomes, using modeled wildfire PM2.5 as an instrumental variable for station-observed PM2.5. I use extensive pollution monitoring data (spanning 60 PM2.5 subspecies and 18 criteria pollutant and organic gases) to decompose the shock to air quality represented by the wildfire PM2.5 instrument and present a methodology for assessing potential bias from omitted pollutants.

The U.S. Environmental Protection Agency (EPA) has identified six airborne "criteria" pollutants to be regulated under the Clean Air Act that are generally considered harmful to public health: particle pollution (PM2.5 and PM10), carbon monoxide (CO), nitrogen dioxide (NO2), ozone (O3), sulfur dioxide (SO2), and lead (Pb). This study estimates the effects of PM2.5 generated by wildfires and tests whether these estimates potentially reflect the effects of other criteria pollutants. Fine particulate matter, defined as particulate matter less than 2.5 micrometers in diameter (PM2.5), is considered the most dangerous because of its ability to penetrate deep into the human lung and sometimes enter the bloodstream.

I make several contributions to the literature on both the health effects of air pollution and the air quality effects of wildfires. First, I systematically assess the impact of large wildfires (greater than 1,000 acres) on ground-level air quality in the continental United States. I combine estimates of wildfire source emissions with an atmospheric model to retrospectively forecast the spatial distribution of wildfire-related pollutant concentrations in the period following a historical wildfire event, using the resulting data to predict pollutant concentrations at air pollution monitors. For 78 pollutants, I estimate a set of lower bounds for the percentage of each pollutant's average ambient concentration that is attributable to wildfires; notably, wildfires contribute at least 15% of ambient aggregate PM2.5, 5% of PM10, 5% of O3, and large fractions (15%-35%) of several dangerous metals bound to particulates, including arsenic, lead, mercury, nickel, and cadmium. In addition to the aggregate quantities of particulate pollutants they generate, wildfires cycle metallic and other highly toxic industrial emissions previously deposited into wildland vegetation and soils back into the atmosphere, resulting in new ground-level exposures in population centers. These findings underscore the potential significance of wildfires' contribution to public health and imply that socially optimal wildfire management policies must take health costs from worsened air quality into account. Furthermore, 75% of geographic exposure to wildfire PM2.5 occurs outside of the state of origin, raising the possibility that wildfire management policy in the U.S. may be inefficient due to inter-state spillovers.

Next, I estimate the effect of average monthly PM2.5 exposure on county-level mortality rates for 2004-2010 via instrumental variables, controlling for weather variables and stringent sets of region-specific fixed effects. Short-term exposure to PM2.5 is associated with mortality with magnitudes consistent with prior literature, and the dose response is approximately linear below regulatory limits of PM2.5. A transitory 10 µg/m³ increase in a county's average monthly PM2.5 (approximately a doubling of average ambient concentrations in the sample period) is associated with one additional death per 100,000 individuals. These effects are largely driven by cardiovascular and respiratory fatalities, but PM2.5 is also associated with general disease-related causes of death. Nearly all short-term PM2.5-related deaths are of individuals over age 65, and women are twice as susceptible as men. Because wildfires also emit large quantities of several gaseous pollutants, I attempt to control for potentially correlated pollutants by including a set of comparably modeled controls for NO2, SO2, NH3, and VOC gases. Because of the complex chemical and environmental interactions underlying O3 production and the corresponding difficulty of predicting O3 concentrations from wildfires using the same set of pollution models, I am unable to decisively rule out confounding effects from O3. Based on best available estimates from other studies, I estimate that this effect is on the order of 35% of the estimated effect of PM2.5.

Finally, I estimate the effects of prenatal exposure to PM2.5 on premature birth rates, birth weight, and sex ratios, finding small but statistically significant harmful effects. I also find marginally significant evidence for negative effects on the fraction of males in a birth cohort, and my estimates are consistent with the higher susceptibility of male fetuses to death from pollution shocks estimated in Sanders and Stoecker (2011).

Controlling for non-PM2.5 emissions from wildfires results in a different composition of PM2.5 that more heavily favors toxic species, and I find larger effects in the presence of higher fractions of metals and lower fractions of non-metal particulates. While the intrusive quality of PM2.5 is the basis for the proposed dangers of PM2.5, there is wide heterogeneity in the chemical composition of PM2.5 and some evidence of heterogeneous effects, but relatively little is understood about the relative toxicities of individual substances (Bell 2012). PM2.5 is composed of a wide range of substances, including elemental carbon (EC), organic carbon (OC), nitrates (NO3-), sulfates (SO4 2-), and metals bound to particulates (such as mercury and lead). Some of these are formed or released directly from an emissions source (commonly EC, OC, and metals) and others are formed through chemical reactions in the atmosphere (e.g., nitrates and sulfates). Composition also varies widely by region and over time from cross-sectional differences and seasonal differences within regions (Franklin et al. 2008). This heterogeneity presents problems for the effective regulation of particulate levels, as small exposures of highly toxic species are potentially as dangerous as large exposures of EC, OC, or other species that account for most of PM2.5 mass. Relatedly, it presents statistical challenges for interpreting estimated effects for PM2.5. When controls for non-PM2.5 pollutants from wildfires are included, estimated effects on mortality more than double. For infant health, effects approximately double for prematurity, gestational age, and average birth weight. I interpret this pattern of estimates as evidence that the conditional mixture of PM2.5 identified has increased toxicity that exceeds any reduction in upward bias accomplished by adding controls.

This work attempts to make new methodological and evidentiary contributions to the already-large and diverse economic literature on the health effects of pollution. For short-term health outcomes, panel studies and regional natural experiment studies are two popular research designs. The widely-accepted truism motivating most of the contemporary air pollution literature is that pollution exposure is non-randomly assigned and systematically related to other determinants of health outcomes. Panel studies, such as Currie and Neidell (2005), attempt to address this non-random assignment by exploiting narrow variation through stringent fixed effects. Natural experiment studies try to provide a source of quasi-random assignment by isolating the variation they use to a particular type of pollution-generating (or reducing) event. Strategies have included exploiting the timing of the Clean Air Act of 1970 to predict relatively sudden decreases in particulate concentrations (Chay and Greenstone 2003); changes in daily airport traffic congestion in California caused by weather in other major airports (Schlenker and Walker 2011); weekly panel variation in automobile traffic to identify the effects of carbon monoxide, ozone, and particulate matter on infant mortality rates (Knittel, Miller, and Sanders 2011); and temperature inversions in Mexico City (Arceo-Gomez et al. 2011). Currie et al. (2013) provide an extensive survey of both types of papers exploring the effects of early-life exposure to pollution, finding a general consensus that airborne pollutants are associated with infant mortality, premature birth, and low birth weight.

Papers applying natural experiments to adult mortality have been more infrequent. Chay et al. (2003) use the timing of the Clean Air Act of 1970, finding insignificant effects on adult and elderly mortality. Pope et al. (2007) use an 8-month national strike of copper smelter workers to estimate the effect of sulfate particulate reductions, finding a 2.5% reduction in mortality over the strike period.

Several papers have attempted to estimate health effects of major wildfire events, implicitly taking their exposure measures as proxies for pollution shocks. Jayachandran (2009) examines sharp increases in particulate pollution from an intense wildfire season in Indonesia in 1997, tracking spatial and temporal variation in pollution from wildfires using satellite-based measures of particulate levels. She finds evidence that prenatal smoke exposures during that period caused a substantial increase in early-life mortality, on the order of a 20 percent increase in the under-age-three mortality rate. Breton, Park, and Wu (2011) estimate that prenatal exposure to high PM2.5 concentrations from a week-long wildfire event in California was associated with an 18 g decrease in mean infant birth weight in comparison to counties unaffected by the wildfire.

Few studies have used modeled exposures from large emission events based on atmospheric
transportmodels,andnonehaveusedexposuresintandemwithmonitoringdatatopredicthealthoutcomes.1Rappoldetal.(2012)usemodeledexposuresinNorthCarolinatoassessincreasesinasthmaandcongestiveheartfailureriskswithreduced-formPoissonregressions.Iagapintheliteraturebyincorporatingdevelopmentsinemissionsandatmospherictransportmodelingandtakingadvantageofsubstantialincreasesincomputationalpowermadeoverthelastdecade.Iunitequasi-randomvariationinpollutionlevelspredictedfromandat-1Aclassofstudydistinctfromthisonecombinesmodeledexposureswithpre-existingestimatesofhealthriskstodeterminepopulation-wideimpacts.Forexample,Caiazzoetal.(2013)usetheCommunityMultiscaleAirQuality(CMAQ)modelcombinedwiththeU.S.NationalEmissionsInventoryfor2005tocreateanannualpredictedmapofaveragepollutionconcentrations,andinterpretthisasameasureoflong-termpollutionexposure.Also,severalstudiesuseobservedchangesinparticulatemeasurementsandonlyemployfibackwardtrajectoryflcalculationstoindirectlyverifythatlargechangesareduetoaevent,suchasordustepisodes.5mosphericmodelswithobservedpollutionlevelsinapaneleconometricsframeworktoestimatehealtheffects,providingamethodologythatbridgessomeofthelong-standinggapsbetweentheatmosphericscience,epidemiology,andeconomicsliteraturesonairquality.2Data2.1ModeledWeAirPollutionCombininghistoricaleventdataandmeteorologywithsrelevantandatmo-spherictransportmodels,Igenerateahigh-resolution,griddeddailymeasureofpollutionforthecontinentalU.S.(CONUS)domain.ThemeasurerepresentsaretrospectiveforecastofwherepollutionfromdocumentedeventswouldbelikelytohavetraveledgivenwhatisknownaboutatmosphericbehaviorduringandaftertheTothisend,IusetheBlueSkyFrameworksoftwarepackage,whichintegratesseveralexistingmodelsofemissionsandtransportprocessesintoaprocess.2.1.1WeDataStateandfederalagenciesresponsibleforwimanagementkeeprecordsonthelocation,size,andtimingofevents.Fireeventslargerthan1,000acresaregatheredfromtheFirePro-tectionAgency(FPA)FireOccurrenceDatabase(FOD),aninteragencycollectionofeventreportsupdat
edforaccuracyandcleanedforduplicatesusingmethodsdescribedinShort(2013).Theeventcharacteristicsdrawnfromthisdatabaseformodelingarethelatitudeandlongitudepointdataofthedateandtimethewasdetected,areaoftheburnedinacres,andthedateandtimeatwhichaagencydeclareditcontained.ForanavailablesubsetoffederalIdrawthedateandtimeatwhichaagencydeclaredtheextinguishedfromaU.S.GeologicalSurveydatabaseofreportedbythesixmajorfederalagenciestaskedwithmanagingIfanyofthevaluesexceptforthecontainmentorextinguishdatesaremissing,theisomitted.Wheretimeofextinguishmentdataaremissing,Iempiricallyestimatethetotalburntimeusing6aregressionmodelwithcategoricaldummiesfortheareaandthedurationfromstarttocon-tainmentaspredictors,adjustingforreunobservedeffectsandseasonaleffects;wherebothcontainmentandextinguishmentdatesaremissing,Iusethesamemodelwithoutcontainmenttimetopredictburnduration.Themethodology,rationale,andresultsofthetotalburntimeesti-mationprocedurearedescribedinAppendixA.1.Figure1showsthetotalnumberofacresburnedinlargerthan1,000acresmappedbystatefor2000-2010,ascalculatedfromtheFPAFODdatabase.ThemajorityofareaburnedisconcentratedintheWest,Northwest,andSouthwesternUnitedstates,withadecreasingeastwardandstronglydecreasingNortheastwardpattern.Theselargeconstituteover85percentofareaburnedintheUnitedStatesoverthisperiod.Firessmallerthan1,000acresarealargerpercentageofareaburnedforNortheasternandCentralstates,butarenotincludedbecauseofhighcomputationalcostsofthemodelingprocessrelativetotheirsmalltotalemissionscontributionscomparedtolarger2.1.2DescriptionofWeEmissionsandAirPollutionModelingWcanbestartedbylightningstrikesordirectsunlightwhenhighlyfuels(e.g.,forestunderbrush)endureanextendeddryperiod.Warealsocausedbyhumanerrors,suchasescapedcaraccidents,ordownedpowerlines.Occasionally,theyareintentionallysetbyarsonists.Firesarealsointentionallysetbymanagementagenciestopreemptivelyburnfuelsfornaturally-occurringamongotherfunctions.Wincidencepeaksinmid-to-latesummer,buthasvaryingseasonalpeaksbyregion.Th
emajorityoflargeevents(over1,000acresinsize)occurintheWesternandNorthwesternUnitedStates.Thereareseveralphenomenawhichcontributevariationtotheamountofgeneratedpollutionatagivenpointintimeinspace.Broadly,thesearethecharacteristicsoftheandthemeteorologicalconditionsatthetimeofandshortlyaftertheevent.Thedurationoftheisafunctionoftimetilldetection,containmentefforts,andthecontainmentdifcultyoftheBesidesitsroleinpromotingtherateofspreadandultimatesizeofathefuelcoverdeterminesthevolumeandchemicalcompositionofemissionsfromtheperunitofareaburned.W7dominantemissionsbymassarePM10,PM2.5,CO,andNOx.InadditiontoPM2.5generatedbybiomassburning,suchasOrganicCarbons(OC),releasemineralsandmetalswhichac-cumulateinforestsoilsandvegetationfromatmosphericdeposition.Nearbyhistoricalindustrialactivityisstronglyrelatedtotheamountofleadandmercuryre-releasedbyintotheatmo-sphere,withthesere-emissionsrepresentingafractionofatmosphericconcentrations.Oncegenerated,emissionstravelupwardatvaryingspeedsdependingonavarietyoffactors,resultinginaheterogeneousverticaldistributionofpollutantsinaThisverticaldistributiontheninteractswithambientpressureandwindconditionswhichresultinairbornetransportofemis-sionsdownwind.Emittedparticles(andgases)interactwithweatherconditionsheterogeneously,resultinginrelativedownwindchangesinconcentrationsthatvarybypollutant.Drydepositionisasetofprocessesbywhichpollutantconcentrationsdecreasethroughcontactwithsurfaces,whichincludegravitationalsettlingandinterception(collisionwithtrees,buildings,etc.).Wetdepositionisasetofprocessesbywhichatmospherichydrometeors(e.g.,precipitation)absorbparticles.IutilizeasequenceofmodelsthatexploitseveralfacetsoftheseemissionsandpollutiontransportprocessestopredictthecontributiontoPM2.5levelsfromThecomputationalwwisexplicitlydescribedinAppendixSectionA.2.1;Figure2depictsthewwvisually.HistoricaleventsareinputintotheBlueSkyFramework,wherefuelload-ings,fuelconsumption,emissions,andverticalplumeriseareestimated;thesearefedasemissionsourcesintoHYSP
LIT,whichcalculatestheconcentations'trajectoryanddispersionfromeachsource;hourlyspatialconcentrationestimatesarecalculatedatanapproximately1600km2reso-lution(approximately5,000uniquepointsinthecontinentalUS);then,theHYSPLITpredictedconcentrationsaresampledatpollutionmonitoringstationlocationsandaveragedbycountyandmonthtocreateamonthlypanelofcountyaveragesofpollution.2.1.3ModelingToolsTheinterfacebetweenmanagementandairqualitystandardshaspromptedextensivede-velopmentoftoolsinthelasttwodecadestoappraisethedownwindimpactsofBegin-8ningin2003,theNationalOceanicandAtmosphericAdministration(NOAA)developedandim-plementedtheSmokeForecastingSystem(SFS)toprovideoperationalforecastsofPM2.5(Rolphetal.2008).AcentraltoolintheNOAASFSistheBlueSkyFramework(BSF),amodel-ingframeworkwhichconnectsindependentlydevelopedmodelsoffuelloading,consumption,emissions,andatmospherictransport(Larkinetal.2009).TheBSFhasalsobeenusedindevelopmentofregionalforecastingsystemsinthePNorthwest(O'Neilletal.2009).TheBSFreadilyaccommodatesseveralpopularmodelsofeachcomponentofthemodelingprocess.TheFuelCharacteristicSystem(FCCS)isa1km-resolutionspatialmapoffu-elbedtypesacrossthecontinentalUnitedStatesdevelopedfromacombinationoffuelphotose-ries,scieliterature,satelliteimagery,andexpertopinions(Ottmaretal.2007).CONSUME3.0predictshowtheamountoffuelconsumptionforagiveneventdividesbetweensmouldering,andresidualphases,eachofwhichhaveuniquecontributionstoemissionsduetodifferencesincombustionefy(Prichardetal.2005).TheFireEmissionsProductionSimu-lator(FEPS)isasoftwaremodulethatsimulatesemissionproductionandplumebuoyancybasedonaprovidedconsumption(Andersonetal.2004).FEPSiscapableoffuelconsumptioncalculations,butthisfunctionalityisreplacedbyCONSUME3.0inthismodelingprocess.Thesethreemoduleshaveallbeenused,viatheBSF,inthedevelopmentofnationalemissionsin-ventoriessince2008.Lastly,theHybridSingle-ParticleLagrangianIntegratedTrajectorymodel(HYSPLIT)isasystemwhichusesgriddedmeteorologicaldatatosimulateairmasstrajectories,dispe
rsionofconcentrationsfrompollutantplumes,anddepositionprocesses(DraxlerandHess1997).InadditiontobeingusedintheNOAASFS,HYPSLIThasbeenusedinhundredsofappli-cations,suchasmodelingfalloutdispersionfromtheFukushimaDaichiinucleardisaster(Draxleretal.2013),AfricandusttransporttotheIberianpeninsula(Escuderoetal.2006),anddispersionofparticulateheavymetalsfromindustrialemissionsourcesinSpain(Chenetal.2013).92.2BirthandMortalityDataDataonthepopulationofbirths,linkedinfantdeaths,andmortalityeventsintheUnitedStatesfor2004-2010comefromtheU.S.CenterforDiseaseControl's(CDC)NationalCenterforHealthStatistics'(NCHS)NationalVitalStatisticsSystem(NVSS).Datasetscontainallnon-identifyinginformationrecordedonbirthanddeathEachbirthrecordcontainstheyearandmonthofthebirtheventinadditiontoimportantperinatalhealthoutcomes,suchasbirthweight,Apgarscores,estimatedgestation,birthcomplications,andcharacteristicsofthemotherandfatherofthechild.Table3summarizestheseoutcomesbygestationalcategory(full-termandpre-term).Themortalitydatacontainindividualdeathrecords,whichincludetheyearandmonth,county,causeofdeath,andcharacteristicsofthedeceasedindividual(race,gender,andeducation).For2005andbeyond,countyarecensoredforallcountieswithfewerthan100,000individuals.Causesofdeatharecodedinto39groups,inaccordancewiththelatestoftheInternationalStatisticalofDiseasesandRelatedHealthProblems(ICD-10).County-by-monthmortalityratesforeachcausearecalculatedbysummingcountsfromthe34categoriescausesofdeath,includingcancers,heartfailure,respiratorydisease,andotherdiseasesanddivid-ingbyapopulationmeasure.Thepopulationestimatesusedtocalculateratesper100,000indi-vidualsarefromtheCDCNCHSBridged-RacePopulationEstimates,asetofannualintercensalcountypopulationestimateswithbreakdownsbysex,age,andrace.Igenerateanfiall-causeflratefromallnon-external,non-accidentalcausesofdeathforthegeneralpopulation,andbygender,andinfant,child,and10-yearagegroups.Table2reportssummarystatisticsformortalityratesinthesample.2.3AmbientAirPollutionandWeatherD
ataDailyaveragemonitoringstationobservationsofpollutantlevelsaregatheredfromtheU.S.En-vironmentalProtectionAgency'sAirQualitySystem(AQS),acentralizeddatabaseofpollutantmeasurementsfromstateandfederalmonitors.Thegeographicandtemporaldistributionofmea-10surementsvarieswidelybypollutant.ThePM2.5ChemicalSpeciationNetworkprovidesmeasure-mentsofPM2.5subspeciesofinterest,suchasmetalsandnitrates.Somestationscollectdataatweeklyratherthandailyfrequency.Forcounty-monthswithmissingstation-days,Iusetheaverageofnonmissingobservationsbyaveragingtomonthlystationobservations,andthenaveragingstation-monthvaluestocounty-monthvalues.County-monthswithnostationobservationsareex-cludedfromthesample.Forbirthanddeathoutcomes,Ithemother'sanddecedent'scountyofresidences,respectively,astheaggregategeographicunitsforcalculatingpollutionexposure.Forlocalweathermeasures,IusedatafromtheNorthAmericaLandDataAssimilationSystemonaveragemonthlydailymaximumandminimumairtemperaturesandmonthlyprecipitationquan-titiesforeachU.S.county.ThesedataweredrawnfromtheCDCWONDERdatabase.Thisdatasourceisdistinctfromthemeteorologicalreanalysisdatausedasinputsintothepollutiontransportmodel.3EconometricApproach3.1StatisticalModelIconsiderthefollowinglinearmodelofhealthoutcomeyitwithaK1vectorofendogenousvariablesrepresentingpollutionlevels,Pit,andasetofunobservedeffects:yit=Pitb+Rity+ai+git(t)+eit(1)Pkit=zitgk+Rityfk+hki+fki(t)+vkit(2)git(t)=ci;a(t)+si;m(t)+tiw(t)(3)fki(t)=cfki;a(t)+sfki;m(t)+tfkiw(t)(4)11Equation(1)showstherelationshipbetweenthehealthoutcome(e.g.,meanbirthweight)yitandpollutantsPitforcountyiinmontht.Ritisasetoftime-varyingcountycharacteristics,airepresentsacountyedeffect,andgi(t)generallyrepresentstime-varyingunobservedhetero-geneity.eitisanidiosyncraticerrortermwhichmaygenerallybecorrelatedwithPit.Equation(2)representsthestagerelationshipbetweenpollutantkandthevectorofatleastKexcludedmodeledpollutioninstruments,zit,withasetofedeffectshkiandfki(t)matchingthoseinequation(1).Equation(3)gi(t)asthesumofsets
of region-year fixed effects $c_{i,a(t)}$, region-month (seasonal) effects $s_{i,m(t)}$, and arbitrary regional time trends $\tau_i w(t)$. $a(t)$ and $m(t)$ are functions which convert the global time index $t$ to the correct calendar year (e.g., 2004) and calendar month (e.g., July) indices. "Region" can generally refer to any geographic unit which hierarchically nests counties, including counties, states, and NCDC climate regions. Equation (4) defines $f_{ki}(t)$ in parallel to (3) for $P_{kit}$ (except naturally requiring that fixed effects vary by pollutant $k$).

Region-year fixed effects account for annual trends in the health outcome, including those driven by changes in pollution from sources other than wildfires. This includes region-specific climatological changes and regulatory responses to wildfire incidence or pollution, which might simultaneously affect both wildfire incidence and health outcomes. Region-month fixed effects account for unobserved persistent seasonal differences between regions, such as weather patterns that drive seasonality in wildfire incidence and health outcomes. Including fixed effects increases the plausibility of the assumption that the instrument is exogenous in equation (1); namely, that $E[\varepsilon_{it} \mid z_{it}, R_{it}, \alpha_i, g_{it}(t)] = 0$.

3.2

The structural model of atmospheric transport represented by HYSPLIT seamlessly combines wildfire emission inputs, trajectory and dispersion calculations, and pollutant removal from the atmosphere through deposition processes to form a single, powerful instrument in the form of a predicted concentration. The dominant source of variation in simulated pollution concentrations using the HYSPLIT-based modeling framework is the common movement of air parcels (i.e., wind). However, fuel loadings, wet deposition, and dry deposition generate some independent variation among pollutant types that can separately identify their effects. The possibility of separate identification of pollutants breaks down as the pollutants become more similar in the ways that HYSPLIT is able to distinguish them; modeled concentrations of similar pollutants are highly collinear. A corollary of this is that even a perfectly calibrated pollutant instrument will also proxy for the effects of its unmodeled close chemical neighbors, potentially causing bias in estimates of the effect of a specific pollutant species.

In the modeling framework used here, variation in downwind wildfire PM2.5 independent from other pollutants is primarily identified by differences in fuel composition at the fire and deposition rates between PM2.5 and gases. Interpretation of the estimated effects is complicated by heterogeneous effects, especially those driven by the chemical composition of the PM2.5 that is statistically identified; this complication is examined in Section 4.1.2. These problems hold true for nearly any attempt to identify the effects of PM2.5.

Previous studies have similarly exploited atmospheric phenomena and pollutant characteristics through regression interactions. For example, Schlenker and Walker (2011) interact airport taxi time with wind speed to separately identify CO and NO2, which may be explained by differing dry deposition rates between CO and NO2. NO2 has a higher deposition velocity than CO. Assuming a fixed emission ratio of CO to NO2, higher wind speeds will carry parcels of both pollutants equally far but deposit more NO2 than CO, resulting in an increasing ratio of CO to NO2 in distance from the airport. An alternative explanation they supply is that higher wind speeds change the composition of emissions from airplane engines to be more NO2-heavy. Both deposition differences across pollutants and differences in emissions ratios for specific events would be captured by HYSPLIT's deposition modeling process, with the practical drawback that one must be specific about deposition characteristics and emissions quantities in HYSPLIT's setup.

3.3 Testing and Controlling for Effects from Multiple Pollutants

3.3.1 Controlling for Multiple Wildfire Pollutants

In the ideal empirical setting, one would have a large enough dataset with measurements of all species of interest, with identifying instruments for each species, and would estimate the effects of multiple endogenous variables using 2SLS or another appropriate IV estimator. In reality, station coverage is limited to fewer than 20% of county-month observations in the sample period, and further limited when overlapping species measurements are required. The generation of strong identifying instruments may be constrained both by the quality of models and, practically, by computational power. In lieu of the ideal estimation of all pollutants' coefficients, it is feasible to consistently estimate a single structural parameter of interest (in this case PM2.5's effect) without any concern for the structural parameter values for other pollutants.

Generally, evidence for a consistent estimate of the effect of PM2.5 can be established by exhausting potential confounding causal pathways through a combination of control variables and pre-testing for omitted variables [2]. Under the assumption that estimates using the wildfire pollution instrument will reflect effects causally originating with wildfire events only, the primary risk of confounding comes from omitted pollutants which are correlated with the instrument. A measure of downwind PM2.5 from a wildfire will be correlated with other pollutants emitted concurrently in the same wildfire's combustion processes, which will also share at least some of its atmospheric trajectory. For example, wildfires simultaneously emit significant quantities of PM2.5 and NO2, and their atmospheric destinations are highly correlated.

[2] Causal pathways can also be credibly ruled out using evidence from rigorous studies that find no effects of an omitted explanator on the outcome of interest, but I do not do this here.

In this framework, the health-effect parameter for PM2.5, $\beta_{pm25}$, can be identified either through joint IV estimation of all pollutants, or through single-variable IV estimation of PM2.5 alone with controls for pollutants from the same source. This equivalence is motivated by writing the reduced form for equation (1) as follows, only substituting the endogenous variable representing PM2.5 using the first stage based on a single instrument for PM2.5:

\[ y_{it} = z_{pm25,it}\eta_{pm25} + P_{B,it}\eta_{B} + \Gamma_{k,it} + \varepsilon_{it} \tag{5} \]

$z_{k,it}$ is defined as pollutant $k$ originating from wildfires. $P_{B,it}$ is the vector of all pollutants excluding PM2.5 originating from all sources. For brevity, define $\Gamma_{k,it}$ as the composite set of controls and fixed effects and $\varepsilon_{it}$ as a composite error term for equation (1). Partition each pollutant $k$ into its concentration from wildfires and its concentration from all other sources, $P_{B,it} = z_{B,it} + \tilde{P}_{B,it}$. Then,

\[ y_{it} = z_{pm25,it}\eta_{pm25} + z_{B,it}\eta^{wf}_{B} + \Gamma_{k,it} + \tilde{P}_{B,it}\tilde{\eta}_{B} + \varepsilon_{it} \tag{6} \]

Because wildfire PM2.5 in part shares common emission and transport processes with other pollutants from wildfires, wildfire PM2.5 and other wildfire pollution are correlated: $E[z_{B,it} \mid z_{pm25,it}, \Gamma_{k,it}] \neq 0$. Uncontroversially, $E[\tilde{P}_{B,it} \mid z_{pm25,it}, \Gamma_{k,it}] = 0$ is a core assumption for the validity of the instrument; wildfire PM2.5 must be orthogonal to any pollutants in $B$ from all other sources. The reduced-form regression of $y$ on $z_{pm25}$ alone will be inconsistent for $\eta_{pm25}$. However, $z_{B,it}$ is observed by virtue of the same
modeling process that generates $z_{pm25}$, and the reduced-form regression of $y$ on $z_{pm25}$ and $z_{B}$ produces a consistent estimate for $\eta_{pm25}$. Correspondingly, provided the other key assumptions for the consistency of IV are met, IV estimation of $y$ on $P_{pm25}$ and $z_{B}$ with $(z_{pm25}, z_{B})$ as instruments is consistent for $\beta_{pm25}$.

While both the joint IV estimation and the single-variable IV procedures will be consistent for $\beta_{pm25}$, single-variable IV is far more feasible to implement; it only requires station observations of PM2.5, an instrument for PM2.5, and adequate controls for correlated pollutants. In some cases, modeled pollutants may be sufficient as controls but not sufficient as identifying instruments; joint IV estimation of PM2.5 and NO2 with a strong instrument for PM2.5 and a weak instrument for NO2 may result in an inferior estimate for PM2.5 compared to the corresponding consistent one-variable IV estimate for PM2.5.

An alternative solution to creating an instrument or proxy is to use the endogenous measure of the omitted variable as a control, but in the pollution setting this is not always feasible. First, measurement coverage for each pollutant species incompletely overlaps both across stations and time. Second, while the quality of station measurements for a particular species might be sufficient for determining whether wildfires have an impact on concentrations of that species in a station-by-station analysis, they may not be appropriate measures of concentrations for the aggregate geographic regions used to measure health outcomes (i.e., counties in this paper). Relatedly, to the extent that station measurements fail to capture variation from wildfires due to measurement error (whether direct station mismeasurement or spatial error), the control would fail to account for the influence of the omitted variable. The estimates in this paper instead use the equivalent of a proxy for pollutants emitted from wildfires as controls, thereby reducing or removing their confounding role.

3.3.2 Pre-testing for Omitted Pollutants

It is possible to meaningfully pre-test for potential omitted variables provided there are observations containing values of both the omitted variable and the instrument. Sufficient power in the test obviates the need to develop instruments or controls for the omitted variable if the test is negative. The test is to run a first-stage regression of the suspected omitted variable on the current set of instruments and controls and to check whether the current instruments are jointly significant predictors of the proposed omitted variable.

In practice, a researcher may not be able to develop an adequate identifying instrument or proxy for the potential omitted variable, and she might not be able to directly control for measures of the omitted variable without losing significant sample size (or relying on imputation methods). The creation of new instruments or proxies for new pollutant species is constrained practically by computational requirements and development time for accurate emission factors and deposition parameters. Separate identification of pollutants is also statistically limited by the mechanical richness of the modeling process. As separate identification of pollutants in the modeling process used here is driven by differential emissions and deposition behavior, pollutants with very similar emission and deposition properties will be weakly identified unless some part of the modeling process is upgraded to exploit other differences in characteristics not accounted for by HYSPLIT (e.g., buoyancy, aerodynamic, or photochemical properties).

To illustrate, consider a simple two-pollutant example with wildfire Pollutant A and wildfire Pollutant B and a modeled measure of wildfire Pollutant A as an identifying instrument. Assume we have a prior belief that Pollutant B causes mortality. If wildfire Pollutant B is positively correlated with wildfire-generated Pollutant A, then an instrumental variables regression of mortality on Pollutant A with wildfire Pollutant A as an instrument and no control for wildfire Pollutant B will be biased upward due to the confounding effect of Pollutant B. Hence, a regression of Pollutant B observations on wildfire Pollutant A which produces a significant positive coefficient on wildfire Pollutant A is interpreted as evidence of this upward bias (in context of the prior belief that Pollutant B has an effect on mortality).

This test for omitted variables holds under one additional assumption: the direction, but not necessarily the magnitude, of the average partial effect of the instrument is the same between the samples used for testing and estimation. If the instrument is monotonically related to the endogenous variable of interest for the population, this assumption is satisfied. For the relationship between wildfire-generated pollution and observed pollution, these assumptions are likely to hold. While there may be first-stage heterogeneous effects of the modeled wildfire pollution (either due to heterogeneous modeling error or because of true heterogeneity in the world due to chemistry or other processes), I assume that effects are bounded by zero. With the exception of a few highly reactive pollutants and/or pollutants with low atmospheric quantities, wildfire pollution can be generally expected to homogeneously weakly increase (or decrease) each pollutant type across geographic location and time.

Let superscript $A$ and superscript $B$ denote that the variable is drawn from the estimation sample's and testing sample's subpopulations, respectively. The assumption can be written as

\[ \frac{\partial E(P^{A}_{it} \mid z^{A}_{it}, R^{A}_{it}, \eta^{A}_{ki}, f^{A}_{ki}(t))}{\partial z^{A}_{it}} > 0 \iff \frac{\partial E(P^{B}_{it} \mid z^{B}_{it}, R^{B}_{it}, \eta^{B}_{ki}, f^{B}_{ki}(t))}{\partial z^{B}_{it}} > 0. \tag{7} \]

In the linear case, this simply translates to the coefficient having the same direction in each sample (i.e., $\gamma^{A}_{k} > 0 \iff \gamma^{B}_{k} > 0$). The corresponding hypothesis test is of $H_0: \gamma^{A}_{k} = \gamma^{B}_{k} = 0$ against $H_A: \gamma^{A}_{k} \neq 0$, using the t-test of $H_0: \gamma^{B}_{k} = 0$ from the first-stage regression using the testing sample $B$. Because of sampling error, failure to reject the null does not rule out omitted pollutants, but the estimate's confidence interval can be informative of the largest effect that is statistically supported by the given estimate. The true coefficient in the testing sample could be substantively smaller than the coefficient in the estimation sample, in which case the confidence interval bound may be misleadingly low.

A more stringent assumption, which would imply (7), is similar to the necessary assumption for the consistency of two-sample IV estimators: $\gamma^{A}_{k} = \gamma^{B}_{k}$. This assumption permits a more literal interpretation of the coefficients and confidence intervals when the omitted variables test is conducted with a set of observations that is not identical to that being used to estimate the equation of interest. I perform and interpret this test for criteria and organic gases in Section 4.1.1.

4 Results

4.1 Wildfires' Effect on Ambient Air Quality

4.1.1 First Stage: Wildfires' Effect on Ambient Concentrations of Pollutants

Wildfires have a considerable impact on urban air quality, and noticeably and dangerously so for larger fires close to urban centers. The wildfire PM2.5 instrument is a strong predictor of PM2.5, but also captures some of the relationship between wildfires and other criteria pollutants. For each pollutant, I regress the county-monthly average of its station values on the county-monthly average of the wildfire PM2.5 instrument (sampled at the station sites), and I control for county, state-year, and state-month fixed effects, and quadratics of average minimum temperature, maximum temperature, and precipitation. I measure the average contribution by wildfires to each pollutant's concentration in the estimation sample by calculating its partial fitted value $z_{it}\hat{\gamma}^{B}$, and calculate the percentage of all concentrations of that pollutant attributable to the instrument by dividing by the average measured concentration. These percentages can be interpreted as lower bounds of the amount of each pollutant attributable to wildfires in the CONUS. I repeat this procedure controlling for estimates of NO2, SO2, NH3, and organic (VOC) gases from wildfires and assess how a unit increase in the instrument predicts downwind concentrations of criteria gases, organic gases, and PM2.5 subspecies. In another specification, I control for only NO2 and SO2.

Panel A in Tables 4, 6, and 5 shows the estimated regression coefficients and percentage of average ambient concentrations contributed by wildfires for criteria pollutants, non-metallic PM2.5, and five of the most toxic PM2.5 species. Appendix Tables 16 through 20 repeat this exercise for all other metallic PM2.5 species. Under the assumption that the estimated coefficients reflect purely causal relationships, the maximum of the percentage of ambient concentration across different control pollutant specifications can be interpreted as an estimated lower bound on the true percentage of ambient concentrations caused by wildfires. Assuming the station sets are representative of the U.S., the instrument predicts nearly 15% of PM2.5 levels and 5% of PM10 levels.

Controlling for non-PM2.5 species significantly alters the distribution of pollutants predicted by the instrument, which has implications for health effects estimates. Panel B of the pollution regression tables reports the estimated coefficients for the regression with controls for other pollutants. The instrument ceases to be a statistically (and chemically) significant predictor of PM10, while still predicting 15% of PM2.5 mass.

Interpreting hypothesis tests for these estimates as the omitted variables test described in Section 3.3 for IV estimates with no controls for other pollutants, we expect the effect identified by the PM2.5 instrument to be biased upward by any health effects of non-PM2.5 pollutants that are significantly associated with the PM2.5 instrument. Hence, PM10 and two criteria gases, O3 and NO2, are possible confounders, though the contributions predicted by the instrument for these pollutants are only 4.7%, 3%, and 5.7% of ambient levels, respectively. Organic gases are insignificantly predicted, both statistically and in magnitude.

Because of sampling error, this test does not rule out that other pollutants with statistically insignificant coefficients may still confound estimates, especially if their 95% confidence interval upper bound is a quantity that could have meaningful health effects. For example, benzene is predicted at 6% of total concentrations, but its confidence interval upper bound is 16.2% of benzene, which is arguably a quantity that could have a marginal health impact. Benzene concentrations may only be poorly detected statistically; short-lived organic gases, such as m-xylene and toluene (8 to 48 hours, Prinn et al. 1987), show neither statistically nor substantively significant effects, while benzene has a comparatively long atmospheric lifetime (2 weeks to 2 months).

Wildfires have the unique property of inducing changes in PM2.5 almost uniformly across both highly and lightly polluted areas. This property is favorable to estimating population-representative effects, since an area's non-wildfire pollution levels drive nonlinear dose response and might also be correlated with effect heterogeneity due to other factors (e.g., highly-polluted areas also have low-income individuals who are more vulnerable to pollution shocks). Figure 4a shows a quantile-quantile plot of all PM2.5 against the estimated implied counterfactual PM2.5 (a world with no wildfire PM2.5), with each point representing the numerical values at which the same quantile occurs in each distribution. The quantile relationships are approximately parallel to the line of distributional equivalence and shifted upward, suggesting that wildfire PM2.5 largely preserves the shape of the distribution of PM2.5 and only shifts the mean. For comparison, Figure 4b shows a comparable quantile-quantile plot when the counterfactual is estimated using the same set of fixed effects and station observations (i.e., a pure panel data approach) instead of fixed effects-IV, revealing a considerably different distribution of margins of change for PM2.5 driven mostly by left- and right-tail behavior.

4.1.2 PM2.5 Chemical Composition by Wildfire Instrument

The types and quantities of PM2.5 predicted by the instrument significantly change when non-PM2.5 controls are included. Section 4.2.2 outlines an argument for how
this changes the interpretation of health effects estimates because of changes in the level of toxicity per unit PM2.5. While the total mass of PM2.5 predicted by the instrument only decreases by 10%, the fractions of specific subspecies groups change significantly. In the non-metallic category, Organic Carbons decrease in concentration by 60-75% per unit PM2.5, Elemental Carbons by 40-50%, and hydrogen PM2.5 by 70-85%. Bromine PM2.5 increases by 100%, and nitrates by 50%, while the influence of sulfates stays approximately the same. Several metallic PM2.5 species become more strongly represented per unit of wildfire PM2.5 by at least 50%: Arsenic, Lead, Nickel, Mercury, Cadmium, Barium, Cesium, Cobalt, Gallium, Lanthanum, Selenium, Niobium, and Rubidium. The estimated fraction of atmospheric mercury PM2.5 attributable to wildfires becomes approximately 30 percent, parallel to the fraction established in an inventory of mercury emissions in the U.S. (Wiedinmyer and Friedli 2007). Predicted arsenic increases by a factor of nearly 30, now accounting for 19 percent of ambient arsenic concentrations. Lead, Nickel, and Cadmium are also all significantly enhanced per unit PM2.5.

The speciated PM2.5 data present a fairly complete picture of PM2.5 in the U.S. Over 81% of average PM2.5 concentration is accounted for by the subspecies I model. The remaining unexplained PM2.5 concentration may be due to known PM2.5 species which I measure imperfectly or not at all (such as sea salt and dust) and to differences in mean concentrations between the PM2.5 Speciation Network and general PM2.5 station samples.

Moreover, heterogeneous coefficients between testing and estimation samples are not likely to drive most of the results. The number of observations measuring total PM2.5 exceeds the number of observations measuring individual species by 20,000 to 30,000, driven mostly by spatial variation in station coverage. Despite the disparity in spatial sampling, the instrument's estimated effect on total PM2.5 concentration is closely matched by the sum of coefficients for individual PM2.5 species (in the no-controls case, a less than 1% difference). This suggests that any between-sample differences in the relationship between the instrument and pollutants are mean-zero across PM2.5 subspecies.

Some coefficients for metallic species are negative. The causal interpretation for negative coefficients is that something in the wildfire pollutant plume causes a chemical reaction that removes quantities of another species or its precursors (e.g., through oxidation or binding). Many metal PM2.5 species, including mercury, are defined as the metal bound to other airborne particles, such as black carbon (soot). Chemical reactions with wildfire emissions may change such metals back to their gaseous phases, or additional substances may bind to and change the particle to a larger size class. Another possibility is that the relationship is not causal. The wildfire PM2.5 instrument is generated using a fixed set of emissions factors for all PM2.5. If there is geographic heterogeneity of subspecies emissions (e.g., aluminum, silicon, and other metals) that is negatively correlated with the total amount of PM2.5 emitted, high downwind PM2.5 values will also be negatively correlated with those metals. The final possibility is that stations' measurement methods may have some systematic measurement error for subspecies measurements that varies with the amount of other substances in the air.

4.2 Short-term Effects on Mortality

4.2.1 Short-term Effects of PM2.5 on All-Cause Mortality

Panel A of Table 8 reports the 2SLS estimates of the effect of average monthly PM2.5 on monthly all-cause mortality rates using wildfire PM2.5 as an instrument, each column reporting a specification with a different set of fixed effects. Estimates range from 0.67 to 1.05 additional deaths per 100,000 people per monthly 10 µg/m³ increase in PM2.5. Panel D reports OLS estimates with the same fixed effects and weather controls as the 2SLS estimates; they are flipped in sign and sharply estimated close to zero, reflecting the important role of exposure measurement error and omitted variables causing downward bias.

Estimated effects using 2SLS increase with the inclusion of more stringent region-specific fixed effects, providing some evidence of region-specific confounders to wildfire PM2.5 such as unobserved seasonal weather factors or endogenous annual policy responses to poor air quality or high wildfire activity. The increase is also partially explainable by changes in the bias of the 2SLS estimator across specifications because of relative changes in the ratio of endogeneity in PM2.5 to the strength of the first-stage relationship (see Appendix A.5); however, confidence intervals based on the inverted Anderson-Rubin test statistic (Anderson and Rubin 1949; Finlay and Magnusson 2009) are very close to the conventional asymptotic confidence intervals, which is evidence
against any meaningful bias from weak instruments. Finally, these changes can be attributable to changes in the PM2.5 composition identified by wildfire PM2.5, since different fixed effects may remove certain correspondingly invariant characteristics of PM2.5.

The effect size in column 4 translates to approximately 39,230 premature deaths per year in the U.S. due to monthly exposure to PM2.5, based on the 2010 U.S. population and assuming the sample average PM2.5 of 10.6 µg/m³ is representative of the entire U.S. I find evidence that many of these deaths are driven by forward displacement of mortality within six months in Section 4.2.5.

OLS estimates may be downward-biased because of some combination of correlated unobservables not removed by fixed effects or measurement errors (potentially worsened by fixed effects). The traditional culprits for bias, such as residential sorting, seasonality, and coincidental trends, presumably have their influence removed by the stringent fixed effects imposed in each specification. The remaining identifying variation for the OLS estimates is based on within-region, within-year, within-season comparisons, with variation likely to be driven by the totality of incidental variations in PM2.5 emissions and weather patterns. Co-emission would bias estimates upward, as changes in PM2.5 emissions would likely be accompanied by changes in other pollutants. On the other hand, the activities underlying emissions of PM2.5 are likely correlated with several time-varying economic and health behavior processes, including changes in traffic, smoking and drug use, short-term health inputs, physical activity, and stressful events.

More likely is that measurement error plays a role in shrinking both OLS and 2SLS estimates toward zero, though the 2SLS estimate corrects this measurement error to the extent that both PM2.5 and the wildfire PM2.5 instrument are characterized by classical measurement error. County-level averages of PM2.5 and the PM2.5 instrument are calculated from raw averages of measurements at the sites of pollution monitors, which are not always spatially representative. In the traditional errors-in-variables setup, nonzero correlation between the true value of the regressor and the measurement error (i.e., non-classical error) has different implications for bias (expression in Appendix A.4). In the case of negative correlation large enough relative to the signal value of the mismeasured regressor, the coefficient estimate can also reverse sign. Stations tend to be located in more densely populated and plausibly more polluted areas. More densely populated areas have higher pollution but would have their aggregate exposures well measured by local station observations. Less densely populated and less polluted areas will use information on PM2.5 from more highly populated areas, resulting in overestimation of PM2.5 levels. The combination of these two factors may result in a negative correlation between the measurement error and PM2.5 levels.

Table 11 reports estimates by age group, revealing that the observed aggregate effects are primarily driven by the three age groups over age 65. Elderly individuals are more likely to be living at vulnerable health margins, and thus are more susceptible to a relatively short-term pollution shock causing a life-threatening health complication. Also (not reported in tables), the estimated effect is twice as large for women as it is for men. Similarly, Chen et al. (2005) find a higher increased relative risk for females for fatal heart disease, and Kunzli et al. (2005) for atherosclerosis, from PM2.5 exposure.

4.2.2 Heterogeneous Effects of PM2.5 by Chemical Composition

The inclusion of any of the controls for other wildfire pollutants results in a sharp increase in the estimated effect of PM2.5 on all-cause mortality by about two and a half times (Panels B and C, Table 8). In tandem with the distinctive changes in composition across the specifications observed in Section 4.1.2, the increase in mortality estimates with additional controls suggests that PM2.5 has heterogeneous effects that depend on its underlying chemical composition.

Because of changes in the toxicological properties of the PM2.5 whose effects are being measured, the interpretation of changes in effect estimates across different identification strategies is potentially ambiguous, even when the regions and emissions sources being studied are identical across estimation methods. In a homogeneous-effects world, a pollutant A's health effect estimate is biased upward by the effect of harmful pollutant B co-emitted from wildfires, implying that including controls for pollutant B would make the estimated effect of pollutant A smaller in expectation. This property does not always hold if there are heterogeneous effects from chemical composition. Specifically, heterogeneous chemical composition may result in some controls removing the statistical influence of some subspecies of pollutant A in favor of more harmful ones. A 1 µg/m³ increase in ambient PM2.5 induced by general PM2.5 will have a smaller marginal health impact than a 1 µg/m³ increase of PM2.5 subspecies with above-average toxicity. Controlling for another pollutant eliminates variation from emissions and atmospheric trajectory components common between the control pollutant and PM2.5 in the identification of the PM2.5 coefficient, resulting in greater weight on variation idiosyncratic to the fuel type at the fire and to deposition behavior. For example, Organic Carbons (OC), common byproducts of primary combustion, are a major constituent of wildfire emissions by mass across all fuel types and would thus have a large part of their influence removed by including any other pollutant controls due to their commonality to all wildfires [3].

Because including controls changes the breakdown of PM2.5 that is identifying the effect to favor relatively more mass from highly toxic species (as demonstrated in Section 4.1.2), we cannot unequivocally expect including controls to have a net downward effect on the magnitude of health effects estimates. Hence, the increase in mortality estimates is prima facie evidence of large compositional effects in PM2.5.

As shown in Section 4.1.2, the specifications in Panel B and Panel C reflect changes in PM2.5 composed of greater proportions of metals and nitrates than the no-control specification in Panel A. A common finding in the epidemiological and medical literature is that PM2.5 effects are higher in the presence of metallic PM2.5 subspecies. Bell (2012) finds 15% larger PM2.5 effect estimates for cardiovascular and respiratory morbidity when ambient Nickel (Ni) is elevated. In a study of rats, Pozzi et al. (2003) find evidence that inflammatory response from particulates is driven by contaminants adsorbed onto particles by comparing responses between exposures to urban-sampled particulate matter and pure black carbon.

There are also some shifts in predictions of criteria and organic gases depending on the set of pollutant controls, suggesting a potential role of changing correlations with omitted pollutants driving the increase in mortality effects. However, the loss of PM2.5 mass from carbons and gain from metals is roughly stable across estimates using different pollutant control groups; the
changes in estimated mass contributions by the instrument to these gaseous species vary widely with control groups; and estimates are relatively stable across control group sets after the first control pollutant is included. While this analysis is not a substitute for joint IV estimation of all pollutants, this is evidence that most of the increases in effects are driven by a specific set of PM2.5 species and not by confounding from simultaneous emissions of criteria and organic gases.

[3] Also, the deposition parameters chosen for PM2.5 place relatively more weight on PM2.5 species whose deposition characteristics mimic the chosen parameters most closely. This has ambiguous estimation consequences without further investigation of the distribution of emission deposition characteristics across PM2.5 subspecies.

Despite attempts to control for O3 production by modeling its key precursor NO2, the instrument predicts approximately 1 ppb of O3 per 0.1 µg/m³ of PM2.5 predicted by the instrument across specifications. Bell et al. (2004) find a 0.52% increase in daily mortality per 10 ppb increase in the previous week's O3; if this effect were true and the base mortality rate is 67.6 deaths per 100,000, then 0.35 of the 1.04 deaths per 10 µg/m³ of PM2.5 estimated with the PM2.5 instrument and no controls are attributable to bias from O3. The estimate for O3-related bias is comparable for the effect with all non-PM2.5 controls, but relatively smaller (0.35 of 2.68 deaths).

4.2.3 Nonlinear Effects of PM2.5

Using a control function approach to estimate nonlinear dose response of short-term mortality, I find that the marginal effect of PM2.5 slightly declines at low concentrations (less than 5 µg/m³) and becomes approximately linear. Previous studies using multi-city time series analyses examining short-term PM2.5 dose response have also found a roughly linear relationship below the NAAQS concentration level of 25 µg/m³ for all-cause mortality (Schwartz, Laden, and Zanobetti 2002; Stieb et al. 2008); Daniels et al. (2000) additionally find approximate linearity in PM10 for all-cause mortality.

Piecewise regressions for an endogenous variable can be easily estimated via control function methods without the need to develop additional identifying instruments. In the control function procedure, the first-stage regression is identical to the conventional IV first stage, but the residuals from that regression are generated and used as a control variable in a regression of the outcome on the endogenous variable. Results can be made further robust to endogeneity by controlling for corresponding nonlinear functions of the control function residual, accounting for changing correlation with the error term across the support of the endogenous variable. The only other required assumptions are mean independence of the instrument from the structural error and that the distribution of the first stage is correctly specified; in the case of pollution, the endogenous explanatory variable is continuous and its relationship to the instrument is conceptually linear. Figure 5 is a graph of the fitted values and 95% confidence interval of a spline regression dividing average monthly PM2.5 concentrations into splines by decile (denoted by vertical bars), controlling for the linear control function residual.

4.2.4 Short-Term Effects of PM2.5 by Cause of Death

I estimate effects on mortality rates by broad cause-of-death categories using specification (4) from Table 8, and report the results in Table 9. Unsurprisingly, the fatal effects of PM2.5 manifest most strongly through cardiovascular and respiratory causes, consistent with prior literature. A 10 µg/m³ increase in average monthly PM2.5 is associated with significant additional deaths from ischemic heart disease (0.26 additional deaths per 100,000), cerebrovascular disorders (0.17), influenza and pneumonia (0.15), and chronic lower respiratory disease (0.19). PM2.5 also has an impact (0.16) on deaths in the ICD-10's broad "Other Diseases" category, suggesting that PM2.5 exposures either lead to complications for already-vulnerable individuals or also cause cardiovascular and respiratory-related deaths for individuals whose cause of death is coded in accordance with the presence of another major health condition.

These wide-ranging effects are supported by the medical literature, which generally finds various undesirable immune system and other bodily responses to particulates. Proposed pathophysiological pathways for short-term effects of exposure to PM reviewed in Brook et al. (2010) and Pope et al. (2003) include the production of inflammatory cytokines that create a systemic response affecting bodily areas outside the lungs (also in van Eeden et al. 2001), systemic oxidative stress, changes in coagulation, changes in blood pressure, impaired vascular function, and increased heart rate variability. Brook et al. (2010) cite some conflicting evidence on the effects of particulates on biomarkers for these pathways, likely due to heterogeneity in chemical composition and exposure duration and intensity, but nevertheless reveal a common association between PM2.5 and important biomarkers related to elevated risks of cardiovascular and respiratory morbidity. Specific studies have also tied certain types of morbidity to particulate pollution, such as pneumonia (Zelikoff et al. 2002; Zelikoff et al. 2003) and chronic obstructive pulmonary disease (MacNee and Donaldson 2003). There is also research which associates specific subspecies with certain respiratory and cardiovascular health effects. Dye et al. (2001) find pulmonary injury in rats after exposure to PM2.5 subcomponents, with suggestive evidence of the high pulmonary toxicity of metal particulates, while Huang and Ghio (2006) implicate arsenic, mercury, and nickel exposure as causes of anemia, tachycardia, and increased blood pressure.

The inclusion of the non-PM2.5 pollution controls shows the corresponding increases in toxicity of the implied changes in PM2.5 across these dominant causes of death. Effects per unit mass of PM2.5 increase by factors of approximately 1.8 for ischemic heart disease and cerebrovascular deaths and 2.3 for chronic lower respiratory deaths, while increasing by a factor of 3 for influenza/pneumonia and other disease-related deaths (though each individually remains within sampling error of the no-control effect sizes). Assuming these estimates accurately represent the comparative magnitudes of true effects and that changes in PM2.5 composition explain most of the estimated increase in mortality per unit mass, this implies greater toxicity of PM2.5 metals for respiratory and general illnesses relative to cardiovascular illnesses. One explanation is that metals interfere with antimicrobial processes in the lungs, thereby raising the risk and severity of infection. Systemic inflammatory response may also inhibit the body's ability to fight infections outside the lungs.

As a specification sensitivity check, I estimate whether wildfire PM2.5 has an impact on external causes of death (Table 10), with rationale comparable to Heutel and Ruhm (2013): if effect estimates are driven by confounding variation from seasonal or trending factors related to both wildfires and mortality, then external causes of death physiologically unrelated to wildfires might show an effect. I consider 5 outcome groups as classified by the ICD-10: deaths from motor vehicle accidents, other accidents, suicides, assaults/homicides, and from "all other external and [...] causes." Motor vehicle accidents may regardless be affected by wildfires in extreme cases, as wildfires near major roadways can rapidly impede visibility, causing massive, multi-vehicle accidents (Collins et al. 2009). ICD-10's "all other external and [...] causes" category contains deaths due to fire exposure and acute smoke inhalation, which would reflect the deaths of rural residents, campers, hikers, and other individuals who may be trapped in the vicinity of a wildfire. However, neither of these shows any relationship to wildfire smoke, which is some evidence that pollution exposure is driven by fires distant enough to not have potential direct effects of wildfire events themselves (e.g., stress caused by imminent danger or property damage). I also find no relationship to suicides and homicides. With no pollution controls, I find a moderate, marginally statistically significant positive effect on deaths under the "other accidents or adverse effects" category, which includes all deaths due to complications related to surgery or medication. This result may be explained by an expected increase in the frequency of medical care being administered for increased rates of morbidity due to pollution.

4.2.5 Lagged and Lead Short-Term Associations with PM2.5

Estimating causal associations of air pollution with health outcomes is complicated by a wide range of potential intertemporal relationships between outcome and regressor, both causal and non-causal. There are three reasons to expect lagged wildfire pollution values to have negative effects: forward displacement of deaths, depletion of wildfire fuel stocks combined with contemporaneous measurement error, and denominator error in population rates due to annual population measures. In Table 12, I report reduced form estimates of lead, lagged, and both lead and lagged effects of the instrument on all-cause mortality, as well as the joint F-statistic of lead/lagged coefficients. I find evidence of forward displacement and, more generally, of violations of the strict exogeneity assumption for fixed effects estimators.

Pollution exposure causes forward displacement of an event if it causes the relocation of an event that otherwise would have occurred to an earlier time p
eriod.SchlenkerandWalker(2011)arguethatwelfareimpactsofairpollutionthroughmorbiditywouldbebeoverestimatedifforwarddisplacementoccursandisnottakenintoaccount(buttheytestforandnoevidenceofforwarddisplacementofhospitalizations).Unlessthereisavalueonpostponingaparticularoutcome,theonlynegativeimpactpollutionexposurewouldhaveonwelfareisthrougheventsthatcounterfac-29tuallywouldnothaveexistedifnotfortheexposure.Sinceeverybodydies4,welfareeffectsofpollution-inducedmortalitycanonlybemeasuredthroughtheaveragechangeinlifeexpectancy.Short-termpollutionexposuresmayprimarilyonlyaffectthosewhowouldotherwisediewithinafewmonths,butforwarddisplacementofmortalityinthissenseisstilleconomicallymeaningfulaslongasindividualsplacepositivevalueonanadditionalmonthoflife,thoughonemightexpectthatsuchvalueislowerthanthatofahealthyworkingindividual.Theestimatesinthesecondcolumnrevealforwarddisplacement.Ifwildsmokeismeasuredwithsubstantialerror,partoftheerrortermofobservedpollutionisafunctionofthetruelevelofpollution,whichmayinturnbepredictedbypast(orfuture)pollutionduetofuelstockdynamics.Alargemayburnfuelsaccumulatedoverlongperiodsthatarenotimmediatelyreplaced.Winthenearfutureinthesameareaarewell-situatedtoaffectthesamedownwindareasasthepastlargebutlikelytohavesmallersizesandshorterdurations.Inturn,highconcentrationsinthepastpredictlowconcentrationsinthepresent,whichwouldresultinlowerpresentmortality.Lastly,errorinthepopulationmeasureusedtocalculatemortalityratesmaycausealaggednegativerelationshipbetweenmortalityandpollutiontoappear.Imeasuremortalityratesusingannualintercensalestimatesofpopulation,butmeasuremortalityeffectswithmonthlyfrequency.Holdingchangesduetobirthsandmigrationsed,ifcontemporaneouspollutioncausesdeathsinonemonth,thenthefollowingmonth'spopulationcountistoohigh,resultinginameasuredmortalityratelowerthanthetruerate.Themeasuredrateishencenegativelycorrelatedwiththepreviousmonth'spollution,generatingdownwardbiasinestimatesoflaggedeffects.Jointlyleadandlaggedeffectsareinterpretedas
evidence of violation of the strict exogeneity assumption needed for large-N (number of cross-sectional observations) consistency of fixed effects estimators with a small number of time periods. The inconsistency has bounds shrinking at a rate proportional to the number of time periods (Wooldridge 2010), which in this case is 84 months. In all three specifications, I find evidence that strict exogeneity is violated.

[4] I was unable to find a citation for this.

Lagged and lead effects may also be indicative of shocks correlated with the regressor that affect multiple time periods and the outcome variable. In the wildfire setting, this may be weather or climatological variables not adequately captured by temperature, precipitation, and annual and seasonal regional fixed effects.

4.3 Effects on Infant Health

While infants at the most vulnerable health margins may be more likely to die from pollution shocks, the larger population of surviving infants may have their health after birth and subsequent quality of life impacted by in utero pollution exposure. Table 13 reports IV estimates for average exposures over the 9 months preceding birth and 4 months preceding birth. Prenatal exposure to PM2.5 has a strong effect on premature births, with effects concentrated in the 4 months leading up to birth. A 10 µg m⁻³ increase in PM2.5 over the gestational period is associated with a 2.6 percentage point increase in the number of premature births and an average decrease in gestational age of 0.23 weeks. There are also negative, but not statistically significant, effects on average birth weight, amounting to a 19 g decrease per 10 µg m⁻³ increase in PM2.5.

As with the mortality outcomes, controlling for NO2 and SO2 strengthens effects, testament to the increased relative toxicity of a unit change in PM2.5; in the 4 months before birth, a 10 µg m⁻³ increase in PM2.5 lowers average birth weights by 31 g, but there is no increase in the likelihood of low birth weight. If the increased toxicity also would result in increased fetal attrition (weakly suggested by the increase in the effect on percentage of female births), then this effect is likely to be occurring for healthier neonates. Alternatively, the effect could be driven by additional growth losses for neonates who regardless of exposure would have been low birth weight.

There are at least four classes of physiological mechanisms which may explain the observed negative associations with birth weight: intrauterine growth restriction, fetal genetic or epigenetic changes, pollutant-DNA adducts, and premature birth (Slama et al. 2008). Prematurity may be highly correlated with any of the other mechanisms, or the increased rates of prematurity alone could be driving most of the effect.

The complex interaction of birth timing, overlapping exposures between birth cohorts, and strict exogeneity requirements for fixed effects estimators are possible hazards to identifying meaningful effects of in utero exposure. Because these exposure estimates are framed relative to the birth month, and not the month of conception, substantial harmful effects may be attributable to displacement of unhealthy births from future cohorts into current ones via decreases in gestational age. In the same vein, I expect that displacement due to premature births (and fetal deaths) caused by PM2.5 exposure will cause bias in the opposite direction due to cohort composition effects, as infants with worse health outcomes are deselected from a birth cohort and displaced into earlier cohorts (or completely removed from the sample due to fetal death). Exposure timing varies even for births within the same month (by as much as 30 days), resulting in a mixture of true exposure effects estimated in each exposure window. More complicatedly, if the error is not strictly exogenous conditional on the exposure measures and controls, then exposure windows with a mixture of true exposure periods and non-exposure periods will reflect a mixture of exposure effects and strict exogeneity violations (i.e., feedback between the dependent variable and lead/lagged values of the regressor).

Attrition from fetal deaths is likely to cause downward bias in the magnitude of these estimates. One key piece of evidence for fetal attrition is the large, albeit imprecisely estimated, effect of average exposure on the sex ratio: each 1 µg m⁻³ increase in average PM2.5 exposure over nine months before birth raises the percentage of female births by 0.2 percentage points with no non-PM2.5 controls and 0.4 percentage points with NO2 and SO2 controls. This magnitude is comparable to the findings of Sanders and Stoecker (2011) for Total Suspended Particulates (TSPs), which are all particles less than 100 µm. Limited monitor coverage at the time of the Clean Air Act makes it impossible to ascertain the effects TSP reductions had on fine particulates. Using rough conversion factors (based on ratios of means in the AQS data) for TSPs to PM10 of 0.55, and PM10 to PM2.5 of 0.6, a one-unit change in TSPs corresponds to a 0.33 unit change in PM2.5, translating the estimate to 0.067 percentage points per unit change in TSP compared to Sanders and Stoecker's (2011) 0.088.

This pattern of results is comparable to Bharadwaj and Eberhard's (2008) estimates of the effects of PM10 in Santiago, Chile on birth outcomes, but with smaller magnitudes. They estimate a 125 g effect on birth weight per 17.57 µg m⁻³ (one standard deviation) increase in PM10 pollution 1-16 weeks before birth, whereas I estimate a substantially smaller effect of 32 g for a comparable change in PM2.5 (again using the PM10-to-PM2.5 conversion factor of 0.6, i.e., PM2.5 = 0.6 × PM10). Besides differences in toxicity between PM10 and PM2.5 (which we regardless might expect to make the difference smaller), this large difference is likely to be driven by some combination of nonlinear effects due to the substantially higher pollution levels in their sample period and the effects of omitted pollutants that also decrease with rainfall. Average PM10 in the U.S. sample period is 18 µg m⁻³ compared to 76 µg m⁻³ in the Santiago sample, and any increasing dose response would be [...]. The rainfall instrument is likely to be strongly associated with decreases in non-PM10 pollutants relative to its association with PM10. While the wildfire instrument does predict some non-PM2.5 pollution levels, this contribution (and thus potential upward bias in estimates' magnitudes) is constrained by the instrument's dependence on PM2.5 emissions and deposition parameters, compared to the broad and relatively less PM-heavy distribution of pollutants from all industrial sources in or near Santiago.

Lastly, because they identify their pollution changes through rainfall, they also identify effects on health outcomes through the associated changes in water pollution generated through deposition; deposited pollutants run off into water and food supplies, and individuals are exposed to them through consumption and skin contact. This can bias their estimates either way, depending on whether the pollutants are more harmful after deposition or in the air. In contrast, I control for local rainfall, which will generally account for the aggregate effect of deposited pollutants that could affect health outcomes through the water supply. If deposition occurs in watersheds outside of the area that affect the area's water supply and precipitation differs between the two areas, then the airborne pollutant estimated effects may still pick up effects from associated changes in water supply quality.

5 Wildfire Externalities and Current Management Policy

Wildfires induce changes in PM2.5 concentrations over long distances, with polluted air parcels crossing intranational and international boundaries. Assuming that monitoring stations are representative of a state's overall exposure to pollution, I calculate the fraction of PM2.5-months that occur outside the state of the fire, finding that over 75% of geographic exposure to PM2.5 from large wildfire events in the continental U.S. occurs in states other than the state of origin. Table 7 reports the percentage of modeled PM2.5 exposure that occurs outside of each state of fire occurrence as an approximation of the intensity of inter-state pollution externalities from wildfires.

Because of the implied externalities, wildfire management is subject to the classic tradeoff between inefficient local management behavior and potentially inefficient centralized, uniform policies for environmental goods. To the extent that local jurisdictions in charge of wildfires (e.g., state agencies) are individual actors and ignore inter-state pollution spillovers in making management decisions, they will tend to under-suppress fire activity or engage in more aggressive prescribed burning for other local benefits. The structure of wildfire management in the U.S. is a complicated mixture of many agencies acting individually and collaborating at multiple levels of government, while the Clean Air Act does not penalize states for pollution from naturally-occurring wildfires. Hence, it is unclear whether current management efforts properly account for the welfare effects from poor air quality.

While air quality externalities largely make wildfire abatement a national environmental good, it is uncertain whether policy would strongly improve with greater centralization. Banzhaf and Chupp (2011) show for the U.S. electricity sector that a uniform federal pollution abatement policy has better welfare implications than decentralized state policies because the inter-state spillovers addressed by a uniform policy are relatively more important than
the between-state heterogeneity of benefits addressed by decentralized policies. They argue that relatively inelastic marginal cost of abatement in the relevant region of the uniform policy results in smaller distortions from ignoring between-state heterogeneity of marginal benefits. Wildfires are characterized by large inter-state spillovers, but the concavity or convexity properties of the marginal costs of abatement are unclear, as are their true marginal damages.

Wildfire management has two dimensions of abatement: pre-fire measures, such as prescribed burning and fuel clearing, and suppression efforts. Marginal costs of suppression efforts are relatively easy to measure; for example, Donovan (2006) finds a convex marginal cost function for the number of contract-based crews hired in a season. Regardless, all abatement measures may have strong heterogeneity and uncertainty in marginal benefits and costs associated with them. Prescribed fires themselves generate pollution and some ecological hazards because of their timing (Knapp et al. 2009). Naturally-occurring fires have ecological benefits such as biodiversity and better disease regulation, which may potentially counterbalance the marginal benefits of improved air quality (e.g., Keane and Karau 2010). Even suppression's benefits cannot be well accounted for, as aggressive suppression can lead to higher likelihood and intensity of future fires by altering the nature of fuel accumulation (Yoder 2004).

Despite federal guidelines governing suppression attempts in the interest of protecting public health (Fire Executive Council 2009), the incentives facing the agencies making management choices are vague relative to the regulation of agents generating industrial air pollution. The Clean Air Act distinguishes between "unplanned" and "planned" fires, only penalizing states for the pollution generated by planned fires (i.e., prescribed burns), resulting in the adverse health effects of natural fires not being inherently taken into account by air quality regulations (Engel and Reeves 2011).

There are federal directives and funding for wildfire management, with $3.9 billion allocated for FY2014 (Bracmort 2013). Decision-making regarding suppression and prescribed burning is not federally determined, however. Currently, wildfire management in the U.S. predominantly falls upon five federal agencies for fires over 1,000 acres[5] and individual state, county, and local agencies, with frequent interagency collaboration. For the fires in the sample period, 36% were reported by state, county, and local agencies, while the remainder were federally reported. The Forest Service and Bureau of Land Management reported the majority of the remaining fires. There are ambiguities regarding which agency is responsible for suppression decisions; for example, the agency making the fire report does not always commit all of the resources tasked with managing the fire, and multiple agencies may report the same fire, but only one record is retained in the FPA database.

[5] These are the Bureau of Land Management (BLM), Bureau of Indian Affairs (BIA), the U.S. Forest Service (USFS), Fish and Wildlife Service (FWS), and National Park Service (NPS), accounting for 99% of federally-reported fires.

6 Conclusion

This study uses new tools to measure the health externality costs of both industrial and natural sources of air pollution and provides estimates for the effects of particulate matter on mortality and infant health. To my knowledge, it is the first to synthesize historical emissions, atmospheric transport models, and ground-level monitoring data at a large scale to estimate the distribution of environmental pollutants and their health effects in the United States. Its design provides spatially and temporally smooth measures of pollution shocks, and the ability to construct a full emissions-to-destination modeling process provides a large degree of customizability and control over the variation used to identify changes in air quality. The choice of wildfires as emissions source results in geographically wide-reaching variation in particulate levels, inducing both small and large shocks to highly polluted and relatively unpolluted areas. The findings of effects on short-term mortality and infant health contribute to the body of evidence supporting that PM2.5, and generally air quality, has important impacts on human health. They also highlight the importance of wildfire management as an important public health issue.

As might be expected with a new source of data, there are several statistical issues which must be addressed to fully realize the potential of wildfires to identify useful, policy-relevant health effects parameters. Incorrect exposure measurements in both space and time create potentially serious measurement error problems which are only partially alleviated by instrumental variables techniques. Imperfect monitoring coverage results in measurement error of exposures both within and between geographic units. Spatial measurement error can be alleviated through more comprehensive measures of ambient pollution, generated through a combination of interpolation of data points, remote sensing data, and two-sample instrumental variables estimation techniques (Khawand 2014). Two-sample IV techniques can also be used to include geographic regions with no monitoring coverage in estimating health effects, resulting in estimated average effects more representative of the U.S. population.

In the short run, this study can be improved upon through developing richer model inputs from higher-quality data products that require substantially greater computational input to implement. Satellite products for fire detection allow burn dynamics to be better parsed out in space and time. Higher-resolution meteorological products can be used to better capture short-range dispersion patterns, which in turn require more intensive geographic sampling schemes to properly translate to aggregate concentrations.

The modeling of air quality impact itself also stands to be improved. The relationship between the pollution forecasts and actual pollution levels, while intuitively seeming to be relatively uncomplicated, is subject to measurement errors due to the complex interaction among fuel, fire, and meteorological data inputs and modeling assumptions. Modeling errors may occur due to unmodeled heterogeneity at the source or between the source and the destination. A richer exploration of heterogeneous source-receptor relationships is needed to understand where modeling errors may result in putting undue weight on health effects in certain areas or discarding useful variation in others. Extensive further work, particularly in collaboration with scientists in the wildfire community, is required to improve the realism and predictive power of the pollution simulation.

APPENDIX

Figure 1: Number of Acres Burned (Thousands) for All Fires Greater than 1,000 Acres, 2000-2010. Map shows the number of acres (in thousands) for all fires 1,000 acres or greater in the US from 2000 to 2010 by state, ranging from red (most area burned) to blue (least area burned).

Figure 2: Wildfire Air Pollution Modeling - BlueSky Framework Workflow. Flowchart depicting the modeling workflow to produce pollution concentration outputs, from ingestion of fire data to output by the HYSPLIT model.

Figure 3: Average Raw Wildfire PM2.5 Output by County, CONUS, 2004-2010. Map of untransformed average PM2.5 concentrations by U.S. county for the 2004-2010 sample period. Dark blue values represent low concentrations and brown values high concentrations (e.g., California has high concentrations, while Maine has low concentrations).

Figure 4: Quantile-Quantile Plots of PM2.5 versus Counterfactuals. (a) PM2.5 (with Wildfire PM2.5) versus Estimated Counterfactual PM2.5 (No Wildfire PM2.5). (b) PM2.5 versus Counterfactual PM2.5 Estimated with Fixed Effects. Subfigure A plots PM2.5 against the counterfactual estimated using the wildfire PM2.5 instrument; subfigure B plots it against the counterfactual as estimated using state-year, state-month, and county fixed effects. Each point on the plot represents the values in each distribution at which the quantiles are equivalent.

Figure 5: Spline Control Function Regression of All-Cause Mortality on PM2.5, by Decile. This is a plot of the estimated effect of average monthly PM2.5 estimated by splines in deciles of average monthly PM2.5, conditional on a linear control function residual using wildfire PM2.5 as the excluded instrument.

Table 1: AQS PM2.5 and NLDAS Weather Descriptive Statistics. County-level descriptive statistics for 2004-2010 across all U.S. counties. PM2.5 concentration is only available for county-months with monitoring data.

Table 2: Monthly, County-Level Mortality Rate (per 100,000) by Subgroup from U.S. Death Certificates, 2004-2010. County-level mortality rates calculated using U.S. death data for 2004-2010. All are scaled per 100,000 county population.

Table 3: Monthly, County-Level Mean Birth Outcomes and Rates for Birth Cohorts from U.S. Birth Certificates, 2004-2010. County-level birth outcome descriptive statistics derived from U.S. birth data for 2004-2010.

Table 4: First Stage Regression of PM2.5 and Regressions of Criteria Pollutants on Wildfire Instrument

Table 5: Regressions of Highly Toxic PM2.5 Subspecies on Wildfire PM2.5

Table 6: Regressions of Non-Metallic PM2.5 Subspecies on Wildfire PM2.5

Table 7: Percentage of Wildfire PM2.5 Exposure Outside of the State of Origin

Table 8: IV Estimates: PM2.5 Effects on All-Cause Mortality (by Fixed-Effects Specification)

Table 9: IV Estimates: PM2.5 Effects on Mortality (by Cause)

Table 10: IV Estimates: PM2.5 (Non-)Effects on Mortality from External Causes

Table 11: IV Estimates: PM2.5 Effect on All-Cause Mortality by Age Group

Table 12: Reduced Form Lead and Lagged Wildfire PM2.5 Effect on All-Cause Mortality

Table 13: IV Estimates: Effect of PM2.5 Exposure for Full Gestation and 16 Weeks Before Birth on Birth Outcomes

Figure 6: Piecewise Regression Coefficient Estimates of Daily Station PM2.5 on Raw and Log-transformed Wildfire PM2.5 Model Output, by Vigintile. (a) Raw Wildfire PM2.5 Output. (b) Log-Transformed Wildfire PM2.5 Output.

Table 14: Henry's Law Constants and Dry Deposition Velocities for Gaseous Pollutants

Table 15: Regression of Organic Gases on Wildfire PM2.5

Table 16: Regression of PM2.5 Metal Subspecies on Wildfire PM2.5, Set I

Table 17: Regression of PM2.5 Metal Subspecies on Wildfire PM2.5, Set II

Table 18: Regression of PM2.5 Metal Subspecies on Wildfire PM2.5, Set III

Table 19: Regression of PM2.5 Metal Subspecies on Wildfire PM2.5, Set IV

Table 20: Regression of PM2.5 Metal Subspecies on Wildfire PM2.5, Set VI

A.1 Estimating Fire Burn Durations

Wildfires can last for a period of hours to hundreds of days (for large, remote complex fires). The best measure in the FPA database of a fire's start time is the discovery time by the reporting agency, which is almost always reported. The time of the fire's containment, which indicates a judgment by the managing agency that the fire perimeter is secured from spreading further, is reported with similar frequency. Only some of the FPA database sources also have reports of their extinguishment dates. Substantial emissions may still occur during the period between containment and extinguishment, especially for large fires. For fires greater than 300 acres, approximately 43% of burn time is post-containment.

To better calibrate the time profile of emissions from fires, I use these fire events to fit a model and predict the burn duration for all fires in the absence of an explicitly-reported extinguishment or "put-out" time. I merge extinguishment dates from the DOI-USGS database of reports from six major federal agencies. Then, I estimate a linear model of a fire's burn duration:

\[ D_i = c_i \xi + s(i)\,\theta_s + m(i)\,\theta_m + y(i)\,\theta_y + \rho_i \]

A fire's burn duration is a function of its time to containment $D_i$; its land-area size, measured by a categorical "size class" $c_i$; some unobserved seasonal-, year-, and [...] factors; and idiosyncratic factors $\rho_i$. I estimate this relationship using all fires from 2000-2010 larger
than 300 acres.

The containment time is naturally a strong predictor, as it is the earliest a fire can be extinguished. Its coefficient is sharply estimated close to 1, suggesting that time-to-containment is at least conditionally unrelated to unobservable characteristics of the fire that affect its total duration. Where both containment and put-out dates are unavailable, a fire is assigned its duration based on the same model, estimated without including containment date as a covariate. All predictions less than 1 day due to the linear specification of the model are assigned a value of 1 day of burning. All fires with reported and predicted durations exceeding 160 days are assigned 160 days of burning to lower computational overhead. This is based on an assumption that fires reported to burn in excess of 160 days have reporting error in their records or are long-burning smoldering fires which do not have comparable emissions to flaming fires. This truncation procedure removes approximately 10 percent of emission days, and less than 4 percent of emissions when weighted by the total land area of the fire.

The purpose of these duration estimates is to improve the predictive power of modeled concentrations. Errors in the prediction from reporting errors or misspecification of the model for burn duration will result in emissions of incorrect length. These errors will not affect the validity of the modeled pollutant concentrations as instruments for observed pollutant concentrations, provided they are statistically unrelated to the determinants of the observed pollutant concentrations I do not include in my first stage estimation.

A.2 Wildfire Modeling Details

A.2.1 Modeling Workflow

The fire events from the FPA FOD database are each input as individual events into the BSF. The CONSUME module reads the coordinate data of the event and determines the likely fuel type using the FCCS fuel map. CONSUME then divides the fuel consumption into flaming, smouldering, and residual emission phases, each of which has a distinct contribution to emissions volume for the same fuel (as a model of fuel combustion efficiency). Combining the fuel consumption with empirically-derived emissions factors, FEPS then estimates the quantities of heat and the pollutant of interest released by the fire. Using an empirically-derived diurnal (i.e., daily recurring) time profile embedded in FEPS, I generate a 24-hour emissions pattern that repeats for each day a fire burns and terminates at the estimated date of the fire's extinguishment. The pattern distributes the total emissions calculated by CONSUME among hours of the day. This modeling step is designed to improve downwind concentration estimates by accounting for burn cycles that vary with meteorological parameters that systematically vary with time of day, with lower emissions during nighttime hours.

The FEPS Plume Rise module estimates the buoyancy of the emitted pollutant due to the heat calculated by CONSUME and assigns 20 heights into which fractions of the hourly emissions are injected. This step reflects that quantities of a pollutant will be lofted higher from a location the more heat the fire releases, and that larger fires will also tend to have higher plumes that will result in longer-range transport. The result is a set of hourly point-source emissions for each fire event, with 20 emissions quantities in each hour released at the FEPS-calculated altitudes.

The point-source emissions generated by the CONSUME and FEPS models from the event data are then input into HYSPLIT, which calculates the trajectory and dispersion of the emitted pollutants and outputs a spatial field of concentrations over time. To calculate concentrations, HYSPLIT requires continuous meteorological data spanning the time period of the event and its corresponding downwind impacts of interest. Meteorological reanalysis datasets or archived forecasts are typically used for retrospective applications. Here, I use the Eta Data Assimilation System 40 km (EDAS40), an archived 3-hourly forecast spanning 2004 to the present with a spatial resolution of 40 km. This forecast system was developed and is maintained by the National Weather Service's National Centers for Environmental Prediction.

HYSPLIT represents the distribution of pollutants from a source through the behavior of a large number of individual "particles" (which are computational representations of pollutant masses, not to be confused with particulate pollutants in themselves). These particles are released over the duration of an emission, and HYSPLIT models their advective motion using three-dimensional velocity vectors from the meteorological data. In addition, the particle approach adds a random component to their advective motion that approximates a random walk process calibrated by local atmospheric turbulence. HYSPLIT particles are assigned a proportional fraction of pollutant mass at the time of emission and shed mass through atmospheric removal processes (dry and wet deposition). Concentrations for a grid cell are calculated through the sum of masses of particles within the grid cell divided by the size of the grid cell. All HYSPLIT calculation methods are described in detail in Draxler and Hess (1997). I describe deposition processes and my choice of calibration parameters in the next section.

Each HYSPLIT run uses a 5-day set of hourly burning emissions at 21 vertical levels for a single location. I set HYSPLIT to release 300,000 particles per emissions hour, which are evenly divided among the vertical emission levels. I allow HYSPLIT to calculate the travel of particles for 920 hours (approximately five and a half weeks) from the hour of the emission. From these calculations, HYSPLIT creates an hourly concentration grid for the CONUS model domain roughly matching the resolution of the meteorological data, with each grid square encompassing approximately 1,600 km², for 2004-2010. I sample concentrations from each fire event's grid at 10 meters above ground level at pollution monitoring sites and census tract centroids, sum concentrations across all fire events, and average the resulting hourly concentrations to daily average concentrations by each sampling site.

While the raw output is constructed from emissions measures and conversions that would denominate it in µg m⁻³ if it were to be taken literally, I remain agnostic about the units of the output and allow regressions to implicitly rescale the wildfire PM2.5 measure. In Appendix Section A.3, I establish that the output has a strongly logarithmic fit to observed pollution data and take a logarithmic transformation of the raw concentrations shifted by a small constant. This will be the wildfire PM2.5 instrument used for the remainder of the paper.

A.2.2 Deposition Processes

HYSPLIT's modeling of deposition, or the removal of pollutants from the atmosphere by precipitation and settling or impaction upon terrain, plays an important role in generating independent variation among pollutants to allow the separate identification of their health effects. HYSPLIT dynamically accounts for the amount of air pollution lost to precipitation by modeling the interaction of traveling parcels of air pollution from origin to destination with temporally and spatially smooth representations of precipitation events.

HYSPLIT models particle pollutant wet deposition (also referred to as "wet removal" and "wet scavenging") via two processes described as in-cloud removal ("washout") and below-cloud removal ("rainout").[6] For gaseous pollutants, it uses a calculation method based on gas solubility. HYSPLIT has one common process for both particles and gases for modeling dry deposition which assumes a rate of removal driven by wind speed.

[6] There is some inconsistent usage of the terms "washout" and "rainout" across some papers, their meanings occasionally swapped.

One constant calibrates the intensity of each process: the washout ratio, representing an average ratio of pollutant concentration in air to concentration in water at the ground; the rainout rate, or a fixed rate of pollutant removal while pollutant concentrations are in a meteorological layer with precipitation (s⁻¹); the Henry's Law constant for wet removal of soluble gases (mol atm⁻¹); and the dry deposition velocity (m s⁻¹). The constants I choose for each pollutant type, along with corresponding citations, are reported in Appendix Table 14. For reference, I also report constants for related pollutants that I do not model.

Wet deposition of particulate pollutants is characterized by HYSPLIT through one process in which polluted air is ingested over time into proximal atmospheric moisture (washout), and another in which rain falls through polluted air (rainout). Wet deposition processes play a relatively larger role in mass removal of particulate pollutants than they do for gaseous pollutants, up to an order of magnitude higher, though this relation varies by species. While there is substantial heterogeneity in the efficiency with which PM2.5 pollutants are removed by rain because of the many component subspecies and variation in the particle size distribution, a washout ratio of 1 × 10⁵ is broadly used as an estimate for the washout ratio of general PM2.5. In the absence of well-established parameters for rainout rates, I use HYSPLIT's suggested particle rainout rate of 5 × 10⁻⁵ s⁻¹, which has been used in other HYSPLIT particulate modeling applications (Chand et al. 2008; Wen et al. 2013). I expect that empirically-derived washout ratios will capture most deposition since they are often measured without HYSPLIT's deposition process distinction in mind, and at least one study finds that below-cloud deposition is insignificant for particles except in extreme precipitation events (Andronache 2003).

Instead of explicit washout and rainout parameters, gaseous pollutants' wet deposition is calibrated by the appropriate Henry's Law constant for the water-soluble gas. Henry's Law holds that at a constant temperature, the solubility of a gas in a liquid is proportional to the pressure of the gas surrounding the liquid. An intuitive example of Henry's Law at work is a carbonated soda: while sealed, a soda bottle contains liquid with dissolved CO2 and a space above the liquid with CO2 gas. The opening of the bottle lowers the resulting pressure above the liquid, and over time the CO2 escapes from the liquid and into the open air through the bottle opening. The reverse process occurs if there is liquid in the same bottle with no CO2, and CO2 is injected into the empty space of the sealed bottle: the higher the pressure of the resulting air space in the bottle (and the greater the concentration of CO2), the greater the equilibrium concentration of CO2 in the liquid will be.

Henry's Law constants are chosen from an extensive collection of estimates from academic papers (Sander 1999). Estimates are typically calculated in one of three ways: by theoretical calculations, extrapolations from other measured constants, or by measurements and experiments. For each gas, I choose the most recent estimate from a literature review where available. If a literature review-based estimate is not available, I choose the modal Henry's Law constant reported in Sander (1999).

Dry deposition is modeled through gravitational settling and impaction at ground level, which varies with wind velocity. In the absence of precipitation and chemical reactions, dry deposition is the primary determinant of a pollutant's lifetime in the atmosphere following emissions. I conduct a literature search for dry deposition velocities, using the compound name and "deposition velocity" as search terms. For deposition velocities for gases drawn from field observations, urban-setting deposition velocities are preferred. Many gases, such as NO and HCHO, do not have significant dry deposition fluxes over land. I use a deposition velocity of 0 m s⁻¹ for such gases with trivial land deposition rates, and also for any gases for which I am unable to find any direct
reference to deposition fluxes or velocities. The deposition velocities I choose are reported in Appendix Table 14.

A.3 Nonlinear First Stage Transformation

The relationship between measured concentrations and modeled concentrations is extremely nonlinear, requiring a monotonic transformation to maximize the predictive power of the pollution instrument. Figure 6a shows the estimated coefficients of a piecewise linear regression of daily station PM2.5 on modeled PM2.5 interacted with vigintile (5-percentile-block) indicators, representing an approximation of the derivative of the true dose-response function between measured and raw modeled PM2.5 across the raw modeled PM2.5 distribution. This regression controls for year, month, and county fixed effects, with standard errors clustered at the state level. The pattern is highly nonlinear, spanning multiple orders of magnitude, with the estimated slope monotonically decreasing in concentration. A function of the form $f(x) = \frac{a}{x + c}$ (with $a > 0$, $c \ge 0$) follows a comparable pattern. Because $a/(x + c)$ is the derivative of $a \ln(x + c)$, this suggests that a linear approximation better predicts station PM2.5 when the regressor is the natural logarithm of modeled PM2.5 plus some constant.

This nonlinear pattern implies that some combination of the emissions calculations and HYSPLIT is resulting in systematic overestimation of large concentrations and underestimation of small concentrations. The monotonically decreasing slope across the domain of concentrations implicates the dispersion calculation of HYSPLIT, which relies on calibration from atmospheric parameters to determine turbulent velocities and a Gaussian random component that determine the random-walk-like dispersive behavior of the particle. One explanation for the subsequent logarithmic fit is that the calibration of the Gaussian component's variance does not account for how the true variance is itself positively related to concentration level, resulting in systematic underestimation of dispersion for large concentrations and overestimation for small concentrations (causing overestimated and underestimated concentrations, respectively). A logarithmic transformation of the pollution measure in the first stage accounts for the implicit overdispersion of concentrations along trajectories by compressing the distribution of magnitudes.

To accomplish this transformation without discarding zero values, I take the natural logarithm of daily average PM2.5 plus a constant. The choice of constant by which to shift the raw concentration before taking the logarithm, the "shift parameter," has two important impacts: it determines the position of zeroes on the log function, and relatedly, it changes the relative curvature of the fit of logged concentrations to observed concentrations. Shift parameters that are too small will result in the log transformation overestimating the contrast between the effect of positive concentrations relative to zero concentrations, while shift parameters that are too large will cause an underestimated contrast. Large shift parameters may also distort the marginal effects for larger values in the distribution. One sensible choice of shift parameter is a point at which positive concentrations could be considered effectively zero for the dependent variable of interest. HYSPLIT's concentration outputs near zero can be reasonably framed as a sensitivity problem: there is a computational threshold below which it will never give a positive value, and the distribution of values approaching zero is continuous until the trivial minimum value at 4.92 × 10⁻³⁴ µg m⁻³. I choose a value corresponding to the 10th percentile of positive values (7.21 × 10⁻¹⁴ µg m⁻³), add it to the raw concentration value, and take the logarithm. For ease of interpretation later, I also shift all transformed values by the minimum of the transformed values to make all values nonnegative.

Figure 6b shows the same regression as in Figure 6a, but now with logged daily PM2.5 interacted with vigintile indicators. The slopes now fall within the same order of magnitude, slightly increasing in vigintile (implying a gradual shift to underestimation of marginal changes in concentrations relative to smaller vigintiles). Additionally, there are several numerically large outliers which may affect the fit, but station data provide a way of trimming outlier values sensibly. I account for these right-tail outliers by assigning the instrument the station PM2.5 value if the raw modeled PM2.5 exceeds the station-observed PM2.5 value and both values are greater than 65 µg m⁻³. Empirically, the latter condition implies the former in 100 percent of cases, which motivated the selection of this cutoff. Less than 0.1 percent of station-days have measured PM2.5 exceeding 65 µg m⁻³. All other modeled PM2.5 values exceeding 65 µg m⁻³ (approximately 0.6 percent of all values) are set to 65 µg m⁻³. This adjustment compresses the right tail of the distribution, enhancing the performance of the logarithmic transformation I take to improve the fit of the instrument (in exchange for losing some variation in extreme values).

Because the observed nonlinear relationship and outliers are ostensibly due to HYSPLIT's dispersion calculation methods, which are not unique to any pollutant, I assume that concentrations for other pollutant species follow a comparable relationship with their observed values (in the absence of daily station data to do a comparable adjustment). For all control species, I cap right-tail values at the 98th percentile of their positive values before taking the logarithmic transformation, since that is the approximate point at which the modeled PM2.5 values always exceed station values. Then, I take the logarithm of the outlier-adjusted modeled concentration plus the 10th percentile of its positive values.

A.4 Estimates under Non-Classical Measurement Error

Consider a cross-sectional setting, with health outcome $y$ as a function of true exposure $x^*$,

$$y_i = x_i^* \beta + \varepsilon_i.$$

Assume $x_i^*$ is uncorrelated with $\varepsilon_i$, and that the researcher only observes an imperfect measure $x$ of $x^*$ such that $x = x^* + e$. Let $\mathrm{var}(x^*) = \sigma^2_{x^*}$, $\mathrm{var}(e) = \sigma^2_e$, and $\mathrm{cov}(x^*, e) = \sigma_{x^* e}$. Then, the probability limit of the ordinary least squares estimator from regressing $y$ on $x$ can be written as

$$\mathrm{plim}\, \hat{\beta} = \frac{\beta \left( \sigma^2_{x^*} + \sigma_{x^* e} \right)}{\sigma^2_{x^*} + \sigma^2_e + 2 \sigma_{x^* e}}.$$

By the Cauchy-Schwarz inequality, the denominator is always positive: it represents the variance of the error-prone regressor $x$. The case $\sigma_{x^* e} = 0$ corresponds to the classical errors-in-variables assumption, which results in attenuation bias. The probability limit of the OLS estimate $\hat{\beta}$ is both attenuated and incorrectly signed if $\sigma_{x^* e} < 0$ and $\sigma^2_{x^*}$ [...] (1ŸR2)ŸR2(1ŸR2fe)ŸR2fe1.

REFERENCES

[1] Anderson, G., Sandberg, D., & Norheim, R. (2004). Fire Emission Production Simulator (FEPS) User's Guide. USDA Forest Service Pacific Northwest Research Station, Fire and Environmental Research Applications Team, Portland, OR.
[2] Anderson, T. W., & Rubin, H. (1949). Estimation of the Parameters of a Single Equation in a Complete System of Stochastic Equations. The Annals of Mathematical Statistics
, 20(1), 46–63. doi:10.1214/aoms/1177730090
[3] Andronache, C. (2003). Estimated variability of below-cloud aerosol removal by rainfall for observed aerosol size distributions. Atmos. Chem. Phys., 3(1), 131–143. doi:10.5194/acp-3-131-2003
[4] Arceo-Gomez, E. O., Hanna, R., & Oliva, P. (2012). Does the Effect of Pollution on Infant Mortality Differ Between Developing and Developed Countries? Evidence from Mexico City (Working Paper No. 18349). National Bureau of Economic Research. Retrieved from http://www.nber.org/papers/w18349
[5] Banzhaf, H. S., & Chupp, B. A. (2012). Fiscal federalism and interjurisdictional externalities: New results and an application to US Air pollution. Journal of Public Economics, 96(5), 449–464.
[6] Barreca, A. I. (2012). Climate change, humidity, and mortality in the United States. Journal of Environmental Economics and Management, 63(1), 19–34. doi:10.1016/j.jeem.2011.07.004
[7] Behrman, J. R., & Rosenzweig, M. R. (2004). Returns to Birthweight. Review of Economics and Statistics, 86(2), 586–601. doi:10.1162/003465304323031139
[8] Bell, M. L., & HEI Health Review Committee. (2012). Assessment of the health impacts of particulate matter characteristics. Research Report (Health Effects Institute), (161), 5–38.
[9] Bell, M. L., McDermott, A., Zeger, S. L., Samet, J. M., & Dominici, F. (2004). Ozone and short-term mortality in 95 US urban communities, 1987-2000. JAMA, 292(19), 2372–2378. doi:10.1001/jama.292.19.2372
[10] Bound, J., Jaeger, D. A., & Baker, R. M. (1995). Problems with Instrumental Variables Estimation When the Correlation Between the Instruments and the Endogeneous Explanatory Variable is Weak. Journal of the American Statistical Association, 90(430), 443–450. doi:10.2307/2291055
[11] Breton, C., Park, C., & Wu, J. (2011). Effect of Prenatal Exposure to Wildfire-generated PM2.5 on Birth Weight. Epidemiology, 22, S66. doi:10.1097/01.ede.0000391864.79309.9c
[12] Calvert, J. G., Atkinson, R., Becker, K. H., Kamens, R. M., Seinfeld, J. H., Wallington, T. J., & Yarwood, G. (2002). The mechanisms of atmospheric oxidation of aromatic hydrocarbons. Oxford University Press New York.
[13] Chand, D., Jaffe, D., Prestbo, E., Swartzendruber, P. C., Hafner, W., Weiss-Penzias, P., ... Kajii, Y. (2008). Reactive and particulate mercury in the Asian marine boundary layer. Atmospheric Environment, 42(34), 7988–7996. doi:10.1016/j.atmosenv.2008.06.048
[14] Chay, K., Dobkin, C., & Greenstone, M. (2003). The Clean Air Act of 1970 and Adult Mortality. Journal of Risk and Uncertainty, 27(3), 279–300. doi:10.1023/A:1025897327639
[15] Chay, K. Y., & Greenstone, M. (2003). Air Quality, Infant Mortality, and the Clean Air Act of 1970 (Working Paper No. 10053). National Bureau of Economic Research. Retrieved from http://www.nber.org/papers/w10053
[16] Chen, B., Stein, A. F., Maldonado, P. G., Sanchez de la Campa, A. M., Gonzalez-Castanedo, Y., Castell, N., & de la Rosa, J. D. (2013). Size distribution and concentrations of heavy metals in atmospheric aerosols originating from industrial emissions as predicted by the HYSPLIT model. Atmospheric Environment, 71, 234–244. doi:10.1016/j.atmosenv.2013.02.013
[17] Chen, L. H., Knutsen, S. F., Shavlik, D., Beeson, W. L., Petersen, F., Ghamsary, M., & Abbey, D. (2005). The Association between Fatal Coronary Heart Disease and Ambient Particulate Air Pollution: Are Females at Greater Risk? Environmental Health Perspectives, 113(12), 1723.
[18] Collins, J., Williams, A., Paxton, C., & Davis, R. (2009). Geographical, Meteorological, and Climatological Conditions Surrounding the 2008 Interstate-4 Disaster in Florida. Papers of the Applied Geography Conferences, 153–162.
[19] Currie, J., & Neidell, M. (2005). Air Pollution and Infant Health: What Can We Learn from California's Recent Experience? The Quarterly Journal of Economics, 120(3), 1003–1030.
[20] Currie, J., Zivin, J. S. G., Mullins, J., & Neidell, M. J. (2013). What Do We Know About Short and Long Term Effects of Early Life Exposure to Pollution? (Working Paper No. 19571). National Bureau of Economic Research. Retrieved from http://www.nber.org/papers/w19571
[21] Daniels, M. J., Dominici, F., Samet, J. M., & Zeger, S. L. (2000). Estimating Particulate Matter-Mortality Dose-Response Curves and Threshold Levels: An Analysis of Daily Time-Series for the 20 Largest US Cities. American Journal of Epidemiology, 152(5), 397–406. doi:10.1093/aje/152.5.397
[22] Dieterle, S., & Snell, A. (2013). Exploiting Nonlinearities in the First Stage Regressions of IV Procedures.
[23] Donovan, G. H. (2006). Determining the optimal mix of federal and contract fire crews: A case study from the Pacific
Northwest. Ecological Modelling, 194(4), 372–378.
[24] Draxler, R., Arnold, D., Chino, M., Galmarini, S., Hort, M., Jones, A., ... Wotawa, G. (n.d.). World Meteorological Organization's model simulations of the radionuclide dispersion and deposition from the Fukushima Daiichi nuclear power plant accident. Journal of Environmental Radioactivity. doi:10.1016/j.jenvrad.2013.09.014
[25] Draxler, R. R., & Hess, G. (1997). Description of the HYSPLIT_4 modeling system.
[26] Dye, J. A., Lehmann, J. R., McGee, J. K., Winsett, D. W., Ledbetter, A. D., Everitt, J. I., ... Costa, D. L. (2001). Acute pulmonary toxicity of particulate matter filter extracts in rats: coherence with epidemiologic studies in Utah Valley residents. Environmental Health Perspectives, 109(Suppl 3), 395–403.
[27] Escudero, M., Stein, A., Draxler, R. R., Querol, X., Alastuey, A., Castillo, S., & Avila, A. (2006). Determination of the contribution of northern Africa dust source areas to PM10 concentrations over the central Iberian Peninsula using the Hybrid Single-Particle Lagrangian Integrated Trajectory model (HYSPLIT) model. Journal of Geophysical Research: Atmospheres, 111(D6), D06210. doi:10.1029/2005JD006395
[28] Federal history reports by date and organization: 1980-2013 DOI (BIA, BLM, BOR, NPS), USFWS, and USFS. (2013). U.S. Department of Interior. Retrieved from .usgs.go
[29] Finlay, K., & Magnusson, L. M. (2009). Implementing weak-instrument robust tests for a general class of instrumental-variables models. Stata Journal, 9(3), 398–421.
[30] Fire Executive Council. (2009). Guidance for implementation of federal wildland fire management policy.
[31] Franklin, M., Koutrakis, P., & Schwartz, J. (2008). The Role of Particle Composition on the Association Between PM2.5 and Mortality. Epidemiology (Cambridge, Mass.), 19(5), 680–689.
[32] Hahn, J., & Hausman, J. (2002). Notes on bias in estimators for simultaneous equation models. Economics Letters, 75(2), 237–241. doi:10.1016/S0165-1765(01)00602-4
[33] Heutel, G., & Ruhm, C. J. (2013). Air Pollution and Procyclical Mortality (Working Paper No. 18959). National Bureau of Economic Research. Retrieved from http://www.nber.org/papers/w18959
[34] Huang, Y.-C. T., & Ghio, A. J. (2006). Vascular effects of ambient pollutant particles and metals. Current Vascular Pharmacology, 4(3), 199–203.
[35] Jayachandran, S. (2009). Air Quality and Early-Life Mortality Evidence from Indonesia's Wildfires. Journal of Human Resources, 44(4), 916–954.
[36] Keane, R. E., & Karau, E. (2010). Evaluating the ecological benefits of wildfire by integrating fire and ecosystem simulation models. Ecological Modelling, 221(8), 1162–1172.
[37] Knapp, E. E., Estes, B. L., & Skinner, C. N. (2009). Ecological Effects of Prescribed Fire Season: A Literature Review and Synthesis for Managers. Retrieved from http://wwwv/projects/07-S-08/project/07-S-08_psw_gtr224-1.pdf
[38] Knittel, C. R., Miller, D. L., & Sanders, N. J. (2011). Caution, Drivers! Children Present: Traffic, Pollution, and Infant Health (Working Paper No. 17222). National Bureau of Economic Research. Retrieved from http://www.nber.org/papers/w17222
[39] Kristensen, L. J., & Taylor, M. P. (2012). Fields and Forests in Flames: Lead and Mercury Emissions from Wildfire Pyrogenic Activity. Environmental Health Perspectives, 120(2), a56–a57. doi:10.1289/ehp.1104672
[40] Künzli, N., Jerrett, M., Mack, W. J., Beckerman, B., LaBree, L., Gilliland, F., ... Hodis, H. N. (2005). Ambient air pollution and atherosclerosis in Los Angeles. Environmental Health Perspectives, 201–206.
[41] Larkin, N. K., O'Neill, S. M., Solomon, R., Raffuse, S., Strand, T., Sullivan, D. C., ... Ferguson, S. A. (2009). The BlueSky smoke modeling framework. Int. J. Wildland Fire, 18(8), 906–920.
[42] MacNee, W., & Donaldson, K. (2003). Mechanism of lung injury caused by PM10 and ultrafine particles with special reference to COPD. European Respiratory Journal, 21(40 suppl), 47s–51s. doi:10.1183/09031936.03.00403203
[43] Moretti, E., & Neidell, M. (2011). Pollution, Health, and Avoidance Behavior: Evidence from the Ports of Los Angeles. Journal of Human Resources, 46(1), 154–175.
[44] Murray, M. P. (2006). Avoiding Invalid Instruments and Coping with Weak Instruments. Journal of Economic Perspectives, 20(4), 111–132. doi:10.1257/jep.20.4.111
[45] O'Neill, S. M., Larkin, N. (Sim) K., Hoadley, J., Mills, G., Vaughan, J. K., Draxler, R. R., ... Ferguson, S. A. (n.d.). Regional real-time smoke prediction systems, 8, 499–534.
[46] Ottmar, R. D., Miranda, A. I., & Sandberg, D. V. (n.d.). Characterizing sources of emissions from wildland fires, 8, 61–78.
[47] Ottmar, R. D., Sandberg, D. V., Riccardi, C. L., & Prichard, S. J
. (2007). An overview of the Fuel Characteristic Classification System - Quantifying, classifying, and creating fuelbeds for resource planning. Canadian Journal of Forest Research, 37(12), 2383–2393. doi:10.1139/X07-077
[48] Pope, C. A., Burnett, R. T., Thurston, G. D., Thun, M. J., Calle, E. E., Krewski, D., & Godleski, J. J. (2004). Cardiovascular Mortality and Long-Term Exposure to Particulate Air Pollution: Epidemiological Evidence of General Pathophysiological Pathways of Disease. Circulation, 109(1), 71–77. doi:10.1161/01.CIR.0000108927.80044.7F
[49] Pope III, C. A., Rodermund, D. L., & Gee, M. M. (2007). Mortality effects of a copper smelter strike and reduced ambient sulfate particulate matter air pollution. Environmental Health Perspectives, 679–683.
[50] Pozzi, R., De Berardis, B., Paoletti, L., & Guastadisegni, C. (2003). Inflammatory mediators induced by coarse (PM2.5–10) and fine (PM2.5) urban air particles in RAW 264.7 cells. Toxicology, 183(1–3), 243–254. doi:10.1016/S0300-483X(02)00545-0
[51] Prichard, S., Ottmar, R., & Anderson, G. (2006). Consume 3.0 User's Guide. USDA Forest Service. Pacific Northwest Research Station. (Seattle, WA) Available at Http://www.Fs.Fed.Us/pnw/fera/research/smoke/consume/index.Shtml [Verified 6 January 2012].
[52] Prinn, R., Cunnold, D., Rasmussen, R., Simmonds, P., Alyea, F., Crawford, A., ... Rosen, R. (1987). Atmospheric trends in methylchloroform and the global average for the hydroxyl radical. Science (New York, N.Y.), 238(4829), 945–950. doi:10.1126/science.238.4829.945
[53] Rappold, A. G., Cascio, W. E., Kilaru, V. J., Stone, S. L., Neas, L. M., Devlin, R. B., & Diaz-Sanchez, D. (2012). Cardio-respiratory outcomes associated with exposure to wildfire smoke are modified by measures of community health. Environmental Health, 11(1), 71. doi:10.1186/1476-069X-11-71
[54] Rolph, G. D., Draxler, R. R., Stein, A. F., Taylor, A., Ruminski, M. G., Kondragunta, S., ... Davidson, P. M. (2009). Description and Verification of the NOAA Smoke Forecasting System: The 2007 Fire Season. Weather and Forecasting, 24(2), 361–378. doi:10.1175/2008WAF2222165.1
[55] Samet, J. M., Dominici, F., Curriero, F. C., Coursac, I., & Zeger, S. L. (2000). Fine Particulate Air Pollution and Mortality in 20 U.S. Cities, 1987–1994. New England Journal of Medicine, 343(24), 1742–1749. doi:10.1056/NEJM200012143432401
[56] Sanders, N. J., & Stoecker, C. F. (2011). Where Have All the Young Men Gone? Using Gender Ratios to Measure Fetal Death Rates (Working Paper No. 17434). National Bureau of Economic Research. Retrieved from http://www.nber.org/papers/w17434
[57] Schlenker, W., & Walker, W. R. (2011). Airports, Air Pollution, and Contemporaneous Health. National Bureau of Economic Research Working Paper Series, No. 17684. Retrieved from http://www.nber.org/papers/w17684
[58] Schwartz, J., Laden, F., & Zanobetti, A. (2002). The concentration-response relation between PM(2.5) and daily deaths. Environmental Health Perspectives, 110(10), 1025.
[59] Short, K. C. (2013). A spatial database of wildfires in the United States, 1992–2011. Earth System Science Data Discussions, 6(2), 297–366. doi:10.5194/essdd-6-297-2013
[60] Slama, R., Darrow, L., Parker, J., Woodruff, T. J., Strickland, M., Nieuwenhuijsen, M., ... Ritz, B. (2008). Meeting Report: Atmospheric Pollution and Human Reproduction. Environmental Health Perspectives, 116(6), 791–798. doi:10.1289/ehp.11074
[61] Sorensen, M., Daneshvar, B., Hansen, M., Dragsted, L. O., Hertel, O., Knudsen, L., & Loft, S. (2003). Personal PM2.5 exposure and markers of oxidative stress in blood. Environmental Health Perspectives, 111(2), 161–166.
[62] Staiger, D., & Stock, J. H. (1994). Instrumental Variables Regression with Weak Instruments (Working Paper No. 151). National Bureau of Economic Research. Retrieved from http://www.nber.org/papers/t0151
[63] Stieb, D. M., Burnett, R. T., Smith-Doiron, M., Brion, O., Shin, H. H., & Economou, V. (2008). A new multipollutant, no-threshold air quality health index based on short-term associations observed in daily time-series analyses. Journal of the Air & Waste Management Association, 58(3), 435–450.
[64] Tan, W. C., Qiu, D., Liam, B. L., Ng, T. P., Lee, S. H., van Eeden, S. F., ... Hogg, J. C. (2000). The Human Bone Marrow Response to Acute Air Pollution Caused by Forest Fires. American Journal of Respiratory and Critical Care Medicine, 161(4), 1213–1217. doi:10.1164/ajrccm.161.4.9904084
[65] Wen, D., Lin, J. C., Zhang, L., Vet, R., & Moran, M. D. (2013). Modeling atmospheric ammonia and ammonium using a stochastic Lagrangian air quality model (STILT-Chem v0.7). Geosci. Model Dev., 6(2), 327–344. doi:10.5194/gmd-6-327-2013
[66] Wiedinmyer, C.
, & Friedli, H. (2007). Mercury Emission Estimates from Fires: An Initial Inventory for the United States. Environmental Science & Technology, 41(23), 8092–8098. doi:10.1021/es071289o
[67] Wooldridge, J. M. (2010). Econometric Analysis of Cross Section and Panel Data. MIT Press.
[68] Yoder, J. (2004). Playing with fire: endogenous risk in resource management. American Journal of Agricultural Economics, 86(4), 933–948.
[69] Zelikoff, J. T., Chen, L. C., Cohen, M. D., Fang, K., Gordon, T., Li, Y., ... Schlesinger, R. B. (2003). Effects of Inhaled Ambient Particulate Matter on Pulmonary Antimicrobial Immune Defense. Inhalation Toxicology, 15(2), 131–150. doi:10.1080/08958370304478

Chapter 2. Finite Sample Properties and Empirical Applicability of Two-Sample Two-Stage Least Squares (With Wei Lin)

1 Introduction

Instrumental variables (IV) methods enable the consistent estimation of endogenous variables' causal effects but suffer from poor finite-sample properties and data availability constraints. Bound, Jaeger, and Baker (1995) establish that estimation with weak instruments can lead to large inconsistencies and finite-sample bias. IV estimates also tend to have relatively large standard errors, often inhibiting the interpretability of differences between IV and non-IV point estimates. Lastly, the idiosyncratic nature of valid instrumental variables reduces their availability in datasets alongside outcome and other variables of interest. Beginning with Klevmarken (1982), some researchers have sought to address the problem of data availability by using two-sample IV methods (TSIVM), which combine parameter estimates from multiple datasets into a single IV estimate. Under a set of ideal conditions, a TSIVM produces an estimate with identical bias to the otherwise inaccessible traditional IV estimate. However, the finite-sample properties of TSIVM estimators are generally unknown, and prior literature lacks clear guidelines for how researchers should interpret them. The potential for researchers to introduce additional data and estimate models by TSIVM to produce estimates superior to available single-sample estimates has also not been explored.

We establish some insights into the finite-sample properties of the two-sample two-stage least squares (TS2SLS) estimator. Likely owing to its ease of implementation and interpretation, the TS2SLS estimator is the most commonly used TSIVM estimator in empirical applications (e.g., Arellano and Meghir 1992; den Berg et al. 2015; Devereux and Hart 2014; Nicoletti and Ermisch 2014; Rothstein and Wozny 2014). We broaden the set of potential applications of TS2SLS by demonstrating that even where a one-sample 2SLS estimate is available, a TS2SLS estimator may sometimes be preferred or worth reporting alongside the one-sample estimate because of greater precision and smaller bias. We propose approximations for the finite-sample bias and variance of the TS2SLS estimator that are dependent on both the typical set of parameters in the "weak instruments" literature for one-sample IV estimators and on three parameters unique to two-sample estimators: the distinct sample sizes of the first-stage and second-stage samples and the proportion of observations that "overlap" between them (i.e., the fraction of real population units from the first-stage sample which are also in the second-stage sample). To test the approximations, we conduct a series of Monte Carlo simulations and compute the average bias and standard errors across simulations.

We develop a data framework in which the TS2SLS estimator is computed from a complete theoretical sample of population units (the "super-sample") of which the first- and second-stage samples used to compute the TS2SLS estimate of interest are subsets containing potentially overlapping units. This approach formally reconciles two-sample and split-sample 2SLS estimators by nesting them in a common framework; for example, it makes equivalent the no-overlap TS2SLS estimator studied in Inoue and Solon (2008) and the split-sample 2SLS (SS2SLS) estimator analogous to Angrist and Krueger (1995)'s split-sample IV (SSIV). We find that the TS2SLS estimator can be written as a convex linear combination of the 2SLS estimator computed using the overlapping units between the two samples and the SS2SLS estimator computed using the remaining non-overlapping units. The weight on the 2SLS component is a function of sampling variation in the first-stage estimates for each of the overlapping and non-overlapping subsamples. The weight converges asymptotically to the "overlap" parameter, representing the proportion of units in the second-stage sample which are also in the first-stage sample. This linear partitioning of the TS2SLS estimator frames it as a function of two estimators with known properties, simplifying the development of approximations of bias and variance.

We find that the TS2SLS estimator has all finite-sample bias dependent on the degree of sampling error in the first-stage parameters (i.e., the coefficients on the instruments) relative to the strength of the endogeneity. As with 2SLS, the bias is decreasing in sample size only for the component arising from first-stage sampling error. Biases from invalid instruments or from violations of the key TS2SLS assumption (that the coefficients on the instruments are identical for both primary and secondary samples) are invariant to sample size. The TS2SLS estimator's variance is decreasing in each of its corresponding sample sizes, with variance reduction diminishing more rapidly from increasing first-stage sample size.

We demonstrate a hypothetical empirical application of TS2SLS using data from Angrist and Evans (1998), using their actual sample as the "super-sample" and examining the estimates they could have recovered had they been forced to use TS2SLS with subsets of their sample instead of 2SLS with their entire sample. Using TS2SLS with half the data for the first stage and half for the second stage (effectively an SS2SLS estimate), the estimate is closer to zero than the super-sample 2SLS estimate and less precise. We find that a TS2SLS estimate using only half of their observations for the first-stage estimation but their entire sample for the second stage almost exactly recovers the super-sample 2SLS estimate with equivalent precision, providing evidence of the strength of the instrument used. This exercise suggests one situation in practice in which TS2SLS is most likely to yield a high return: when the researcher can estimate the first stage precisely with one set of data, but also has access to more observations containing the outcome and the instrument but not the endogenous variable.

The econometrics literature formally concerning TSIVM has explored the computation and asymptotic properties of various two-sample IV estimators. Angrist and Krueger (1995) provide the properties of the split-sample IV estimator, only alluding to its relationship to the two-sample IV estimator, which they use in another study (Angrist and Krueger 1993). Inoue and Solon (2008) computationally distinguish the two-sample IV estimator, which is calculated explicitly using the ratio of covariance matrices each estimated from different datasets, from the TS2SLS estimator, which is calculated using ordinary least squares of the outcome against cross-sample fitted values. Their main finding is that the TS2SLS approach is asymptotically more efficient than the TSIV approach because TS2SLS takes into account differences in the sampling distribution of the instrument between the primary and secondary samples, while TSIV does not. Both Angrist and Krueger (1995) and Inoue and Solon (2008) only consider two-sample estimators which use fully independent samples. One paper in the epidemiology literature, Pierce and Burgess (2013), makes some commentary on the finite-sample properties of two-sample IV estimators via simulation, though the paper is primarily focused on the use of TSIV methods to promote efficient study design by reducing the number of observations needed. This paper contributes to the TSIVM literature by formalizing results around the finite-sample behavior of the TS2SLS estimator, generalizing the results to potentially non-independent first- and second-stage samples, and applying the results to common applications in which TS2SLS-style estimators are used.

2 Properties of the TS2SLS Estimator

2.1 Model

To examine the TS2SLS estimator, we establish a conventional linear simultaneous equations framework while also defining the relationships between two arbitrary samples. Suppose $S_1 \equiv \{(y_{1i}, z_{1i})\}_{i=1}^{N_1}$ and $S_2 \equiv \{(x_{2j}, z_{2j})\}_{j=1}^{N_2}$ are i.i.d. random vectors from the same underlying population, where $z_{1i}'$ and $z_{2j}'$ are $K \times 1$ vectors and $y_{1i}$ and $x_{2j}$ are scalars. $S_1$ corresponds to what we call the "second-stage sample," and $S_2$ corresponds to the "first-stage sample." We assume the following on $z$ and $x$:

1. $E(z_{1i}' z_{1i}) = E(z_{2j}' z_{2j}) = \Omega_z$.
2. $E(z_{1i}' x_{1i}) = E(z_{2j}' x_{2j}) = \Omega_{xz}$.
3. $\mathrm{Rank}\, E(z_{1i}' z_{1i}) = \mathrm{Rank}\, E(z_{2j}' z_{2j}) = K$.
4. $\mathrm{Rank}\, E(z_{1i}' x_{1i}) = \mathrm{Rank}\, E(z_{2j}' x_{2j}) = 1$.

We follow a general single-endogenous variable framework:

$$\begin{aligned}
x_{1i} &= z_{1i} \gamma_1 + v_{1i} && (8) \\
x_{2j} &= z_{2j} \gamma_2 + v_{2j} && (9) \\
y_{1i} &= x_{1i} \beta + \varepsilon_i && (10) \\
       &= z_{1i} \gamma \beta + \beta v_{1i} + \varepsilon_i && (11) \\
       &= z_{1i} \gamma \beta + u_{1i} && (12)
\end{aligned}$$

Without loss of generality, predetermined variables $w$ (including constants) are "partialled out" of source variables to create $x$, $z$, and $y$. The outcome $y$ is a function of an endogenous variable $x$, and $x$ is a function of an instrument $z$ and error $v$. The error $\varepsilon$ may in general be correlated with error $v$, giving rise to the endogeneity of $x$. The datasets available for use in estimation consist of two subsamples, $S_1$ and $S_2$, of a broader dataset of $N$ units. $S_1$ and $S_2$ generally may partially or entirely overlap in terms of the underlying units for which they have data. The first subscript identifies the sample from which each observation of $x$, $y$, or $z$ is drawn. The second subscripts, $i$ for sample 1 and $j$ for sample 2, index individuals within each sample. $\beta$ is the causal effect of $x$ on $y$, and the parameter of interest estimated by instrumental variables. $\gamma_1$ and $\gamma_2$ are the linear projection coefficients of $x$ on $z$ in each sample, and could differ in practice; here, we assume that $\gamma_1 = \gamma_2$. The assumptions for this model are as follows:

Assumptions:

1. Define a function $F(i)$ which takes on value 1 if unit $i$ in $S_1$ is also a member of $S_2$, and zero otherwise.
2. The number of units in $S_2$ also in $S_1$ is $\rho N_2$, with $0 \le \rho \le 1$.
   (a) The total number of distinct units represented by subsamples $S_1$ and $S_2$ is $N = N_1 + (1 - \rho) N_2$.
3. The data are ordered so that the sequence of the first $\rho N_2$ observations from $S_2$ is identical to the sequence of the first $\rho N_2$ observations in $S_1$: $\{(x_{2j}, z_{2j})\}_{j=1}^{\rho N_2} = \{(x_{1i}, z_{1i})\}_{i=1}^{\rho N_2}$.
4. $z_{1i}$ are valid and relevant excluded instruments for $x$; that is, $E(\varepsilon_i \mid z_{1i}) = E(\varepsilon_i \mid z_{2i}) = 0$; $E(v_{1i} \mid z_{1i}) = 0$; $E(v_{2j} \mid z_{2j}) = 0$; $\gamma \ne 0$.
5. The ratio of the sizes of $S_1$ and $S_2$ converges to a fixed positive number in large samples, that is, $\mathrm{plim}_{N_1, N_2 \to \infty} \frac{N_1}{N_2} = \alpha$.
6. Equation (10) is the structural equation with structural error $\varepsilon_i$. Equation (12) is the "reduced-form" equation, from substituting (8) into (10), with composite error $u_{1i} = \beta v_{1i} + \varepsilon_i$. The error terms are homoscedastic (implicitly conditioned on the partialled out exogenous variables $w$) with covariance matrix

$$\mathrm{var} \begin{pmatrix} \varepsilon_i \\ v_{1i} \\ v_{2j} \end{pmatrix} = \begin{pmatrix} \sigma_\varepsilon^2 & \sigma_{\varepsilon v} & \rho \sigma_{\varepsilon v} \\ \sigma_{\varepsilon v} & \sigma_v^2 & \rho \sigma_v^2 \\ \rho \sigma_{\varepsilon v} & \rho \sigma_v^2 & \sigma_v^2 \end{pmatrix}.$$

Note that $\varepsilon_i$ and $v_{2j}$ are independent when $F(i) = 0$, but when $F(i) = 1$, $v_{2j} = v_{1j}$. This covariance matrix implies

(a) $\mathrm{var}(u_{1i}) = \beta^2 \sigma_v^2 + \sigma_\varepsilon^2 + 2 \beta \sigma_{\varepsilon v}$, with $\sigma_u^2 \equiv \mathrm{var}(u_{1i})$.
(b) $\varepsilon_i = \theta v_{1i} + r_{1i}$, where $\theta = \sigma_{\varepsilon v} / \sigma_v^2$, and $r_{1i}$ is independent of $v_{1i}$.
(c) $E(x_{2j} \varepsilon_i) = \rho \sigma_{\varepsilon v}$.

Thus far, we have been agnostic regarding any practical case of data combination: this model will generally apply to any combination of two samples to estimate $\beta$. The model specifies how two samples are drawn from the same underlying population and may share some units, describing the resulting covariance structure of the errors. For simplicity of argument, we have assumed zero conditional mean (i.e., valid instruments $z$) and homoscedasticity of the errors (implicitly conditioned on predetermined variables $w$). The bias and variance approximations presented in this paper are intended for directional insight regarding the relationship between the first- and second-stage samples, rather than explicit estimation. Invalid instruments will change the finite-sample bias as a function of the magnitude and direction of covariance between $z$ and $\varepsilon$; we conjecture that invalid instruments would not affect the nature of finite-sample bias arising strictly from first-stage sampling error. However, heteroscedasticity would likely increase the finite-sample bias and variance of the TS2SLS estimator relative to a model under homoscedasticity with the same parameters $\rho$ and $\sigma_{\varepsilon v}$. Heteroscedastic error in the first stage results in OLS first-stage estimates no longer being minimum-variance, and first-stage variance drives both bias and variance in the 2SLS estimate. Practitioners will still need to compute accurate standard errors on their estimates, and thus variance estimates should be made robust to arbitrary heteroscedasticity.

2.2 Definition of Estimators

In practice, the TS2SLS estimator involves generating an estimate of the first-stage parameter $\gamma$, $\hat{\gamma}_2$, using $N_2$ observations with nonmissing values of $x$ and $z$, generating $N_1$ cross-sample fitted values $\hat{x}_{1i} = z_{1i} \hat{\gamma}_2$, and then regressing $y_1$ on $\hat{x}_1$ via OLS to estimate $\beta$. To facilitate a clearer understanding of the estimator, we express TS2SLS as a weighted combination of 2SLS and SS2SLS estimators on different subsets of the data, whose sizes depend on the degree of overlap between samples $S_1$ and $S_2$. Intuitively, the 2SLS component of the estimator is estimated using all units which are shared between $S_1$ and $S_2$; the SS2SLS component is estimated using all units which lie exclusively within $S_1$ or $S_2$. Accordingly, the TS2SLS estimator is equivalent to 2SLS for $\rho = 1$ and SS2SLS for $\rho = 0$ when $N_1 = N_2$. Note that these individual estimates may rely on data that is not observed in the practical setting in which we are considering estimators; the expression of a weighted average of estimators primarily serves the purpose of providing a more interpretable and algebraically convenient starting point for deriving the estimator's properties.

Let

$$Y_1 \equiv \begin{pmatrix} y_{11} \\ \vdots \\ y_{1N_1} \end{pmatrix} \equiv \begin{pmatrix} Y_{11} \\ Y_{12} \end{pmatrix}, \quad
\hat{X}_1 \equiv \begin{pmatrix} \hat{x}_{11} \\ \vdots \\ \hat{x}_{1N_1} \end{pmatrix} \equiv \begin{pmatrix} z_{11} \hat{\gamma}_2 \\ \vdots \\ z_{1N_1} \hat{\gamma}_2 \end{pmatrix} \equiv \begin{pmatrix} \hat{X}_{11} \\ \hat{X}_{12} \end{pmatrix}, \quad
Z_1 \equiv \begin{pmatrix} z_{11} \\ \vdots \\ z_{1N_1} \end{pmatrix} \equiv \begin{pmatrix} Z_{11} \\ Z_{12} \end{pmatrix},$$

$$\mathcal{E}_1 \equiv \begin{pmatrix} \varepsilon_1 \\ \vdots \\ \varepsilon_{N_1} \end{pmatrix} \equiv \begin{pmatrix} \mathcal{E}_{11} \\ \mathcal{E}_{12} \end{pmatrix} \quad \text{and} \quad
V_1 \equiv \begin{pmatrix} v_{11} \\ \vdots \\ v_{1N_1} \end{pmatrix} \equiv \begin{pmatrix} V_{11} \\ V_{12} \end{pmatrix}$$

be the vectors in $S_1$, where vectors $Y_{11}$,
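$\hat{X}_{11}$, $Z_{11}$, $\mathcal{E}_{11}$, $V_{11}$ are the first $\rho N_2$ rows, and vectors $Y_{12}$, $\hat{X}_{12}$, $Z_{12}$, $\mathcal{E}_{12}$, $V_{12}$ are the remaining $N_1 - \rho N_2$ rows. Let

$$X_1 \equiv \begin{pmatrix} x_{11} \\ \vdots \\ x_{1N_1} \end{pmatrix} \equiv \begin{pmatrix} X_{11} \\ X_{12} \end{pmatrix},$$

where we observe $X_{11}$ but not $X_{12}$. Similarly, let

$$X_2 \equiv \begin{pmatrix} x_{21} \\ \vdots \\ x_{2N_2} \end{pmatrix} \equiv \begin{pmatrix} X_{11} \\ X_{22} \end{pmatrix}, \quad
Z_2 \equiv \begin{pmatrix} z_{21} \\ \vdots \\ z_{2N_2} \end{pmatrix} \equiv \begin{pmatrix} Z_{11} \\ Z_{22} \end{pmatrix} \quad \text{and} \quad
V_2 \equiv \begin{pmatrix} v_{21} \\ \vdots \\ v_{2N_2} \end{pmatrix} \equiv \begin{pmatrix} V_{11} \\ V_{22} \end{pmatrix}$$

be the vectors in $S_2$, where vectors $X_{11}$, $Z_{11}$ are the same as the first $\rho N_2$ rows in $S_1$, and vectors $X_{22}$ and $Z_{22}$ are the remaining $(1 - \rho) N_2$ rows in $S_2$. The TS2SLS estimator is defined by

$$\hat{\beta}_O = \left( \hat{X}_1' \hat{X}_1 \right)^{-1} \hat{X}_1' Y_1.$$

Proposition 1. By the proof in the appendix, $\hat{\beta}_O$ can be rewritten as

$$\begin{aligned}
\hat{\beta}_O &= \left( \hat{X}_1' \hat{X}_1 \right)^{-1} \hat{X}_{11}' \hat{X}_{11} \left( \hat{X}_{11}' \hat{X}_{11} \right)^{-1} \hat{X}_{11}' Y_{11} + \left( \hat{X}_1' \hat{X}_1 \right)^{-1} \hat{X}_{12}' \hat{X}_{12} \left( \hat{X}_{12}' \hat{X}_{12} \right)^{-1} \hat{X}_{12}' Y_{12} \\
&= \hat{W} \hat{\beta}^{(1)}_{2SLS} + \left( 1 - \hat{W} \right) \hat{\beta}^{(2)}_{SS2SLS} && (13)
\end{aligned}$$

where $\hat{W} \equiv \left( \hat{X}_1' \hat{X}_1 \right)^{-1} \hat{X}_{11}' \hat{X}_{11}$; $\hat{\beta}^{(1)}_{2SLS} = \left( \hat{X}_{11}' \hat{X}_{11} \right)^{-1} \hat{X}_{11}' Y_{11}$; and $\hat{\beta}^{(2)}_{SS2SLS} = \left( \hat{X}_{12}' \hat{X}_{12} \right)^{-1} \hat{X}_{12}' Y_{12}$.

Proposition 2. The probability limit of $\hat{W}$ as both samples approach infinity is the ratio of the overlap parameter to the asymptotic ratio of sample sizes:

$$\mathrm{plim}_{N_1, N_2 \to \infty} \hat{W} \equiv W = \frac{\rho}{\alpha}.$$

Remark 3. The expected value of $\hat{W}$ can be approximated as follows (see appendix):

$$E\left( \hat{W} \right) \approx \frac{\rho N_2 \gamma' \Omega_z \gamma + K \sigma_v^2}{N_1 \gamma' \Omega_z \gamma + K \sigma_v^2 + \frac{N_1 - \rho N_2}{(1 - \rho) N_2} \sigma_v^2}.$$

Proposition 1 formalizes the partition of $\hat{\beta}_O$ into a weighted average of $\hat{\beta}^{(1)}_{2SLS}$ and $\hat{\beta}^{(2)}_{SS2SLS}$. The weights represent the sum of squares used by each estimator relative to the total sum of squares in the entire vector of fitted data. This variation has two components: variation explicitly from instruments $Z$, and variation from first-stage error vectors $V$ which manifests through the estimated first-stage coefficients. Asymptotically, the weights are a function of the degree of overlap between samples and the ratio of sample sizes. For $\alpha = 1$, the weights are the ratio of the overlapping sample size to the second-stage sample size.

2.3 First-order bias approximation

There is an expansive literature on the finite-sample bias of IV estimators, with papers such as Nagar (1959), Bekker (1994), Staiger and Stock (1997), and Bun and Windmeijer (2010) offering approximations of various forms. For simplicity, we consider first-order approximations of the bias of 2SLS and SS2SLS and develop a bias approximation for TS2SLS. Hahn and Hausman (2002) offer a simple approximation of the bias of 2SLS, using the product of the inverse of the expected value of the variance of the fitted values and the expected value of the covariance between the fitted values and the outcome. In a scalar case, this is in effect using the ratio of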
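expectations in place of the expected value of the ratio. This approximation corresponds to a Taylor series expansion of the 2SLS estimator about each of these "numerator" and "denominator" terms. For 2SLS, this approximation sufficiently characterizes the directional responses of bias to sample size, number of instruments, error covariance, and instrument strength. The same holds for SS2SLS, which has a comparable shape of response but has bias characterized entirely by sampling error.

Proposition 4. The bias for the 2SLS estimator on the overlapping portion of the sample is approximated by a first-order Taylor expansion:

$$E(\hat{\beta}^{(1)}_{2SLS} - \beta) \approx \frac{K\sigma_{\varepsilon v}}{\rho N_2\gamma_1'\Omega_z\gamma_1 + K\sigma_v^2}. \quad (14)$$

The approximate bias for the SS2SLS estimator on the non-overlap portion of the sample is given by

$$E(\hat{\beta}^{(2)}_{SS2SLS} - \beta) \approx -\frac{\sigma_v^2\beta/((1-\rho)N_2)}{\gamma'\Omega_z\gamma + \sigma_v^2/((1-\rho)N_2)} = -\frac{\beta}{(1-\rho)N_2\gamma'\Omega_z\gamma/\sigma_v^2 + 1}. \quad (15)$$

Equation (14) and equation (15) show that the bias of both the 2SLS estimator and the SS2SLS estimator approaches zero as $N_2$ gets large: they are asymptotically unbiased in first stage sample size. This conforms with the intuition that all finite-sample bias in 2SLS (under valid instruments) originates from sampling error. Both the direction and magnitude of bias in the 2SLS estimator depend on $\sigma_{\varepsilon v}$, the covariance of error terms representing the endogenous portion of $x$. SS2SLS has an attenuation bias that is inversely proportional to $(1-\rho)N_2\gamma'\Omega_z\gamma/\sigma_v^2$, the "concentration parameter," intuitively similar to a bias from measurement error. Note that this attenuation, found similarly in Angrist and Krueger (1995), is dependent on the assumption of a linear conditional expectation function in the first stage. An application of Jensen's inequality to the SS2SLS estimator suggests that the attenuation bias may not generally hold (see appendix).

Proposition 5. For $0$ [...] $1$ and $\beta > K\theta\frac{N_2}{N_1}$, set the overlap proportion to

$$\rho = \frac{\beta\frac{N_1}{N_2} - K\theta}{\beta - K\theta} \in (0, 1).$$

Proposition 5 follows from a Taylor series approximation of $E(\hat{\beta}_O)$, derived in the appendix. The second part is shown by setting the numerator in equation (16) to zero and solving for $\rho$. The net bias of TS2SLS is dependent on the overlap parameter $\rho$ and sampling variation in the first stage parameters estimated from the units used in TS2SLS estimation, expressed through the ratio $\hat{W}$. When the combination of $\beta$ and $\sigma_{\varepsilon v}$ makes the direction of bias for the two estimators have different signs, the overlap parameter $\rho$ can be tuned to yield an approximately unbiased estimator $\hat{\beta}_O$. Because of the nature of the approximation used, the approximation performs poorly near $\rho = 1$ and is undefined at $\rho = 1$. With $N_1 \neq N_2$ and $\rho = 1$, the expression evaluates to $-\beta$ after multiplying the numerator and denominator by $(1-\rho)$, suggesting that the approximation does not capture useful properties of this important edge case of TS2SLS. With $N_1 = N_2$ and $\rho = 1$, the expression is undefined but has a limit as $\rho$ approaches one:

$$\lim_{\rho \to 1,\, N_1 = N_2} \frac{K\sigma_{\varepsilon v} - \frac{N_1 - \rho N_2}{(1-\rho)N_2}\sigma_v^2\beta}{N_1\gamma'\Omega_z\gamma + K\sigma_v^2 + \frac{N_1 - \rho N_2}{(1-\rho)N_2}\sigma_v^2} = \frac{K\sigma_{\varepsilon v} - \sigma_v^2\beta}{N_1\gamma'\Omega_z\gamma + K\sigma_v^2 + \sigma_v^2}.$$

This edge case is likely an inaccurate approximation, arising as an artifact of the first-order Taylor series approximation. We can intuit that the total overlap ($\rho = 1$) case with $N_1 = N_2$ results in an estimate computationally identical to 2SLS; the two-step nature of 2SLS means that the ability to explicitly link the units used in the steps has no bearing on the estimate if the units used in the steps are indeed the same. This approximation contains all elements of the bias approximation for 2SLS presented in equation (14) but has "artifacts" from the mixture of 2SLS and SS2SLS estimators. Incidentally, with $N_1 = N_2$, we intuit that the no-overlap ($\rho = 0$) case generates an estimate computationally identical to the "split-sample" estimator that is biased toward zero, a result that would be consistent with Angrist and Krueger (1995).

2.4 Asymptotic Variance of TS2SLS

Proposition 6. $\hat{\beta}_O$ is consistent and asymptotically normally distributed with asymptotic variance

$$\sqrt{N_1 + (1-\rho)N_2}\,(\hat{\beta}_O - \beta) \overset{a}{\sim} N\left\{0,\ \frac{(1+\alpha-\rho)\rho}{\alpha^2}\sigma_\varepsilon^2\left(\Omega_{xz}\Omega_z^{-1}\Omega_{xz}\right)^{-1} + \frac{(1+\alpha-\rho)(\alpha-\rho)}{\alpha^2}\left[\Omega_{xz}'\left\{\left(\sigma_\varepsilon^2 + \frac{\alpha-\rho}{1-\rho}\beta'\sigma_v^2\beta\right)\Omega_z\right\}^{-1}\Omega_{xz}\right]^{-1}\right\}. \quad (17)$$

This result follows from the fact that the limiting distribution of $\hat{\beta}_O$ is a linear combination of the two estimators $\hat{\beta}^{(1)}_{2SLS}$ and $\hat{\beta}^{(2)}_{SS2SLS}$ with weight $\rho/\alpha$ (see appendix for proof). Then, a natural approximation for $\mathrm{var}(\hat{\beta}_O)$ is as follows:

$$\mathrm{var}(\hat{\beta}_O) \approx \widetilde{\mathrm{var}}(\hat{\beta}_O) = (N_1 + (1-\rho)N_2)^{-1}\,\mathrm{avar}(\hat{\beta}_O).$$

This approximation nests conventional 2SLS, which corresponds to the case where $\alpha = 1$ and $\rho = 1$, under which the asymptotic variance is the conventional 2SLS asymptotic variance (Wooldridge 2010). TS2SLS has an asymptotic variance that accounts for variation in the estimate $\hat{\beta}_O$ due to sampling error, but only due to the error originating in the SS2SLS component. The 2SLS asymptotic variance treats the first stage parameter as known (Wooldridge 2010), and so any variability from the first stage of the 2SLS component of the TS2SLS estimator is ignored. The presence of $(1-\rho)$ in the stabilizing factor, $(N_1 + (1-\rho)N_2)^{-1}$, implies that increasing first stage observations $N_2$ has a discounted and diminishing effect on precision relative to increasing second stage observations $N_1$.

Inoue and Solon (2008) derive the asymptotic variance of TS2SLS, but do so only for the case of independent primary and supplemental samples (i.e., $\rho = 0$). Allowing the samples to generally overlap, we find that the asymptotic variance is decreasing in $\rho$. The basic intuition underlying this property is that sampling variation in the first stage parameter coming from a secondary, independent sample is additional noise originating with error term $v_2$, unrelated to the outcome $y_1$. First stage estimate sampling variation coming from the same units used in second stage estimation (as it would in the typical 2SLS computation implied by $\rho = 1$) has a component with explanatory power for outcome $y$ through the error component of $x_1$, $v_1$.

2.5 TS2SLS Under Data Availability Constraints

The traditional motivation behind using TS2SLS is to achieve a 2SLS estimate where all necessary variables are not available in a single sample. A second potential reason to use TS2SLS is that it may provide lower bias or variance than the best available 2SLS estimator. Define $\hat{\beta}_{2SLS}(N_0)$ as a 2SLS estimator using $N_0$ observations. Define $\hat{\beta}_{TS2SLS,N_2,N_1}$ as the TS2SLS estimator using $N_2$ observations to estimate the first stage and $N_1$ observations to estimate the second stage. Suppose a researcher has access to a single-sample 2SLS estimator. Provided model assumptions are met, we propose that a researcher with access to larger supplemental samples for either the first stage or second stage can provide a TS2SLS estimator that outperforms the 2SLS estimator on bias, variance, or both. Using a large supplemental sample for the first stage in a TS2SLS estimator improves both the bias and variance over using the single-sample 2SLS estimator; using a large supplemental sample for the second stage in a TS2SLS estimator improves the variance. This motivates two conjectures:

Conjecture 7. There exists a value $N_0' > N_0$ such that $\mathrm{var}(\hat{\beta}_{TS2SLS,N_0',N_0})$ [...] $N_0$ such that $\mathrm{var}(\hat{\beta}_{TS2SLS,N_0,N_0'})$ [...] $N_0$ such that a) $\widetilde{\mathrm{var}}(\hat{\beta}_{TS2SLS,N_0',N_0}) < \widetilde{\mathrm{var}}(\hat{\beta}_{2SLS,N_0})$ if $\rho \neq 1$, where $\widetilde{\mathrm{var}}$ is the asymptotic variance approximation presented in equation 17, and b) $|\widetilde{\mathrm{bias}}(\hat{\beta}_{TS2SLS,N_0',N_0})|$ [...] $N_0$ such that $\mathrm{avar}(\hat{\beta}_{TS2SLS,N_0,N_0'})$ [...]

$$\frac{N_0}{1-\rho}\left[\mathrm{avar}(\hat{\beta}_{2SLS})\right]^{-1}\left[\mathrm{avar}(\hat{\beta}_{TS2SLS}) - \mathrm{avar}(\hat{\beta}_{2SLS})\right].$$

At a minimum, $N_0'$ must be at least as large as $N_0$, noting that $\mathrm{avar}(\hat{\beta}_{TS2SLS}) - \mathrm{avar}(\hat{\beta}_{2SLS})$ is positive. In the limiting case of $\rho = 1$, TS2SLS has no variance penalty relative to 2SLS, and thus TS2SLS would have the same variance provided that $N_0 = N_0'$ for either $\hat{\beta}_{TS2SLS,N_0,N_0'}$ or $\hat{\beta}_{TS2SLS,N_0',N_0}$. The TS2SLS variance penalty is maximized for $\rho = 0$, requiring the largest increase in first-stage sample size to achieve equal variance to the single-sample estimator. Similarly, the following values for $N_0'$ (i.e., second-stage observations in the context of $\hat{\beta}_{TS2SLS,N_0,N_0'}$) satisfy 10:

$$N_0' > N_0\left[\mathrm{avar}(\hat{\beta}_{2SLS})\right]^{-1}\left[\mathrm{avar}(\hat{\beta}_{TS2SLS}) - (1-\rho)\,\mathrm{avar}(\hat{\beta}_{2SLS})\right].$$

When $\rho = 1$ and $\alpha = 1$, $\mathrm{avar}(\hat{\beta}_{TS2SLS}) = \mathrm{avar}(\hat{\beta}_{2SLS})$; and unsurprisingly we need only $N_0' > N_0$ for TS2SLS to achieve lower variance than 2SLS according to the approximation. Finally, for proposition 9b, we set $N_0'$ to a quantity such that

$$\left|\frac{K\sigma_{\varepsilon v} - \frac{N_0 - \rho N_0'}{(1-\rho)N_0'}\sigma_v^2\beta}{N_0'\gamma'\Omega_z\gamma + K\sigma_v^2 + \frac{N_0 - \rho N_0'}{(1-\rho)N_0'}\sigma_v^2}\right|$$

[...]

$$= \left[E\hat{X}_{11}'\hat{X}_{11} + E\hat{X}_{12}'\hat{X}_{12}\right]^{-1}E\hat{X}_{11}'\hat{X}_{11}.$$

The numerator is equal to

$$E\hat{X}_{11}'\hat{X}_{11} = EX_{11}'P_{Z_{11}}X_{11} = E(\gamma'Z_{11}' + V_{11}')P_{Z_{11}}(Z_{11}\gamma + V_{11}) = E\gamma'Z_{11}'Z_{11}\gamma + EV_{11}'P_{Z_{11}}V_{11} = \rho N_2\gamma'\Omega_z\gamma + K\sigma_v^2,$$

where $P_{Z_{11}} \equiv Z_{11}(Z_{11}'Z_{11})^{-1}Z_{11}'$ is the projection matrix. By assumption 1, $E(z_{1i}'z_{1i}) = \Omega_z$. Since $\mathrm{rank}(P_{Z_{11}}) = K$, w.p.a. 1, we have $\frac{V_{11}'}{\sigma_v}P_{Z_{11}}\frac{V_{11}}{\sigma_v} \sim \chi^2_K$, and $E\chi^2_K = K$ for the last equality.

Another component of the denominator is

$$E\hat{X}_{12}'\hat{X}_{12} = E\left\{\left[Z_{12}\gamma + Z_{12}(Z_{22}'Z_{22})^{-1}Z_{22}'V_{22}\right]'\left[Z_{12}\gamma + Z_{12}(Z_{22}'Z_{22})^{-1}Z_{22}'V_{22}\right]\right\}$$
$$= E\gamma'Z_{12}'Z_{12}\gamma + E\gamma'Z_{12}'Z_{12}(Z_{22}'Z_{22})^{-1}Z_{22}'V_{22} + EV_{22}'Z_{22}(Z_{22}'Z_{22})^{-1}Z_{12}'Z_{12}\gamma + EV_{22}'Z_{22}(Z_{22}'Z_{22})^{-1}Z_{12}'Z_{12}(Z_{22}'Z_{22})^{-1}Z_{22}'V_{22}$$
$$= (N_1 - \rho N_2)\gamma'\Omega_z\gamma + EV_{22}'Z_{22}(Z_{22}'Z_{22})^{-1}Z_{12}'Z_{12}(Z_{22}'Z_{22})^{-1}Z_{22}'V_{22}.$$

The third equality follows because $Z_{12}$ and $Z_{22}$ are independent and, by Assumption 5, $E(V_{22} \mid Z_{22}) = 0$. Because $EV_{22}'Z_{22}(Z_{22}'Z_{22})^{-1}Z_{12}'Z_{12}(Z_{22}'Z_{22})^{-1}Z_{22}'V_{22}$ is a scalar,

$$EV_{22}'Z_{22}(Z_{22}'Z_{22})^{-1}Z_{12}'Z_{12}(Z_{22}'Z_{22})^{-1}Z_{22}'V_{22} = E\left[\mathrm{tr}\left(V_{22}'Z_{22}(Z_{22}'Z_{22})^{-1}Z_{12}'Z_{12}(Z_{22}'Z_{22})^{-1}Z_{22}'V_{22}\right)\right]$$
$$= E\left[\mathrm{tr}\left(Z_{22}'V_{22}V_{22}'Z_{22}(Z_{22}'Z_{22})^{-1}Z_{12}'Z_{12}(Z_{22}'Z_{22})^{-1}\right)\right]$$
$$= E\left\{\mathrm{tr}\left[Z_{22}'E(V_{22}V_{22}' \mid Z_{22})Z_{22}(Z_{22}'Z_{22})^{-1}Z_{12}'Z_{12}(Z_{22}'Z_{22})^{-1}\right]\right\}$$
$$= E\left\{\mathrm{tr}\left[Z_{12}'Z_{12}(Z_{22}'Z_{22})^{-1}\right]\right\}\sigma_v^2 = \mathrm{tr}\left\{E\left[Z_{12}'Z_{12}(Z_{22}'Z_{22})^{-1}\right]\right\}\sigma_v^2 \approx \frac{N_1 - \rho N_2}{(1-\rho)N_2}\sigma_v^2.$$

The third equality follows by the law of iterated expectations. The independence between $Z_{12}$ and $Z_{22}$ and Assumptions 1.(a) and 5 enable us to pull $E(V_{22}V_{22}' \mid Z_{22}) = E(V_{22}V_{22}') = \sigma_v^2 I_n$ out of the expectation. The last equality follows because $Z_{12}'Z_{12}$ is independent of $Z_{22}'Z_{22}$ and $E(Z_{12}'Z_{12}) = (N_1 - \rho N_2)\Omega_z$, $E(Z_{22}'Z_{22}) = (1-\rho)N_2\Omega_z$, and the first order Taylor expansion gives

$$\mathrm{tr}\left\{E\left[Z_{12}'Z_{12}(Z_{22}'Z_{22})^{-1}\right]\right\} \approx \frac{N_1 - \rho N_2}{(1-\rho)N_2}.$$

Therefore,

$$E\hat{W} \approx \frac{\rho N_2\gamma'\Omega_z\gamma + K\sigma_v^2}{N_1\gamma'\Omega_z\gamma + K\sigma_v^2 + \frac{N_1 - \rho N_2}{(1-\rho)N_2}\sigma_v^2}.$$

Proof for Proposition 4

$$\hat{\beta}^{(1)}_{2SLS} - \beta = (\hat{X}_{11}'\hat{X}_{11})^{-1}\hat{X}_{11}'\mathcal{E}_{11}.$$

We have derived the expectation of the denominator in remark 3. Similarly, the expectation of the numerator is

$$E\hat{X}_{11}'\mathcal{E}_{11} = EX_{11}'P_{Z_{11}}\mathcal{E}_{11} = E(\gamma'Z_{11}' + V_{11}')P_{Z_{11}}(\theta V_{11} + R_{11}) = \theta EV_{11}'P_{Z_{11}}V_{11} + EV_{11}'P_{Z_{11}}R_{11} = \theta\sigma_v^2E\chi^2_K = K\sigma_{\varepsilon v}.$$

The second equality follows from assumption 4, that is, $E\gamma'Z_{11}'P_{Z_{11}}\mathcal{E}_{11} = E\gamma'Z_{11}'\mathcal{E}_{11} = 0$. Also, due to assumption 3.(c), write $\mathcal{E}_{11} = \theta V_{11} + R_{11}$, where $R_{11}$ is independent of $V_{11}$. So $E(V_{11}'P_{Z_{11}}R_{11}) = 0$ for the fourth equality to hold. The Taylor expansion of $g(X_{11}'P_{Z_{11}}X_{11},\ X_{11}'P_{Z_{11}}\mathcal{E}_{11})$ at the point $(EX_{11}'P_{Z_{11}}X_{11},\ EX_{11}'P_{Z_{11}}\mathcal{E}_{11})$ is

$$E(\hat{\beta}^{(1)}_{2SLS} - \beta) \approx \left[E\hat{X}_{11}'\hat{X}_{11}\right]^{-1}E\hat{X}_{11}'\mathcal{E}_{11} = \frac{K\sigma_{\varepsilon v}}{\rho N_2\gamma_1'\Omega_z\gamma_1 + K\sigma_v^2}.$$

Following Angrist and Krueger (1995),

$$E(\hat{\beta}^{(2)}_{SS2SLS} - \beta) = E(\hat{X}_{12}'\hat{X}_{12})^{-1}\hat{X}_{12}'Y_{12} - \beta = E\left\{\left[\hat{X}_{12}'\hat{X}_{12}\right]^{-1}\left[\hat{X}_{12}'(X_{12}\beta + V_{12})\right]\right\} - \beta = E(\hat{X}_{12}'\hat{X}_{12})^{-1}\hat{X}_{12}'X_{12}\beta - \beta.$$

The second equality follows from Assumption 3 ($E(v_{2j} \mid z_{2j}) = 0$) and the independence between $V_{12}$ and $(X_{22}, Z_{12})$:

$$E(\hat{X}_{12}'\hat{X}_{12})^{-1}\hat{X}_{12}'V_{12} = E\left(\left\{\left[Z_{12}(Z_{22}'Z_{22})^{-1}Z_{22}'X_{22}\right]'\left[Z_{12}(Z_{22}'Z_{22})^{-1}Z_{22}'X_{22}\right]\right\}^{-1}X_{22}'Z_{22}(Z_{22}'Z_{22})^{-1}Z_{12}'V_{12}\right) = 0.$$

The following shows that $E(x_{1i} \mid \hat{x}_{1i}) = \hat{x}_{1i}\left\{\left[E\hat{x}_{1i}'\hat{x}_{1i}\right]^{-1}E\hat{x}_{1i}'x_{1i}\right\}$ for $i = 1, \ldots, \rho N_2$:

$$E(x_{1i} \mid \hat{x}_{1i}) = E(z_{1i}\gamma + v_{1i} \mid z_{1i}\hat{\gamma}_2) = E(z_{1i}\gamma \mid z_{1i}\hat{\gamma}_2) = z_{1i}\hat{\gamma}_2\frac{E\hat{\gamma}_2'z_{1i}'z_{1i}\gamma}{E\hat{\gamma}_2'z_{1i}'z_{1i}\hat{\gamma}_2} = z_{1i}\hat{\gamma}_2\frac{E\hat{\gamma}_2'z_{1i}'z_{1i}\gamma + E\hat{\gamma}_2'z_{1i}'v_{1i}}{E\hat{\gamma}_2'z_{1i}'z_{1i}\hat{\gamma}_2} = \hat{x}_{1i}\frac{E\hat{x}_{1i}'x_{1i}}{E\hat{x}_{1i}'\hat{x}_{1i}}.$$

Because $\hat{\gamma}_2 = (z_{2j}'z_{2j})^{-1}z_{2j}'x_{2j}$ and $v_{1i}$ is independent of $z_{1i}, z_{2j}, x_{2j}$, the second equality follows from $E(v_{1i} \mid z_{1i}\hat{\gamma}_2) = 0$. The third equality is clear since $\hat{\gamma}_2$ can be seen as a constant, so that $z_{1i}\gamma$ is linear in $z_{1i}\hat{\gamma}_2$. By stacking the observations, it follows that $E(X_{12} \mid \hat{X}_{12})$ is linear as well and that $E(X_{12} \mid \hat{X}_{12}) = \hat{X}_{12}\left\{E\left[\hat{X}_{12}'\hat{X}_{12}\right]^{-1}E\left[\hat{X}_{12}'X_{12}\right]\right\}$. Once more, by the law of iterated expectations,

$$E(\hat{X}_{12}'\hat{X}_{12})^{-1}\hat{X}_{12}'X_{12} = \left[E\hat{X}_{12}'\hat{X}_{12}\right]^{-1}E\hat{X}_{12}'X_{12}.$$

The numerator is

$$E\hat{X}_{12}'X_{12} = E\left\{\left[Z_{12}\gamma + Z_{12}(Z_{22}'Z_{22})^{-1}Z_{22}'V_{22}\right]'(Z_{12}\gamma + V_{12})\right\}$$
$$= E\gamma'Z_{12}'Z_{12}\gamma + E\left(V_{22}'Z_{22}(Z_{22}'Z_{22})^{-1}Z_{12}'Z_{12}\gamma\right) + E\gamma'Z_{12}'V_{12} + EV_{22}'Z_{22}(Z_{22}'Z_{22})^{-1}Z_{12}'V_{12}$$
$$= E\gamma'Z_{12}'Z_{12}\gamma = (N_1 - \rho N_2)\gamma'\Omega_z\gamma.$$

The third equality follows from assumption 5 and from $V_{22}$ and $V_{12}$ coming from independent samples. Also, recall $E(z_{2i}'z_{2i}) = \Omega_z$. We already derived in remark 3 that

$$E\hat{X}_{12}'\hat{X}_{12} = (N_1 - \rho N_2)\gamma'\Omega_z\gamma + \frac{N_1 - \rho N_2}{(1-\rho)N_2}\sigma_v^2,$$

so the approximate bias of SS2SLS follows as

$$E(\hat{\beta}^{(2)}_{SS2SLS} - \beta) = -\frac{\sigma_v^2\beta/((1-\rho)N_2)}{\gamma'\Omega_z\gamma + \sigma_v^2/((1-\rho)N_2)}.$$

Proof for Proposition 5

$$E\left[\hat{W}(\hat{\beta}^{(1)}_{2SLS} - \beta) + (1-\hat{W})(\hat{\beta}^{(2)}_{SS2SLS} - \beta)\right] = E\left[\hat{W}(\hat{\beta}^{(1)}_{2SLS} - \beta)\right] + E\left[(1-\hat{W})\hat{\beta}^{(2)}_{SS2SLS}\right] - E\left[(1-\hat{W})\beta\right]$$
$$= E\left(\frac{\hat{X}_{11}'\mathcal{E}_{11}}{\hat{X}_{11}'\hat{X}_{11} + \hat{X}_{12}'\hat{X}_{12}}\right) + E\left(\frac{\hat{X}_{12}'X_{12}}{\hat{X}_{11}'\hat{X}_{11} + \hat{X}_{12}'\hat{X}_{12}}\right)\beta - E\left(\frac{\hat{X}_{12}'\hat{X}_{12}}{\hat{X}_{11}'\hat{X}_{11} + \hat{X}_{12}'\hat{X}_{12}}\right)\beta$$
$$\approx \frac{E\hat{X}_{11}'\mathcal{E}_{11} + E\hat{X}_{12}'X_{12}\beta - E\hat{X}_{12}'\hat{X}_{12}\beta}{E\left(\hat{X}_{11}'\hat{X}_{11} + \hat{X}_{12}'\hat{X}_{12}\right)} = \frac{K\sigma_{\varepsilon v} - \frac{N_1 - \rho N_2}{(1-\rho)N_2}\sigma_v^2\beta}{N_1\gamma'\Omega_z\gamma + K\sigma_v^2 + \frac{N_1 - \rho N_2}{(1-\rho)N_2}\sigma_v^2}.$$

The third approximation uses a first-order Taylor expansion. All the moments in this approximation can be found in the proofs of proposition 1 and 2.

Proof for Proposition 6

The asymptotic variance of the 2SLS estimator is

$$\sqrt{\rho N_2}\,(\hat{\beta}^{(1)}_{2SLS} - \beta) \overset{a}{\sim} N\left[0,\ \sigma_\varepsilon^2\left(Ex_{1i}^{*\prime}x_{1i}^{*}\right)^{-1}\right] = N\left\{0,\ \sigma_\varepsilon^2\left(\Omega_{xz}\Omega_z^{-1}\Omega_{xz}\right)^{-1}\right\},$$

where $x_{1i}^{*} = z_{1i}\gamma = z_{1i}\left[Ez_{1i}'z_{1i}\right]^{-1}Ez_{1i}'x_{1i}$ (Wooldridge 2010).

The asymptotic variance of the SS2SLS estimator is adapted from Inoue and Solon (2010) in this context as a special case of TS2SLS:

$$\sqrt{N_1 - \rho N_2}\,(\hat{\beta}^{(2)}_{SS2SLS} - \beta) \overset{a}{\sim} N\left\{0,\ \left[\Omega_{xz}'\left\{\left(\sigma_u^2 + \frac{\alpha-\rho}{1-\rho}\beta'\sigma_v^2\beta\right)\Omega_z\right\}^{-1}\Omega_{xz}\right]^{-1}\right\}.$$

By the asymptotic equivalence theorem,

$$\sqrt{N_1 + (1-\rho)N_2}\,(\hat{\beta}_O - \beta) = \sqrt{\frac{N_1 + (1-\rho)N_2}{\rho N_2}}\sqrt{\rho N_2}\,\hat{W}(\hat{\beta}^{(1)}_{2SLS} - \beta) + \sqrt{\frac{N_1 + (1-\rho)N_2}{(\alpha-\rho)N_2}}\sqrt{N_1 - \rho N_2}\,(1-\hat{W})(\hat{\beta}^{(2)}_{SS2SLS} - \beta)$$
$$\overset{p}{\to} \frac{\sqrt{(1+\alpha-\rho)\rho}}{\alpha}\sqrt{\rho N_2}\,(\hat{\beta}^{(1)}_{2SLS} - \beta) + \frac{\sqrt{(1+\alpha-\rho)(\alpha-\rho)}}{\alpha}\sqrt{N_1 - \rho N_2}\,(\hat{\beta}^{(2)}_{SS2SLS} - \beta)$$
$$\overset{a}{\sim} N\left(0,\ \frac{(1+\alpha-\rho)\rho}{\alpha^2}\sigma_\varepsilon^2\left(\Omega_{xz}\Omega_z^{-1}\Omega_{xz}\right)^{-1} + \frac{(1+\alpha-\rho)(\alpha-\rho)}{\alpha^2}\left[\Omega_{xz}'\left\{\left(\sigma_u^2 + \frac{\alpha-\rho}{1-\rho}\beta'\sigma_v^2\beta\right)\Omega_z\right\}^{-1}\Omega_{xz}\right]^{-1}\right).$$

REFERENCES

[1] Anderson, M. L., & Matsa, D. A. (2011). Are restaurants really supersizing America? American Economic Journal: Applied Economics, 152–188.

[2] Angrist, J. D., & Evans, W. N. (1998). Children and Their Parents' Labor Supply: Evidence from Exogenous Variation in Family Size. The American Economic Review, 88(3), 450–477.

[3] Angrist, J. D., Imbens, G. W., & Krueger, A. B. (1999). Jackknife instrumental variables estimation. Journal of Applied Econometrics, 14(1), 57–67. http://doi.org/10.1002/(SICI)1099-1255(199901/02)14:1<57::AID-JAE501>3.0.CO;2-G

[4] Angrist, J. D., & Krueger, A. B. (1992). The Effect of Age at School Entry on Educational Attainment: An Application of Instrumental Variables with Moments from Two Samples. Journal of the American Statistical Association, 87(418), 328–336. http://doi.org/10.2307/2290263

[5] Angrist, J. D., & Krueger, A. B. (1995). Split-Sample Instrumental Variables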
Estimates of the Return to Schooling. Journal of Business & Economic Statistics, 13(2), 225–235. http://doi.org/10.2307/1392377

[6] Arellano, M., & Meghir, C. (1992). Female Labour Supply and On-the-Job Search: An Empirical Model Estimated Using Complementary Data Sets. The Review of Economic Studies, 59(3), 537–559. http://doi.org/10.2307/2297863

[7] Bekker, P. A. (1994). Alternative Approximations to the Distributions of Instrumental Variable Estimators. Econometrica, 62(3), 657–681. http://doi.org/10.2307/2951662

[8] Bound, J., Jaeger, D. A., & Baker, R. M. (1995). Problems with Instrumental Variables Estimation When the Correlation Between the Instruments and the Endogeneous Explanatory Variable is Weak. Journal of the American Statistical Association, 90(430), 443–450. http://doi.org/10.2307/2291055

[9] Bun, M. J. G., & Windmeijer, F. (2011). A comparison of bias approximations for the two-stage least squares (2SLS) estimator. Economics Letters, 113(1), 76–79. http://doi.org/10.1016/j.econlet.2011.05.047

[10] den Berg, G. J., Pinger, P. R., & Schoch, J. (2015). Instrumental variable estimation of the causal effect of hunger early in life on health later in life. The Economic Journal. Retrieved from http://onlinelibrary.wiley.com/doi/10.1111/ecoj.12250/abstract

[11] Devereux, P. J., & Hart, R. A. (2010). Forced to be Rich? Returns to Compulsory Schooling in Britain*. The Economic Journal, 120(549), 1345–1364.

[12] Gong, H., Leigh, A., & Meng, X. (2012). Intergenerational income mobility in urban China. Review of Income and Wealth, 58(3), 481–503.

[13] Hahn, J., & Hausman, J. (2002). Notes on bias in estimators for simultaneous equation models. Economics Letters, 75(2), 237–241. http://doi.org/10.1016/S0165-1765(01)00602-4

[14] Hahn, J., Hausman, J., & Kuersteiner, G. (2004). Estimation with weak instruments: Accuracy of higher-order bias and MSE approximations. The Econometrics Journal, 7(1), 272–306.

[15] Hausman, J. A., Newey, W. K., Woutersen, T., Chao, J. C., & Swanson, N. R. (2012). Instrumental variable estimation with heteroskedasticity and many instruments. Quantitative Economics, 3(2), 211–255. http://doi.org/10.3982/QE89

[16] Inoue, A., & Solon, G. (2010). Two-Sample Instrumental Variables Estimators. Review of Economics and Statistics, 92
(3), 557–561. http://doi.org/10.1162/REST_a_00011

[17] Klevmarken, N. A. (1982). On the Stability of Age-Earnings Profiles. The Scandinavian Journal of Economics, 84(4), 531–554. http://doi.org/10.2307/3439516

[18] Nagar, A. L. (1959). The Bias and Moment Matrix of the General k-Class Estimators of the Parameters in Simultaneous Equations. Econometrica, 27(4), 575–595. http://doi.org/10.2307/1909352

[19] Nicoletti, C., & Ermisch, J. F. (2008). Intergenerational earnings mobility: changes across cohorts in Britain. The BE Journal of Economic Analysis & Policy, 7(2). Retrieved from http://www.degruyter.com/view/j/bejeap.2007.7.2/bejeap.2007.7.2.1755/bejeap.2007.7.2.1755.xml

[20] Olivetti, C., & Paserman, M. D. (2014). In the Name of the Son (and the Daughter): Intergenerational Mobility in the United States, 1850-1940. Retrieved from http://people.bu.edu/olivetti/papers/Olivetti-Paserman_NameOfTheSon_July2014.pdf

[21] Pierce, B. L., & Burgess, S. (2013). Efficient Design for Mendelian Randomization Studies: Subsample and 2-Sample Instrumental Variable Estimators. American Journal of Epidemiology, 178(7), 1177–1184. http://doi.org/10.1093/aje/kwt084

[22] Rosenzweig, M. R., & Wolpin, K. I. (2000). Natural "natural experiments" in economics. Journal of Economic Literature, 827–874.

[23] Rothstein, J., & Wozny, N. (2013). Permanent income and the black-white test score gap. Journal of Human Resources, 48(3), 510–544.

[24] Wooldridge, J. M. (2010). Econometric analysis of cross section and panel data. MIT Press.

Chapter 3. Estimating and Validating Nonlinear and Heterogeneous Classroom Peer Effects

1 Introduction

Studies of educational peer effects suffer from the classic tensions between the validity, power, and cost-effectiveness of research designs. Natural experiments generally provide credible sources of identification of limited peer-effects models, but frequently lack the necessary statistical power to fully explore nonlinear and heterogeneous effects. Observational data provide power in excess, but without the prima facie validity of estimates conferred by randomized assignment. Large-scale randomized trials are costly and may fail to yield effects with clear policy implications, as there exists both empirical and theoretical evidence that peer effects in the presence
of purely random assignment differ from those with endogenous assignment (Weinberg 2007; Duflo, Dupas, and Kremer 2011). Observational approaches may provide useful advantages over quasi-experimental or experimental methods if inferences from observational studies can be made credibly robust to potential biases. Combined with the growth in availability of administrative educational datasets, robust observational methods can supply a low-cost, scalable method for developing peer effect estimates directly relevant to local policy.

This paper uses an observational approach to estimate the nonlinear shape of peer effects, examines whether effects vary depending on a student's relative ability in the classroom, and checks the plausibility of estimated patterns using a placebo testing approach. Using administrative data for students in North Carolina high schools from 2006-2013, I estimate a model of ability peer effects with linear-in-shares and linear-in-means components for standardized tests in Algebra II, Geometry, Biology, Physical Science, U.S. History, Civics, and English I. I control for several potential confounding factors, including student past test scores, teacher quality, and school quality. In contrast with prior studies using a single test score or an unweighted average of test scores, I measure peer ability using a regression-calibrated nonlinear function of prior test scores which better measures the underlying ability construct associated with a particular test.

Estimates reveal some evidence of peer effects operating through mean ability in a classroom, but I also find evidence that they are biased upward by sorting or non-classroom peer effects. I find robust evidence of peer effects nonlinear in ability, with effects monotonically increasing in peer ability in most cases. I also find that peer effects are decreasing in the relative ability of a student: higher-achieving students within a classroom tend to receive smaller test score increases than their lower-achieving peers from improving peer ability in any part of the distribution.

To assess the extent to which sorting or non-classroom peer effects may be driving the observed linear and nonlinear associations, I estimate a series of placebo regressions. The regressions test whether a student's test score in a core subject is predicted by the ability
composition of a student's classrooms for other core subjects (the "placebo") conditional on the outcome classroom ("treatment") ability composition. While there are almost always both significant estimated linear and nonlinear coefficients for the classrooms corresponding to the treatment classrooms, placebo classrooms almost always return significant linear coefficients but largely insignificant nonlinear coefficients. I interpret this pattern as evidence that the estimated coefficients for peer mean ability are driven by sorting or peer effects external to the classroom, but that the model's estimates of peer ability's nonlinear effects are valid. I provide a more formal econometric interpretation for placebo tests, showing how the placebo estimand in large samples is directly proportional to the magnitude of omitted variables bias in the main estimates.

This study contributes new knowledge about the nonlinear shape of peer effects for high school students and how effects are heterogeneous depending on a student's own ability. It also applies a unique placebo methodology to examine the validity of each estimate, providing new evidence of bias in linear-in-means estimates from observational data. Previous studies have provided some evidence of nonlinear effects, and this study expands the body of evidence on nonlinear effects to include richer divisions of students into ability groups and nonlinear effects for seven core high school subjects. Either nonlinear or heterogeneous peer effects are necessary conditions for there to be any improvement in total education production by sorting into classrooms based on peer groups (Carrell, Sacerdote, and West 2013). However, it is important to distinguish between nonlinear response to the ability of individual students and nonlinearity in the aggregate composition of a peer group. Linear-in-shares specifications capture the degree to which one student can have a nonlinear response to the ability of another student. However, there may be further emergent effects from classroom composition: two high ability students may have an effect on one low ability student that is greater than the sum of individual effects of high ability on low ability, which linear-in-shares models do not directly account for. For both linear-in-means and linear-in-shares models (with fixed class sizes), improvements from reassignment of a peer to another classroom are offset by losses in another. Nonetheless, the methods used here to validate the coefficients can be applied to broader nonlinear or heterogeneous-effect specifications that capture the nonlinear compositional effects required for there to be gains from changing peer ability grouping.

In Section 2.4, I estimate a model with heterogeneous effects by students' own absolute ability and use the estimates to demonstrate an example of optimal ability grouping for two Algebra II classrooms. I find that the underlying estimated heterogeneous effects are robust to unobservables according to the same placebo tests as the main results. I calculate that the mean achievement gains of optimal ability tracking (based only on the modeled heterogeneity) over random assignment are at least 0.03 SD. Further extensions of the model to accommodate nonlinearity in composition and heterogeneous effects can be used to identify ability sorting strategies that result in additional test score gains.

1.1 The Peer Effects Literature

The educational peer effects literature aims to measure the interdependence of outcomes among students. The existence of academic and disciplinary spillovers between students has policy implications for the optimal grouping of students between or within schools. The quintessential problem in the measurement of peer effects is endogeneity of assignment to a peer group: low-performing students may tend to be assigned to low-performing schools and classrooms and have low-performing friends. One strand of literature attempts to isolate idiosyncratic variation in peer group composition over time, essentially controlling for (or exploiting) cross-group variation within schools through adjacent differencing techniques (Hoxby 2000; Lavy, Paserman, and Schlosser 2011; Bifulco et al. 2011). Burke and Sass (2013) estimate peer effects observationally as well, using estimated student fixed effects from past achievement as measures of peer ability. Some papers have used plausibly exogenous natural sources of variation in peer groups, such as quasi-random college dormitory assignment (Sacerdote 2001) and Air Force academy squadron assignments (Carrell, Fullerton, and West 2009).

Several studies test whether peer effects have any nonlinear shape. Lavy, Paserman, and Schlosser (2012) find that the proportion of grade-repeating students in a classroom has a negative impact on classwide achievement in Israeli middle schools. Using survey data, they find that higher proportions of lower-achieving students alter teachers' pedagogical practice, increase the frequency of violent or disruptive behavior, and harm student-teacher relationships. Using within-student regressions, Lavy, Silva, and Weinhardt (2012) find that the fraction of students in the bottom 5 percent of the ability distribution (based on prior test scores) is negatively associated with test scores.

1.2 Peer Effects Model

I employ a typical education production function using a combination of "linear-in-means" and "linear-in-shares" peer effects:

$$y^r_{igst} = \bar{X}_{(i)gst}\lambda + X^d_{(i)gst}\beta_1 + A^d_{igst} \otimes X^d_{(i)gst}\beta_2 + X_{igst}\eta + G_{gst}\gamma + \alpha_s + T_{gs} + \varepsilon_{igst} \quad (18)$$

This model relates a standardized test score $y$ in any subject $r$ for student $i$ in classroom $g$ in school $s$ to individual, classroom, school, and peer characteristics at time $t$. Students are only observed for each subject in one time period, resulting in a repeat cross-section of students nested in classrooms and schools. $\bar{X}_{(i)gst}$, the "linear component" of the peer effect, is part of the classic linear baseline for peer effects studies (e.g., Carrell, Fullerton, and West 2009), measuring the mean ability of student $i$'s classroom peers. $X^d_{(i)gst}$ is a vector of shares of a student's classroom peers in each decile of ability in the given subject, representing a nonlinear component of the peer effect. $A^d$ is a vector of dummy indicators for the decile of ability within the classroom into which student $i$ falls. $A^d_{igst} \otimes X^d_{(i)gst}$ is the interaction between the two factors, its vector of coefficients $\beta_2$ measuring heterogeneous nonlinear response to peer ability levels with respect to a student's relative ability ranking. $X_{igs}$ and $G_{gs}$ represent observable individual- and group-level characteristics, while $\alpha_s$ and $T_{gs}$ correspond to school- and teacher-specific unobserved heterogeneity (which are controlled for in estimation by fixed effects). The coefficients $\lambda$ and $\beta$ can be understood as Manski's (1993) "exogenous effects," measuring the effects of pre-existing peer background characteristics rather than the effects of contemporaneous peer performance. In practice, estimates for these coefficients may also reflect the effects of endogenous decisions made by students that are predicted by
peer composition variables but are not accounted for by controls.

While each coefficient on peer characteristics reflects a meaningful aggregate of peer effects, the exact mechanisms driving the effects are difficult to discern. Low-achieving peers can impact classrooms by altering teachers' pedagogical approaches or by diverting teacher effort from other students. Students engaging in disruptive behavior can also divert teacher effort to disciplinary action instead of instruction, affect other students' ability to engage in learning, or cause further disruptive behavior among other students. Furthermore, low achievement and poor classroom discipline are highly correlated, and measures of both are characterized by measurement error. Thus, estimates of the impacts of ability may in part reflect the effects of disciplinary peer effects.

Generally, high-achieving peers are likely to have positive effects on classroom learning. In group activities such as science labs or in-class open study they may facilitate additional learning and even be explicitly tasked by the teacher to assist with instruction. Students of higher ability tend to have significantly fewer disciplinary infractions and so will reflect positive effects on classroom achievement to the extent that they displace more disruptive students. Social ties plausibly span achievement levels and may result in top students assisting other students with studying or homework assignments. On the other hand, some models of peer effects such as the "invidious comparison" model suggest that additional high-achieving students may create a climate discouraging to lower-achieving peers by diminishing the perceived returns to additional academic effort or affecting pedagogical practices (e.g., encouraging the teacher to cover more advanced material). If teachers aim to maximize the share of their students passing, pedagogical practices are unlikely to be affected by high-achieving students. Akerlof (1997) and Lantis (2014) put forward models that imply that if students are engaged in a "tournament" within their peer group for various rewards, the ability levels of competing students affect their effort choices. The most commonplace instance of the tournament occurs due to grade curving, where relative performance within the classroom dictates the reward of high grades.

1.3 Biases from Sorting and Identifying Nonlinear vs. Linear Effects

In section 2.2, I conclude that estimates of the linear component of peer effects are likely to be biased upward by sorting or non-classroom peer effects even given a rich set of controls, but the nonlinear components are approximately unbiased from these sources. There are several meaningful reasons why the estimates for the linear components are not credible, while the nonlinear components are. The bulk of the correlation between own performance and peers' ability is likely to be accounted for by a student's own ability measure based on past test scores, school quality, teacher quality, and course level. However, there may remain sorting between multiple classrooms at the same level with the same teacher, and there also may be time-varying changes within groups which explain part of the correlation between own academic performance and peers' ability. Mean classroom peer ability thus captures at least part of the sorting mechanisms and shocks not accounted for by the existing set of controls. This amounts to changing the empirical role of mean peer ability from a variable of interest to a control variable, allowing more robust inference for the nonlinear effects of peer ability.

Consider a simple one-school, two-classroom setting with a repeat cross-section structure. One classroom is classified as "high ability" (e.g., honors) and the other "low ability" (regular), and students are probabilistically sorted on an ability cutoff into each, but the class type is not directly observed by the researcher. In this stylized setting, mean peer ability would perfectly distinguish (in expectation) between the high- and low-ability classrooms and mute any corresponding bias from sorting, leaving only variation in the shares of peer ability orthogonal to the course type identifying their effects. The high ability classroom has a higher average share of 90th-percentile ability students than the low ability classroom, but only deviations from the expected share of 90th-percentile ability students for each class type are used to identify the nonlinear effect of 90th-percentile students. In that manner, mean peer ability plays a comparable role to a fixed effect that could be included if class type were explicitly observed.

In this setting, I control for two key characteristics of courses that are strongly related to sorting into them: their formal level and their teachers. In addition, I control for a student's own underlying ability, which is correlated with academic performance and hence likely correlated with classroom sorting. However, there may be ability differentiation between classrooms which persists even conditional on these controls because of transitory shocks in time or because of the structure of a school's classrooms in a subject. School-specific or annual shocks could shift both a student's performance and the absolute ability composition of his peers through sample attrition (i.e., dropouts and retention). Alternatively, a teacher may have multiple classrooms in the same formal course level that are achievement-differentiated, resulting in correlated sorting not accounted for by the controls. As in the stylized model, mean peer ability in these classrooms captures at least part of the residual unobserved characteristics and time-varying shocks.

Variation in the distribution of peer ability in a classroom is likely to be coming from a mixture of cross-cohort variation in ability composition, students' course-picking decisions (based on preferences for teachers or friends) and constraints (e.g., the timing of their other courses), and any administrative rules for or direct intervention in student schedules. Sorting into subjects is moderated by the number of classrooms available into which students can sort: if there is only one classroom available for a subject, then all variation in peer ability composition in the class is driven only by cross-cohort variation. The included fixed effects narrow this to the number of classrooms in a subject per teacher and per course level. A large number of classrooms potentially implies greater unobserved differentiation between classrooms.

1.4 Data Description

1.4.1 North Carolina Administrative Data

The North Carolina Education Research Data Center (NCERDC) is a centralized database of administrative data for North Carolina public schools. It contains student-level data for all students in grades 3-12 spanning from as early as the 1997-1998 school year to the 2013-2014 school year. Students can be tracked across years using the database's anonymized master identifier. Based on the overlapping availability of multiple variables used in the analysis, I use all available historical data for students who were in grades 9-12 from 2006-2014.

The student-level master build files form the core dataset for the analysis, providing a record identifying each student by school membership, grade level, and school year. The course membership files allow students to be matched to their classroom peers, and classrooms to be matched to the corresponding subject tests. The attendance and demographic file provides students' month and year of birth, sex, ethnicity, school membership and number of days therein for each year, and annual attendance. The master suspension files document suspensions and other disciplinary actions taken by the school against the student, including information on the offense and the extent of the punishment.

Finally, the measures of secondary school achievement I use come from North Carolina's state-mandated End of Course (EOC) tests for Algebra I, Algebra II, Physical Science, and English. Each testing mandate covers a different period, with some tests no longer being administered. For example, Algebra II EOC tests were no longer administered after the 2011 school year. For each test subject, I standardize students' test scores within school year across all students with available test scores in the database to account for annual statewide variation in score distributions.

1.4.2 Determining Course Membership

During 2006-2013, North Carolina high schools typically operated on a 4x4 semester block schedule. Under 4x4 block scheduling, students take 4 courses in Fall and 4 courses in Spring with 90-minute daily instructional periods. Most core courses last one semester, though some schools offer year-long versions of courses. The NCERDC data provide records on students' membership in courses, collected from databases once in each of the Fall and Spring semesters. This includes the course title, semester, teacher ID (when matched by NCERDC), and section (i.e., class period) of the course.

Students' classrooms contain a course title that is used to assign them into subject groups. Depending on keywords in the course titles, courses are classified as math, science, social studies, or English. For example, classes with "Algebra" or "Alg" in the title are classified as math, "Biology" or "Bio" science, "English" or "Eng" English, and "Civics" or "Civ" social studies. The keyword lists I
use are largely comprehensive; I manually review the remainder of course titles to ensure that only elective courses remain and no core courses are omitted.

I am able to track students' changing classroom memberships between Fall and Spring semesters. Students' course choices are generally stable, but in some cases students may transfer or change courses within a semester. To account for classroom changes from Fall to Spring (either due to teacher changes or class period changes with the same teacher), a student with records in multiple courses of the same type (e.g., Algebra I) is counted as being included in the course discovered in the Spring data collection.

There are multiple types of courses for which the same EOC tests are administered; they have varying pedagogical designs and target different populations of students. Students may endogenously track into courses of different levels. To help account for variation in course levels, I use the hierarchical nature of North Carolina's state course codes to differentiate courses into level groups. For example, both "Algebra I" and remedially-themed "Foundations of Algebra" courses are associated with students who take the Algebra I EOC test in the same semester, but are appropriately assigned different 4-digit prefixes in their course codes. Each 4-digit prefix in a subject is represented by a dummy indicator contained in $G_{gst}$.

1.4.3 Test Scores and Ability

Previous studies have attempted to measure peer ability using past test scores (Lantis 2014; Lavy, Silva, and Weinhardt 2012; Hoxby and Weingarth 2006) or other traits correlated with low achievement, such as grade repetition (Lavy, Paserman, and Schlosser 2012). I introduce a more flexible way of approximating the underlying ability construct for an outcome of interest using a regression of the outcome on a nonlinear function of past test scores. A typical approach is to use a single test itself or an unweighted average of tests. In contrast, the regression-based method allows a broader set of information on students' characteristics to be incorporated into ability measures and empirically calibrates the relative weights of each characteristic instead of imposing them externally. Among other potential benefits, the regression approach flexibly incorporates multiple prior test score measures (and characteristics) nonlinearly and adjusts for the partial correlation between past scores and current scores generated by fadeout, subject differences, and normal test score variation.

The procedure is to regress the test score in a subject on the chosen nonlinear function of past test scores and generate the fitted values, which form the ability score composite. Students can then be ranked on this score and the score used to generate peer ability averages. This can result in a different linear-in-means (LIM) peer effect coefficient estimate even when using only a linear function of a single past test score, since the ability composite regression coefficient "rescales" the estimated LIM peer effect to account for the partial correlation between past and present tests.

If multiple test scores are available, the regression-based ability score results in a different ability ranking than if students were ranked on a single test score. The fitted values reflect an empirically-calibrated weighted average of the two past test scores, resulting in a higher-quality measure of the underlying ability construct most relevant to a given high school subject than a single test provides. For example, 8th-grade reading scores are more highly predictive of high school science and English test scores than are 8th-grade math scores. Using a single score or unweighted average of the two scores would fail to take this into account, resulting in additional measurement error in the peer ability measure.

For this setting, I use a piecewise function of North Carolina's 8th-grade End-of-Grade math and reading tests, equivalent to a 20-piece regression spline with cutpoints at the vigintiles of each test score. Using OLS, I estimate the following model of student $i$'s test score in subject $s$:

$$y^r_i = M^v_i \psi_1 + M_i \psi_2 + M_i M^v_i \psi_3 + R^v_i \psi_4 + R_i \psi_5 + R_i R^v_i \psi_6 + v^r_i$$

$M_i$ and $R_i$ are continuous test scores (standardized within years) for math and reading. $M^v_i$ and $R^v_i$ are vectors of indicator variables identifying which vigintile of the score distribution each subject test score falls into. I generate an estimated ability composite using the fitted values from this regression:

$$A^r_i = \hat{y}^r_i = M^v_i \hat{\psi}_1 + M_i \hat{\psi}_2 + M_i M^v_i \hat{\psi}_3 + R^v_i \hat{\psi}_4 + R_i \hat{\psi}_5 + R_i R^v_i \hat{\psi}_6$$

This ability score is then used to calculate own-ability decile indicators $A^d_{igst}$ and peer ability variables in equation 18. The splines of math and reading scores used to generate this ability score are included for individuals directly in $X_{igs}$ for additional flexibility.

Using the regression-based ability composite may induce minor econometric problems that likely do not impact the results in this context. Because the realization of the outcome of interest is used in estimation, a particular student's past test scores and the dependent variable (current test score) enter into the computation of the coefficient that serves as the weight on prior test scores. Thus, the main regressions of the dependent variable on the ability measure partially use variation from the dependent variable itself through the estimated ability weights. The problem is similar in concept to the finite sample bias issue in instrumental variables, and can be understood in the same way: as the sample size grows, bias from this phenomenon shrinks to zero.[8] A second econometric issue is that the ability measures are a form of generated regressor, and using them without adjusting for their variability results in underestimated standard errors. While this is of greatest concern when using a continuous ability measure, the primary use of the ability measure is for estimation of decile cutpoints for ability, and any variability is only expressed through the movement of marginal students from one decile to another. Moreover, the only direct use of continuous ability in the regression corresponding to equation 18 is through the mean peer ability measure, which to some extent dilutes the impact of sampling variability by averaging over several students.

[8] A jackknife-type procedure could also be used to exclude a student's own score from the estimation of the weights and sidestep this problem directly.

2 Results

2.1 Linear vs. Non-linear Peer Effects Estimates

Tables 22 and 23 present the estimated coefficients for only peer mean ability, percentage male, and percentage of peers who were held back a grade at least once in grades 3-8 for each subject's EOC standardized test across four specifications of the complete model (including nonlinear interactions). All specifications control for 8th grade math and reading scores (in vigintile dummies), gender, the semester of test administration, and year, grade, and school fixed effects. Column (2) for each subject is the same as Column (1), but with controls for the course level of the course taken in the same semester as the test. Column (3) adds teacher-by-school fixed
effects, which are roughly comparable to teacher and school fixed effects independently (but additionally accounting for teacher effects for teachers who switch schools). Column (4) is the specification of column (2) plus school-by-year fixed effects. Standard errors are clustered at the school level.

As is typical of observational estimates of peer effects, there is a strong positive relationship between classroom mean ability and own achievement for all subjects across all specifications, with the exception of column (3) for Biology. Controlling for course level and teacher fixed effects tends to weakly decrease the magnitude of all peer ability estimates for all subjects, with the largest decreases occurring for Algebra II and Biology. As the most stringent specification, column (3) is preferred. The mean ability coefficients are interpreted as the effect, on students in the lowest decile of ability in the class, of raising mean ability in the classroom while holding the shares of students in each ability decile fixed. For example, a 1 S.D. increase in mean peer ability in a student's Algebra II classroom corresponds to a baseline increase of 0.13 standard deviations in the student's test score. In practice, increasing mean peer ability also means increasing the share of students in higher ability deciles, and I find that peer effects are monotonically increasing in peer ability.

Figure 12 and the top plots of [...] 7-12 plot the 100 estimates contained in model coefficient vectors $\beta_1$ and $\beta_2$ for Algebra II, Geometry, Physical Science, English I, U.S. History, Civics, and Biology, respectively. The coefficient estimates correspond to the interaction terms between shares of peers in each absolute ability decile and individual student $i$'s own relative ability within the classroom. The omitted category is for a student in the bottom decile of relative ability with a classroom composed of 100% students in the bottom decile of absolute ability. The baseline coefficients on the peer ability decile shares measure the effect of increasing the share of peers in an ability decile by one unit (100%) for a student in the first decile of relative ability. Each dot in the plot shows the sum of the baseline coefficient and the interaction coefficient for each relative-ability-decile-by-peer-ability-decile pair. Horizontal lines within each division represent the average effect for each peer ability decile.

Across all subjects, there is a general upward trend in effect size by ability decile. For example, in Algebra II, the average effect on test scores of a 10 percentage point increase in students in the 40th-50th percentile (replacing students in the bottom decile) is approximately 0.015 standard deviations; the average effect of a comparable increase in students in the 70th-80th percentile is 0.03 standard deviations; and for the 90th-100th percentile, 0.05 standard deviations. Figure 13 shows the same estimates for Algebra II as Figure 12, but now inverted to have own relative ability on the top axis defining each group and peer ability decile shares within each group. The sawtooth pattern of Figure 13 demonstrates a strongly nonlinear peer effect that is increasing in ability. For a student in the 40th-50th percentile, the effect of increasing the share of peers in an ability decile ranges from 0.01 SD for the 2nd decile to 0.05 SD for the top decile.

The apparent nonlinear profile of peer effects is similar across all subjects except Physical Science, which is flat or even negative at higher peer ability shares. This somewhat surprising result may be an effect of varying student populations or classroom structures across subjects. Physical Science precedes specialized science courses (Biology, Chemistry, and Physics) in the curriculum and is not a required part of the course track, but is designed as an additional option for students to meet their science requirements (North Carolina Department of Public Instruction 2004). Both the mean and standard deviation of 8th-grade math and reading test scores among students taking the physical science exam are the smallest among the subjects presented here, suggesting both lower-achieving students in general and smaller dispersion in ability. Mean 8th-grade scores in both math and science for physical science students are 0.2 SD below the mean. Peer effects may be heterogeneous over a student's own absolute ability (rather than relative ability within a classroom). Alternatively, there may be unobserved heterogeneous effects arising from differences in ability peer effects across grades.

There are also generally flat or downward-sloping estimates across individuals' own relative ability in the classroom. Lower-achieving students may be more sensitive to peer influence for obvious reasons. Instructional time may be relatively more valuable for their achievement than self-study, and so disruptions to it are more harmful. To the extent that these ability effects are biased by disciplinary spillovers, we might also expect that low-achieving students are more likely to commit infractions that result in missed school time or decreases in teacher investment in their success. In accordance with the tournament concept, in which students compete with each other for rewards including grades or academic opportunities, higher-achieving students may also reduce their effort choices in the presence of greater academic competition.

Endogenous ability sorting into classrooms is the most likely source of bias in peer effects estimates generated from observational data. Worse-performing students may be more likely to sort into classrooms with higher proportions of low-ability students, suggesting negative bias. On the other hand, if schools effectively sort students into classrooms which maximize their potential test scores by adapting pedagogical practices, a higher proportion of low-performing students may signal a remedial or other curriculum-adapted class which can increase a low-performing student's test score. Conditioning on own 8th-grade test scores accounts for some of this sorting, but is limited by measurement error in the scores and the potential for down-trending performance from 8th grade to high school. The inclusion of course level and teacher fixed effects also accounts for some sorting, but some unobserved differentiation of classrooms within course levels or teachers may persist. For example, a teacher may separately teach courses for high ability and for low ability students that have the same course code (and hence the same course level), such as Algebra I and Algebra I honors.

Furthermore, there may be peer effects operating in a context broader than the classroom. Peer composition in a classroom may be correlated with peer composition in other courses or in other aspects of school life (such as the lunch period or after school activities), and meaningful academic or disciplinary spillovers can occur in these other contexts. Because these potential correlated unobservables are likely to manifest themselves across all of a student's core classrooms, I test for their existence through a series of classroom-based placebo tests in the next section.

2.2 Placebo Tests – Alternate Classrooms

I conduct several placebo tests to address the threat of bias in peer effects estimates from individuals' unobservable characteristics that are correlated with peer ability group composition. Test scores are highly correlated across high school subjects, suggesting that the underlying correlations of peer abilities are also large. Thus, sorting into classrooms on ability is likely to be a related process across subjects. In practice, its most distinctive manifestation is the tracking of high ability students into multiple "honors" courses and low ability students into multiple remedial courses.

The primary placebo test is to regress student test scores in subject A on peer ability composition in the corresponding classrooms for subject A and subject B. A positive estimate for peer ability composition in subject B is evidence of correlated unobservables which may bias the estimates for subject A's peer composition. Specifically, for some unobservable affecting test scores, the compositions for both A and B classrooms are both correlated with the unobservable. The parts of the compositions that are correlated with each other are not reflected in the coefficient estimates, but each subject's classroom peer composition may have a part that is uncorrelated with the other subject's but correlated with the unobservable.[9] Statistically significant placebo estimates in the same direction as the treatment estimates are evidence that the primary regression results are at least partially driven by either correlated peer effects occurring at a level above the classroom or ability sorting into classrooms unaccounted for by controls. Section 2.3 provides a formal econometric interpretation of the placebo test for a single treatment, showing that the placebo coefficient is proportional to the amount of omitted variables bias in effect estimates from the primary specification.

Table 24 shows estimates for the linear effects of treatment and placebo classroom mean ability, gender composition, and proportion retained for Algebra II and Biology, using the classrooms of three other core subjects as the placebo classrooms. Most of the placebo coefficient estimates are significant and in the same direction as the treatment effect, indicating that unobservable characteristics or broader peer effects are driving the effect of peer mean ability for the treatment classrooms. The same reasoning
applies to estimates for both classroom gender composition and percentage of students retained.

Figure 14 shows the peer ability-relative ability coefficients for the English class placebo for Algebra II, and Figure 13 is the inverted version of the same plot. Neither of the patterns observed in the treatment effect plots in Figures 12 and 13 is upheld in the English placebo, with most estimates noisy and narrowly distributed around zero or opposite-signed to the treatment effect. I repeat this exercise using science classes in Figure 16 and social studies classes in Figure 17. In contrast with the mean ability case, this placebo is evidence in favor of the estimated nonlinear relationship being approximately unbiased by unobservable mechanisms or characteristics that would be related across classrooms, such as sorting and non-classroom peer effects.

The bottom plots of [...] 7-12 are the placebo tests for the corresponding subjects, using English classroom composition for all subjects for the placebo test except English (which uses social studies classroom composition). In most cases, any nonlinear pattern in peer ability is not upheld in the placebo case. Where nonlinear patterns are replicated in the placebo, there is cause to distrust the estimated magnitude of that specific portion of the nonlinear peer effect. For example, for Biology, the effects of the shares of students in the 2nd and 3rd ability deciles in the placebo classroom are slightly larger than those in the treatment classroom, suggesting that those coefficients are using variation in shares associated with some unobserved sorting process. This would arise if there were several classes in the data targeted at, and successful in, raising the test scores of struggling students. The remainder of the Biology deciles show a monotonically increasing nonlinear pattern, while the placebo estimates are small in magnitude and tightly clustered near zero, underscoring the credibility of the estimates for those peer ability deciles.

[9] An even stronger placebo test would be to repeat the above regression excluding the true treatment; a null finding would strongly indicate that correlated unobservables are not driving the main result. However, any correlation between A and B's compositions would cause part of the true treatment effect of classroom A to appear in estimates for B.

In some cases, the pattern over relative ability is mirrored by the placebo. For Biology, Civics, and U.S. History, though the average magnitude of the effects is smaller, the placebo estimates show a downward-sloping pattern in relative ability in several categories. A possible explanation is a peer effect occurring outside of the classroom with a similar heterogeneous response over own ability as the classroom peer effect, such as disciplinary spillovers or some other cultural effect. Another explanation is sorting, which would require both foresight and intentional manipulation of a student's position in the relative ability distribution in the classroom, but is less plausible in light of these requirements combined with the typically student-driven classroom selection in high schools.

There is a useful interpretation to the nonlinear effect coefficients alone. Multiple distributions of peer abilities correspond to the same mean ability, but test score production can potentially be ranked over those distributions of peers. For a function of the form $y = x\lambda + f(x)\beta$, the marginal effect of increasing $x$ is $dy/dx = \lambda + f'(x)\beta$. Suppose that $x$ represents the linear (mean ability) peer component of the educational production function and $f(x)$ a continuous, positive nonlinear component (e.g., $x^2$). Assuming $\lambda \geq 0$ (i.e., test scores are weakly increasing in peer ability for at least some part of the domain of ability), then $f'(x)\beta$