THREE ESSAYS ON SEMIPARAMETRIC ESTIMATORS

By Benjamin Miller

A DISSERTATION Submitted to Michigan State University in partial fulfillment of the requirements for the degree of Economics – Doctor of Philosophy

2023

ABSTRACT

In this dissertation, I develop two semiparametric estimators and consider a variation on an existing semiparametric estimator in the third chapter. In Chapter 1, I develop a model and an estimator for a panel data setting with multiple fractional response variables with a binary endogenous covariate. I develop a two-step technique to obtain consistent estimates of the average partial effects. Then, I provide a variable addition test for endogeneity. I demonstrate using simulations that if the chosen conditional mean function is incorrect, it is still possible to obtain estimates of the average partial effects that are close to the true values. Data from the NLSY97 survey is used to estimate the average partial effect of marriage on how individuals allocate their time within a year. In Chapter 2, I develop a doubly-robust estimator of the quantile treatment effect on the treated (QTT). This estimator can obtain consistent estimates of the QTT using either the propensity score or the conditional cdf of the first-differenced untreated outcomes. Aside from the benefits of obtaining consistent estimates of a QTT when a nuisance function is misspecified, there are also efficiency gains. In addition, assumptions on the smoothness of the nuisance parameters can be relaxed when the estimator is doubly-robust. I also show that asymptotically valid confidence intervals can be constructed using the empirical bootstrap. Then, I demonstrate via simulations that my estimator can produce a sharply lower root mean square error compared to other estimators. I apply my estimator to estimate the effect of increasing the minimum wage on county-level unemployment rates, where I show significant and varied quantile treatment effects.
In Chapter 3, I consider whether a modification of the parametric estimators of the nuisance functions described in Chapter 2 will lead to improved performance compared to existing estimators. In particular, an additional moment will be included to estimate the parameters of the nuisance functions, but only one of those nuisance functions will be used to estimate the quantile treatment effect on the treated (QTT). I show that even if this additional moment is applied, the small-sample performance of the estimator is not improved over the doubly-robust estimator in Chapter 2. This is true regardless of which nuisance function is misspecified.

Copyright by BENJAMIN MILLER 2023

To my parents and grandparents, who gave me an education.

ACKNOWLEDGEMENTS

There are many people that I would like to thank, starting with my family. I am grateful for the support provided by my parents, Martin and Barbara Miller. They made it possible for me to become an economist. I would also like to thank my siblings, Heather Katcher and Edward Miller, for additional encouragement. No member of my immediate family could have had any success were it not for my grandparents, David and Mildred Miller. I owe my education to them as much as anyone else. They both passed away before they could see me earn my doctorate, but I am certain that they would have been proud of my accomplishments. Next, I would like to thank some faculty and staff at Michigan State University. I am grateful for the guidance of my committee members Dr. Jeffrey Wooldridge, Dr. Antonio Galvao, Dr. Kyoo Il Kim, and Dr. Nicole Mason-Wardell. They have provided feedback and advice not only on the technical content of the dissertation, but also on the writing of the dissertation to make it more accessible to an audience outside of econometrics. There are several classmates that I should thank. Without having to pick and choose who gets an acknowledgement, I will begin by thanking my cohort.
Each of them enlivened my time at Michigan State in some small way; however, I did promise an acknowledgment to one classmate in particular. I am grateful to Steven Wu-Chaves. I partially got the idea for my job market paper from a conversation with him. Some people might call this an "inspiration." If that's what it is, then I can only hope that I have inspired him.

TABLE OF CONTENTS

CHAPTER 1 ECONOMETRIC METHODS FOR MULTIPLE FRACTIONAL RESPONSE VARIABLES WITH A BINARY ENDOGENOUS COVARIATE: AN APPLICATION TO TIME-USE DATA
CHAPTER 2 DOUBLY-ROBUST QUANTILE TREATMENT EFFECT ESTIMATION
CHAPTER 3 APPLICATION OF ADDITIONAL MOMENTS TO QUANTILE TREATMENT EFFECT ESTIMATION: A SIMULATION
BIBLIOGRAPHY
APPENDIX A DERIVING THE AVERAGE PARTIAL EFFECTS
APPENDIX B SIMULATION TABLES FOR CHAPTER 1
APPENDIX C APPLICATION TABLES FOR CHAPTER 1
APPENDIX D HIGH-LEVEL ASSUMPTIONS AND PROPOSITIONS FOR NUISANCE FUNCTION ESTIMATION IN CHAPTER 2
APPENDIX E PROOFS OF MAJOR THEOREMS AND PROPOSITIONS FOR CHAPTER 2
APPENDIX F CHAPTER 2 TABLES AND FIGURES
APPENDIX G CHAPTER 3 FIGURES

CHAPTER 1 ECONOMETRIC METHODS FOR MULTIPLE FRACTIONAL RESPONSE VARIABLES WITH A BINARY ENDOGENOUS COVARIATE: AN APPLICATION TO TIME-USE DATA

1.1 Introduction

Fractional response variables are often treated as having a linear conditional mean. This can be done for the purpose of demonstrating that there is a strong association between the fractional random variable and some covariates of interest.
The largest contrast to this would be to take a structural approach and try to derive the exact form of the conditional mean function. The decision to use a fractional response model can serve as a middle point between a linear approximation and a structural approach. When there is a single fractional response variable of interest, the use of a nonlinear conditional mean function can provide a closer approximation to the true relationship between the covariate and the conditional mean than a linear function can. When there are multiple fractional response variables of interest and those variables represent shares in a bundle, then the linear conditional mean function no longer reflects the choices of an agent. If a structural approach is to be taken, then the structural assumptions that might have reflected an agent's decision problem in a single time period no longer hold over multiple periods. There could be heterogeneity amongst agents that influences their decisions. More assumptions or a more complex structural approach might be needed to identify the parameters of interest. In this chapter, I develop a reduced form estimator of the average partial effects for a fixed number of multiple fractional responses. First, I combine the methods of Mullahy (2015) and Papke and Wooldridge (2008) to develop a panel data estimator with a binary endogenous explanatory variable (EEV). This is done using quasi-maximum likelihood estimation (QMLE), combining this with a Mundlak-Chamberlain device, and then integrating out the endogenous component in an approach that is similar to Heckman (1979). My approach is similar to that found in Nam (2014), which handles the case of multiple fractional responses with a continuous endogenous covariate. In that paper the author applies a two-step approach, using a control function in the first step to project the omitted variable onto the error arising from the reduced form model of the continuous EEV.
In the second step, QMLE is then applied to estimate the average partial effects (APEs). Then, I show the asymptotic properties of the estimates of the average partial effects. Although it may not be possible to recover the unscaled coefficients on the covariates, it is possible to estimate and perform inference on the average partial effects. I develop a variable addition test (VAT) for endogeneity. This test is based upon a simpler version of the VAT found in Lin and Wooldridge (2017). Because the methods in this chapter can be computationally intensive to implement, the test is a useful first check: it allows for detection of endogeneity across all of the fractional response variables. I also show that if one of the key identification assumptions fails, it is still possible to obtain estimates of each APE that are reasonably close to the true values. I break this down into several cases, looking at when the distribution of the unobservables is incorrectly specified, thus causing the conditional mean to be incorrectly specified. The cases include when the unobservable variables are asymmetric. Finally, as an application of my estimator I use data from the 1997 National Longitudinal Survey of Youth (NLSY97). This is done to estimate how changes in a binary endogenous variable affect the fraction of an individual's time within a year devoted to work, sleep, and leisure. The marital status of each survey participant is reversible, and the frequency of sexual intercourse by survey participants in a year is used as an instrument, coupled with controls that might be correlated with this frequency. This specific data and the problem of estimating how marital status affects individuals' use of their time are explored within this chapter, while demonstrating how the fractional nature of the dependent variable can be exploited. The chapter is organized as follows. Section 1.2 presents the model and identification assumptions.
Section 1.3 presents the proposed method for obtaining consistent estimates of the average partial effects. Section 1.4 presents the average partial effects, as well as their asymptotic distribution. Section 1.5 presents a VAT for endogeneity. Under the null hypothesis of this test there is no binary EEV, so that the parameters can be estimated using a standard multinomial logit QMLE framework. Section 1.6 presents simulation results under model misspecification. Under the model misspecification, the multinomial logit specification is incorrect, yet the results show that the proposed method will still provide a good approximation of the average partial effects. Section 1.7 contains the empirical application of this estimator to the 1997 National Longitudinal Survey of Youth. Section 1.8 concludes the chapter, where I also consider alternatives to the estimator presented in this chapter.

1.2 Model and Assumptions

I assume that I have a panel of data consisting of a random sample of N subjects across T time periods with L fractional dependent outcomes, with each dependent outcome denoted as $y_{itl}$. Let $x_i = (x_{i1}, \ldots, x_{iT})$ and $z_i = (z_{i1}, \ldots, z_{iT})$. I also assume strict exogeneity of the structural conditional mean. $x_{it}$ is a $1 \times K$ vector of covariates, separate from the $1 \times M$ vector of instruments $z_{it}$. $t_{it}$ is the binary EEV. $e_{it}$ represents omitted variables that change over time and with each subject $i$. The structural conditional mean is

$E[y_{itl} \mid x_i, t_i, z_i, c_i, e_i] = E[y_{itl} \mid x_{it}, t_{it}, c_i, e_{it}] = G(\xi_{tl} + x_{it}\beta_l + t_{it}\alpha_l + \gamma_l c_i + e_{it})$  (1.1)

where

$t_{it} = 1[z_{it}\delta_z + x_{it}\delta_x + \bar{z}_i\delta_{\bar{z}} + \bar{x}_i\delta_{\bar{x}} + u_{it} \geq 0]$  (1.2)

$c_i = h_i\pi + a_i$  (1.3)

$h_i = (\bar{x}_i\ \bar{z}_i\ \bar{t}_i)$  (1.4)

$0 \leq G(\xi_{tl} + x_{it}\beta_l + t_{it}\alpha_l + \gamma_l c_i + e_{it}) \leq 1$  (1.5)

$r_{itl} \equiv \gamma_l a_i + e_{it}$  (1.6)

$\sum_{l=1}^{L} G_{itl} = 1$  (1.7)

and $\xi_{tl}$ is a time-varying intercept, where $u_{it}$ is independent of $x_i$, $a_i$, and $z_i$. This is denoted by $u_{it} \perp\!\!\!\perp x_i, z_i, a_i$. $G$ is what I call the structural mean function. It is unknown to the researcher.
It is assumed that the distribution of $u_{it}$, $D(u_{it})$, is known to the researcher. For example, the researcher would know that $u_{it} \sim \mathrm{Normal}(0,1)$, and so $\hat{\delta}$ would be obtained from a pooled probit MLE. Equation (1.3) represents the use of the Chamberlain (1980) device to write the correlated random effect $c_i$ as a projection onto the time-averaged values of the explanatory variables and the instruments. $h_i$ is the vector of these time-averaged values, where $\bar{x}_i = \sum_{t=1}^{T} x_{it}/T$, $\bar{z}_i = \sum_{t=1}^{T} z_{it}/T$, and $\bar{t}_i = \sum_{t=1}^{T} t_{it}/T$. Equation (1.5) represents the restriction that the value of the conditional mean for any values of $x_{it}$, $t_{it}$, $c_i$, and $e_{it}$ must be between zero and one, inclusive. Equation (1.7) represents the constraint that since each $y_{itl}$ is bounded between zero and one and the shares must sum to one, the structural mean values must sum to one for each time period and subject. This constraint rules out the application of a probit model when $L \geq 3$, since there is no guarantee that the choice of a probit conditional mean function would satisfy the constraint. In Equation (1.6), $r_{itl}$ is defined as $\gamma_l$ multiplied by the error from the projection of $c_i$ on $h_i$, plus $e_{it}$. It is important to note that though it is tempting to write $y_{itl}$ as equal to $G(\xi_{tl} + x_{it}\beta_l + t_{it}\alpha_l + \gamma_l c_i + e_{it})$, this may rule out the case in which $y_{itl}$ takes the values 0 or 1. It should also be noted that, in contrast to Becker (2014), the correlated random effects have different coefficients across $l$. This is similar to how previous approaches have handled a single source of heterogeneity in the multinomial logit model (see Section 16.2.4 of Wooldridge (2010)). Now, let

$r_{itl} = \zeta_l u_{it} + v_{it}$  (1.8)

In effect, I am noting that $r_{itl}$ can be written as a linear function of $u_{it}$ and $v_{it}$, where $v_{it}$ is the error arising from regressing $r_{itl}$ on $u_{it}$. This extends naturally from the model.
Since the correlated random effects approach eliminates the inconsistency arising from the explanatory variables, any endogeneity that remains must enter through $r_{itl}$ via $u_{it}$. Then, $v_{it} \perp\!\!\!\perp x_{it}, z_{it}, u_{it}$. I am implying that if the unobservables are projected onto the random variable that is the source of the endogeneity, then whatever remains should be random noise. I am also going to assume that

$E(y_{itl} \mid h_i, x_i, z_i, t_i, u_i) = E(y_{itl} \mid h_i, x_{it}, t_{it}, u_{it}) = \Lambda(d^v_{itl} + u_{it}\zeta^v_l)$  (1.9)

where

$d_{itl} \equiv \xi_{tl} + x_{it}\beta_l + t_{it}\alpha_l + \gamma_l h_i\pi$  (1.10)

and

$d^v_{itl} \equiv \xi^v_{tl} + x_{it}\beta^v_l + t_{it}\alpha^v_l + \gamma^v_l h_i\pi^v$  (1.11)

The multinomial logit function that is of interest here is

$\Lambda(d^v_{itl} + u_{it}\zeta^v_l) = \dfrac{e^{d^v_{itl} + u_{it}\zeta^v_l}}{1 + \sum_{k=1}^{L-1} e^{d^v_{itk} + \zeta^v_k u_{it}}}$  (1.12)

Equation (1.11) represents the scaled value of $d_{itl}$. Equation (1.9) represents the assumption that once $v_{it}$ is integrated out, the conditional mean function is equal to the multinomial logit conditional mean function, and the values of the parameters are related, though it is unknown exactly how, to the original structural parameters. The first equality of equation (1.9) represents the strict exogeneity assumption. The original structural parameters cannot be identified, but these transformed parameters can be used to estimate the APEs. While I have not made any direct assumptions about the distribution of $v_{it}$, I have made a somewhat strong assumption, though one that is not out of place in the econometrics literature. In effect, I am placing a restriction upon the combination of $G$ and the random variable $v_{it}$ such that (1.9) holds; however, it is not known what restrictions must be placed upon $G$ and $v_{it}$ so that (1.9) is true. It could also be that the original structural model is incompatible with (1.9), but (1.9) is assumed to hold for the sake of convenience.
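The adding-up constraint in (1.7) is exactly what the multinomial logit mean in (1.12) delivers by construction. As a quick illustration, here is a minimal Python sketch; the helper name `multinomial_logit_mean` is mine, not the chapter's code:

```python
import math

def multinomial_logit_mean(d):
    """Multinomial logit conditional means for categories l = 1, ..., L-1,
    given the linear indices d = [d_1, ..., d_{L-1}]; the base category L
    receives the remaining probability mass 1 / (1 + sum of exponentials)."""
    denom = 1.0 + sum(math.exp(dl) for dl in d)
    shares = [math.exp(dl) / denom for dl in d]
    shares.append(1.0 / denom)  # base category L
    return shares

# With L = 3 (two non-base indices), the implied shares are strictly
# positive and sum to one, satisfying the constraint in (1.7).
shares = multinomial_logit_mean([0.5, -0.2])
```

Any probit-style choice of the mean, by contrast, would not automatically satisfy the adding-up constraint for $L \geq 3$, which is the reason given above for ruling it out.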
This is analogous to assumptions in the generalized estimating equations (GEE) literature, where, when heterogeneity is averaged out, a convenient functional form is chosen, as in Zeger, Liang, and Albert (1988). Petrin and Train (2010) apply this approach in the context of consumer choice by dividing the structural error in the consumer utility function into two parts. Distributional assumptions are made on each part in order to form a mixed logit conditional mean function. A version of this assumption is guaranteed to hold for any fractional variable, and the method that I am proposing can still be applied. As noted by Mullahy (2015), following Woodland (1979), it is always the case for fractional response variables that the conditional mean function should have a form that is based upon an underlying Dirichlet random variable. Then the conditional mean has the form

$E[y_{itl} \mid x_{it}, z_{it}, t_{it}, u_{it}] = \dfrac{z_l(x_{it}, z_{it}, t_{it}, u_{it})/z_L(x_{it}, z_{it}, t_{it}, u_{it})}{1 + \sum_{l=1}^{L-1} z_l(x_{it}, z_{it}, t_{it}, u_{it})/z_L(x_{it}, z_{it}, t_{it}, u_{it})}$  (1.13)

When (1.12) is combined with (1.13), this is equivalent to setting $z_l(x_{it}, z_{it}, t_{it}, u_{it})/z_L(x_{it}, z_{it}, t_{it}, u_{it}) = e^{d^v_{itl} + u_{it}\zeta^v_l}$. The previous assumptions are summarized as follows:

Assumption D.1. $E[y_{itl} \mid x_i, t_i, z_i, c_i, e_i] = E[y_{itl} \mid x_{it}, t_{it}, c_i, e_{it}] = G(\xi_{tl} + x_{it}\beta_l + t_{it}\alpha_l + \gamma_l c_i + e_{it})$, $t_{it} = 1[z_{it}\delta_z + x_{it}\delta_x + \bar{z}_i\delta_{\bar{z}} + \bar{x}_i\delta_{\bar{x}} + u_{it} \geq 0]$, and $c_i = h_i\pi + a_i$.

Assumption D.2. $0 \leq G(\xi_{tl} + x_{it}\beta_l + t_{it}\alpha_l + \gamma_l c_i + e_{it}) \leq 1$ and $\sum_{l=1}^{L} G_{itl} = 1$.

Assumption D.3. $E(y_{itl} \mid h_i, x_i, z_i, t_i, u_i) = E(y_{itl} \mid h_i, x_{it}, t_{it}, u_{it}) = \Lambda(d^v_{itl} + u_{it}\zeta^v_l)$.

Assumption D.4. $u_{it} \perp\!\!\!\perp x_i, z_i, a_i$, where $D(u_{it})$ is known.

Assumption D.5. $E[e_{it} \mid x_i, z_i] = 0$, and $\delta_z \neq 0$.

Though Assumption D.3 is not out of place in the econometrics literature, it can be tested. For example, a choice could be made between $e^{d^v_{itl} + u_{it}\zeta^v_l}$ and $z_l(x_{it}, z_{it}, t_{it}, u_{it})$. Then, a test for misspecified moments based upon Rivers and Vuong (2002) can be applied.
This is important because equation (1.13) always holds, so Assumption D.3 can be tested using the Rivers and Vuong test, and an alternative model based upon such a test could be provided if the assumption does not hold. Now, given the previous assumptions of the model, the following is obtained:

$E(y_{itl} \mid h_i, x_{it}, z_{it}, t_{it} = 1) = \int_{q_{it}}^{\infty} \Lambda(d^v_{itl} + u_{it}\zeta^v_l)\, f(u_{it})\, du_{it}$  (1.14)

$E(y_{itl} \mid h_i, x_{it}, z_{it}, t_{it} = 0) = \int_{-\infty}^{q_{it}} \Lambda(d^v_{itl} + u_{it}\zeta^v_l)\, f(u_{it})\, du_{it}$  (1.15)

where $q_{it} = z_{it}\delta_z + x_{it}\delta_x + \bar{z}_i\delta_{\bar{z}} + \bar{x}_i\delta_{\bar{x}}$. Here $\delta_x$, $\delta_z$, and the corresponding coefficients on the time-averaged variables denote the scaled coefficients of $x$, $z$, $\bar{z}$, and $\bar{x}$ in the Bernoulli MLE problem, where, for a given parametric assumption on $u_{it}$, $P(t_{it} = 1 \mid x_{it}, z_{it}, \bar{z}_i, \bar{x}_i) = g(x_{it}, z_{it}, \bar{x}_i, \bar{z}_i; \delta_x, \delta_z, \delta_{\bar{x}}, \delta_{\bar{z}})$.

1.3 Estimation Method

The estimator will be derived from a QMLE problem by pooling across time for each subject. The serial dependence across $\{y_{itl}\}$ is unrestricted. Based upon (1.14) and (1.15), the QMLE problem is

$\max \sum_{i=1}^{N} \sum_{t=1}^{T} \Big\{ t_{it} \sum_{l=1}^{L-1} y_{itl} \log \int_{q_{it}}^{\infty} \Lambda(d^v_{itl} + u_{it}\zeta^v_l) f(u_{it})\, du_{it} + t_{it} \Big(1 - \sum_{l=1}^{L-1} y_{itl}\Big) \log \int_{q_{it}}^{\infty} \Big(1 - \sum_{l=1}^{L-1} \Lambda(d^v_{itl} + u_{it}\zeta^v_l)\Big) f(u_{it})\, du_{it} + (1 - t_{it}) \sum_{l=1}^{L-1} y_{itl} \log \int_{-\infty}^{q_{it}} \Lambda(d^v_{itl} + u_{it}\zeta^v_l) f(u_{it})\, du_{it} + (1 - t_{it}) \Big(1 - \sum_{l=1}^{L-1} y_{itl}\Big) \log \int_{-\infty}^{q_{it}} \Big(1 - \sum_{l=1}^{L-1} \Lambda(d^v_{itl} + u_{it}\zeta^v_l)\Big) f(u_{it})\, du_{it} \Big\}$

where the maximization is over $\beta^v \in \mathbb{R}^{K \times L}$, $\pi \in \mathbb{R}^{K+M+1}$, $\alpha^v_l \in \mathbb{R}$, $\gamma^v \in \mathbb{R}^{L}$, $\xi^v \in \mathbb{R}^{T \times L}$, and $\zeta^v_l$. The maximization problem represents the maximization of the likelihood of a multinomial random variable, where choice $l$ is made with probability $\int_{q_{it}}^{\infty} \Lambda(d^v_{itl} + u_{it}\zeta^v_l) f(u_{it})\, du_{it}$ or $\int_{-\infty}^{q_{it}} \Lambda(d^v_{itl} + u_{it}\zeta^v_l) f(u_{it})\, du_{it}$, given the covariates and the observed value of $t_{it}$. The above objective function handles the endogenous switching problem that arises from the endogenous binary variable.
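Every term of the objective above requires integrals of the form $\int_{q}^{\infty} \Lambda(d + \zeta u) f(u)\, du$ over a truncated normal region. As a sanity check on such terms, here is a minimal Python sketch for the binary-outcome case (L = 2, so $\Lambda$ reduces to the scalar logit), using a simple trapezoid rule on a truncated grid; the function names and the grid width are my illustrative choices, not the chapter's code:

```python
import math

def normal_pdf(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def logit(v):
    return 1.0 / (1.0 + math.exp(-v))

def truncated_mean(d, zeta, q, upper=True, n=2000, width=8.0):
    """Approximate  int Lambda(d + zeta*u) f(u) du  over [q, inf) or
    (-inf, q] for standard-normal u via a trapezoid rule; the tail beyond
    width standard deviations past q is numerically negligible."""
    a, b = (q, q + width) if upper else (q - width, q)
    h = (b - a) / n
    total = 0.0
    for j in range(n + 1):
        u = a + j * h
        w = 0.5 if j in (0, n) else 1.0  # trapezoid endpoint weights
        total += w * logit(d + zeta * u) * normal_pdf(u)
    return total * h
```

With $\zeta = 0$ the integrand factors, so the result should equal $\Lambda(d)\,[1 - \Phi(q)]$ for the upper region, which gives a closed-form check on the quadrature.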
This method is applied by Wooldridge (2014) in the context of a probit conditional mean function when there exist two fractional dependent variables and a binary endogenous variable. A just-identified GMM system can be set up using the score functions based upon the QMLE problem. Let

$\theta = (\xi^{v\prime}_{tl}, \beta^{v\prime}_l, \alpha^v_l, \gamma^v_l, \pi^{v\prime}, \zeta^v_l)$  (1.16)

$\delta = (\delta_z, \delta_x, \delta_{\bar{x}}, \delta_{\bar{z}})$  (1.17)

$\psi_i(\theta, \delta) = \big( s_{\delta i}(\delta)^{\prime},\ s_{\xi^v i}(\theta; \delta)^{\prime},\ s_{\beta^v i}(\theta; \delta)^{\prime},\ s_{\alpha^v i}(\theta; \delta)^{\prime},\ s_{\gamma^v i}(\theta; \delta)^{\prime},\ s_{\pi^v i}(\theta; \delta)^{\prime},\ s_{\zeta^v i}(\theta; \delta)^{\prime} \big)^{\prime}$  (1.18)

For example, if I assume $u_{it}$ follows a normal distribution and use a logit specification for the conditional mean function, then, with $w_{it} = (z_{it}\ x_{it}\ \bar{z}_i\ \bar{x}_i)$, the first-stage score is

$s_{\delta i}(\delta) = \sum_{t=1}^{T} \dfrac{\phi(w_{it}\delta)\, w_{it}^{\top}\, [t_{it} - \Phi(w_{it}\delta)]}{\Phi(w_{it}\delta)\,[1 - \Phi(w_{it}\delta)]}$

and the score with respect to $\zeta^v_l$ is

$s_{\zeta^v_l i}(\theta; \delta) = \sum_{t=1}^{T} \Bigg\{ t_{it} \sum_{k=1, k \neq l}^{L-1} y_{itk}\, \dfrac{\int_{q_{it}}^{\infty} M(d^v_{itk} + \zeta^v_k u_{it},\, d^v_{itl} + \zeta^v_l u_{it})\, u_{it}\, f(u_{it})\, du_{it}}{\int_{q_{it}}^{\infty} \Lambda(d^v_{itk} + \zeta^v_k u_{it})\, f(u_{it})\, du_{it}} + t_{it}\, y_{itl}\, \dfrac{\int_{q_{it}}^{\infty} \Lambda^{\prime}(d^v_{itl} + \zeta^v_l u_{it})\, u_{it}\, f(u_{it})\, du_{it}}{\int_{q_{it}}^{\infty} \Lambda(d^v_{itl} + \zeta^v_l u_{it})\, f(u_{it})\, du_{it}} + t_{it} \Big(1 - \sum_{k=1}^{L-1} y_{itk}\Big) \dfrac{\int_{q_{it}}^{\infty} P(d^v_{itl} + \zeta^v_l u_{it})\, u_{it}\, f(u_{it})\, du_{it}}{\int_{q_{it}}^{\infty} \big(1 - \sum_{k=1}^{L-1} \Lambda(d^v_{itk} + \zeta^v_k u_{it})\big) f(u_{it})\, du_{it}} + (1 - t_{it})\, y_{itl}\, \dfrac{\int_{-\infty}^{q_{it}} \Lambda^{\prime}(d^v_{itl} + \zeta^v_l u_{it})\, u_{it}\, f(u_{it})\, du_{it}}{\int_{-\infty}^{q_{it}} \Lambda(d^v_{itl} + \zeta^v_l u_{it})\, f(u_{it})\, du_{it}} + (1 - t_{it}) \sum_{k=1, k \neq l}^{L-1} y_{itk}\, \dfrac{\int_{-\infty}^{q_{it}} M(d^v_{itk} + \zeta^v_k u_{it},\, d^v_{itl} + \zeta^v_l u_{it})\, u_{it}\, f(u_{it})\, du_{it}}{\int_{-\infty}^{q_{it}} \Lambda(d^v_{itk} + \zeta^v_k u_{it})\, f(u_{it})\, du_{it}} + (1 - t_{it}) \Big(1 - \sum_{k=1}^{L-1} y_{itk}\Big) \dfrac{\int_{-\infty}^{q_{it}} P(d^v_{itl} + \zeta^v_l u_{it})\, u_{it}\, f(u_{it})\, du_{it}}{\int_{-\infty}^{q_{it}} \big(1 - \sum_{k=1}^{L-1} \Lambda(d^v_{itk} + \zeta^v_k u_{it})\big) f(u_{it})\, du_{it}} \Bigg\}$

while for each intercept,

$s_{\xi^v_{tl} i}(\theta; \delta) = t_{it} \sum_{k=1, k \neq l}^{L-1} y_{itk}\, \dfrac{\int_{q_{it}}^{\infty} M(d^v_{itk} + \zeta^v_k u_{it},\, d^v_{itl} + \zeta^v_l u_{it})\, f(u_{it})\, du_{it}}{\int_{q_{it}}^{\infty} \Lambda(d^v_{itk} + \zeta^v_k u_{it})\, f(u_{it})\, du_{it}} + t_{it}\, y_{itl}\, \dfrac{\int_{q_{it}}^{\infty} \Lambda^{\prime}(d^v_{itl} + \zeta^v_l u_{it})\, f(u_{it})\, du_{it}}{\int_{q_{it}}^{\infty} \Lambda(d^v_{itl} + \zeta^v_l u_{it})\, f(u_{it})\, du_{it}} + t_{it} \Big(1 - \sum_{k=1}^{L-1} y_{itk}\Big) \dfrac{\int_{q_{it}}^{\infty} P(d^v_{itl} + \zeta^v_l u_{it})\, f(u_{it})\, du_{it}}{\int_{q_{it}}^{\infty} \big(1 - \sum_{k=1}^{L-1} \Lambda(d^v_{itk} + \zeta^v_k u_{it})\big) f(u_{it})\, du_{it}} + (1 - t_{it})\, y_{itl}\, \dfrac{\int_{-\infty}^{q_{it}} \Lambda^{\prime}(d^v_{itl} + \zeta^v_l u_{it})\, f(u_{it})\, du_{it}}{\int_{-\infty}^{q_{it}} \Lambda(d^v_{itl} + \zeta^v_l u_{it})\, f(u_{it})\, du_{it}} + (1 - t_{it}) \sum_{k=1, k \neq l}^{L-1} y_{itk}\, \dfrac{\int_{-\infty}^{q_{it}} M(d^v_{itk} + \zeta^v_k u_{it},\, d^v_{itl} + \zeta^v_l u_{it})\, f(u_{it})\, du_{it}}{\int_{-\infty}^{q_{it}} \Lambda(d^v_{itk} + \zeta^v_k u_{it})\, f(u_{it})\, du_{it}} + (1 - t_{it}) \Big(1 - \sum_{k=1}^{L-1} y_{itk}\Big) \dfrac{\int_{-\infty}^{q_{it}} P(d^v_{itl} + \zeta^v_l u_{it})\, f(u_{it})\, du_{it}}{\int_{-\infty}^{q_{it}} \big(1 - \sum_{k=1}^{L-1} \Lambda(d^v_{itk} + \zeta^v_k u_{it})\big) f(u_{it})\, du_{it}}$

Note that

$\Lambda^{\prime}(d^v_{itl} + \zeta^v u_{it}) = \dfrac{e^{d^v_{itl} + \zeta^v u_{it}} \big(1 + \sum_{k=1}^{L-1} e^{d^v_{itk} + \zeta^v u_{it}}\big) - e^{2(d^v_{itl} + \zeta^v u_{it})}}{\big(1 + \sum_{k=1}^{L-1} e^{d^v_{itk} + \zeta^v u_{it}}\big)^2}$

$M(d^v_{itk} + \zeta^v u_{it},\, d^v_{itl} + \zeta^v u_{it}) = \dfrac{-e^{d^v_{itl} + \zeta^v u_{it}}\, e^{d^v_{itk} + \zeta^v u_{it}}}{\big(1 + \sum_{k=1}^{L-1} e^{d^v_{itk} + \zeta^v u_{it}}\big)^2}$

$P(d^v_{itk} + \zeta^v u_{it}) = \dfrac{-e^{d^v_{itk} + \zeta^v u_{it}}}{\big(1 + \sum_{k=1}^{L-1} e^{d^v_{itk} + \zeta^v u_{it}}\big)^2}$

These scores, along with the scores that are used to estimate $\delta$, are used to generate corresponding sample moments. Now, the standard regularity assumptions from Newey and McFadden (1994) are made.

Assumption E.1. $\Theta \times \Delta$ is compact.

Assumption E.2. $\{(x_{i1}, \ldots, x_{iT}), (z_{i1}, \ldots, z_{iT}), (t_{i1}, \ldots, t_{iT}), [(y_{i11}, \ldots, y_{i1L}), \ldots, (y_{iT1}, \ldots, y_{iTL})]\}_{i=1}^{N}$ is iid.

Assumption E.3. There exists a unique $(\delta_0, \theta_0) \in \Theta \times \Delta$ such that $E(\psi_i(\theta_0, \delta_0)) = 0$.

Assumption E.4. $E\big(\sup_{(\theta, \delta) \in \Theta \times \Delta} \|\psi_i(\theta, \delta)\|\big) < \infty$.

Assumption E.5. $E(\|\psi_i(\theta_0, \delta_0)\|^2) < \infty$ and $E\big(\big\|\partial \psi_i(\theta_0, \delta_0) / \partial(\theta, \delta)^{\top}\big\|\big) < \infty$.

Then,

$\sqrt{N}\begin{pmatrix} \hat{\delta} - \delta \\ \hat{\theta} - \theta \end{pmatrix} \rightarrow N\big(0,\ (D^{\top}D)^{-1} D^{\top} \Omega D (D^{\top}D)^{-1}\big)$

where $D = E\big[\partial \psi_i(\theta_0, \delta_0) / \partial(\theta, \delta)^{\top}\big]$ and $\Omega = E[\psi_i(\theta_0, \delta_0)\, \psi_i(\theta_0, \delta_0)^{\top}]$. The asymptotic variance can be estimated using $\hat{D} = N^{-1} \sum_{i=1}^{N} \partial \psi_i(\hat{\theta}, \hat{\delta}) / \partial(\theta, \delta)^{\top}$ as a consistent estimator for $D$ and $\hat{\Omega} = N^{-1} \sum_{i=1}^{N} \psi_i(\hat{\theta}, \hat{\delta})\, \psi_i(\hat{\theta}, \hat{\delta})^{\top}$ as a consistent estimator for $\Omega$.
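Once $\hat{D}$ and $\hat{\Omega}$ are in hand, the sandwich variance is pure matrix arithmetic. A schematic two-parameter Python illustration with hand-rolled linear algebra (the matrices below are made up for the example, not estimates from the chapter):

```python
def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def inv2(A):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def sandwich(D, Omega):
    """(D'D)^{-1} D' Omega D (D'D)^{-1}: the asymptotic variance of the
    stacked estimator, with D the Jacobian of the moments and Omega the
    outer product of the moment functions (2x2 case for illustration)."""
    bread = inv2(matmul(transpose(D), D))
    meat = matmul(matmul(transpose(D), Omega), D)
    return matmul(matmul(bread, meat), bread)
```

For a just-identified system with invertible $D$, the sandwich collapses to $D^{-1}\Omega D^{-\top}$; in particular, taking $D$ to be the identity returns $\Omega$ itself, which gives a quick correctness check.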
1.4 Average Partial Effects

In order to separate the average partial effect of the continuous variables from that of the discrete variable, let $\Upsilon_{it} = (h_i, x_{it}, z_{it})$. Let $(\cdot)^j$ denote element $j$ corresponding to the continuous covariate $(\cdot)$. Then (see Appendix A),

$E_{u_{it}}\big[E[y_{itl} \mid \Upsilon, t_{it} = 1, u_{it}] - E[y_{itl} \mid \Upsilon, t_{it} = 0, u_{it}]\big] = \int \Lambda(d^{v,t=1}_{tl} + \zeta^v u_{it}) f(u_{it})\, du_{it} - \int \Lambda(d^{v,t=0}_{tl} + \zeta^v u_{it}) f(u_{it})\, du_{it} = E_{u_{it}}[\varphi_{tl}(x^0, h^0; \theta)] = \Sigma_{tl}(x^0, h^0)$

where

$d^{v,t=1}_{tl} = \xi^v_{tl} + x^0 \beta^v_l + \alpha^v_l + \gamma^v_l h^0 \pi^v$

$d^{v,t=0}_{tl} = \xi^v_{tl} + x^0 \beta^v_l + \gamma^v_l h^0 \pi^v$

$\varphi_{tl}(x^0, h^0; \theta) = \Lambda(d^{v,t=1}_{tl} + \zeta^v u_{it}) - \Lambda(d^{v,t=0}_{tl} + \zeta^v u_{it})$

Similarly, the average partial effect for a continuous explanatory variable is

$E_{u_{it}}\big[\partial E[y_{itl} \mid \Upsilon, t_{it}, u_{it}] / \partial (\cdot)^j\big] = \int_{\mathbb{R}} \partial \Lambda(d^v_{itl} + \zeta^v_l u_{it}) / \partial (\cdot)^j\, f(u_{it})\, du_{it} = \Xi_{tl}(x^0, h^0, t^0)$

In either case, the estimator is similar to the Average Structural Function (ASF) of Blundell and Powell (2001). The expected value of the marginal change in the explanatory variable is taken with respect to the distribution of the unobserved error, the source of the endogeneity that is still present. This leads to the following two theorems.

Theorem 1. Suppose Assumptions D.1–D.5 and E.1–E.5 hold and $E\big(\sup_{(\theta, \delta) \in \Theta \times \Delta} \|\nabla_{\delta, \theta}\, \partial \Lambda(d^{v0}_{tl} + \zeta^v u_{it}) / \partial (\cdot)^j\|\big) < \infty$. Then,

$\sqrt{N}\big(\hat{\Xi}_{tl}(x^0, h^0, t^0) - \Xi_{tl}(x^0, h^0, t^0)\big) \rightarrow N(0, V_{tl})$

where $\hat{\Xi}_{tl}(x^0, h^0, t^0) = N^{-1} \sum_{i=1}^{N} \partial \Lambda(\hat{d}^{v0}_{itl} + \hat{\zeta}^v_l u_{it}) / \partial (\cdot)^j$,

$d^{v0}_{itl} = \xi^v_{tl} + x^0 \beta^v_l + t^0_{it} \alpha^v_l + \gamma^v_l h^0 \pi^v, \qquad \hat{d}^{v0}_{itl} = \hat{\xi}^v_{tl} + x^0 \hat{\beta}^v_l + t^0_{it} \hat{\alpha}^v_l + \hat{\gamma}^v_l h^0 \hat{\pi}^v$

$V_{tl} = E[V_{tli}^{\top} V_{tli}]$

$V_{tli} = \partial \Lambda(d^{v0}_{tl} + \zeta^v u_{it}) / \partial (\cdot)^j - \Xi_{tl}(x^0, h^0, t^0) + E\big[\nabla_{\delta, \theta}\big(\partial \Lambda(d^{v0}_{tl} + \zeta^v u_{it}) / \partial (\cdot)^j\big)\big] K_i$

$K_i = B_0^{-1} G_0^{\top} \psi_i(\theta_0, \delta_0), \qquad B_0 = G_0^{\top} G_0, \qquad G_0 = E[\nabla_{\delta, \theta}\, \psi_i(\theta_0, \delta_0)]$

Theorem 2. Suppose Assumptions D.1–D.5 and E.1–E.5 hold and $E\big(\sup_{(\theta, \delta) \in \Theta \times \Delta} \|\nabla_{\delta, \theta}\, \varphi_{tl}(x^0, h^0; \theta)\|\big) < \infty$. Then,

$\sqrt{N}\big(\hat{\Sigma}_{tl}(x^0, h^0) - \Sigma_{tl}(x^0, h^0)\big) \rightarrow N(0, J_{tl})$

where $\hat{\Sigma}_{tl}(x^0, h^0) = N^{-1} \sum_{i=1}^{N} \big[\Lambda(\hat{d}^{v,t=1}_{tl} + \hat{\zeta}^v u_{it}) - \Lambda(\hat{d}^{v,t=0}_{tl} + \hat{\zeta}^v u_{it})\big]$,

$\hat{d}^{v,t=1}_{tl} = \hat{\xi}^v_{tl} + x^0 \hat{\beta}^v_l + \hat{\alpha}^v_l + \hat{\gamma}^v_l h^0 \hat{\pi}^v, \qquad d^{v,t=0}_{tl} = \xi^v_{tl} + x^0 \beta^v_l + \gamma^v_l h^0 \pi^v$

$J_{tl} = E[J_{tli}^{\top} J_{tli}]$

$J_{tli} = \varphi_{tl}(x^0, h^0; \theta) - \Sigma_{tl}(x^0, h^0) + E[\nabla_{\delta, \theta}\, \varphi_{tl}(x^0, h^0; \theta)] K_i$

Theorems 1 and 2 represent the average partial effect taken while holding $h_i$ at some fixed value. I have kept the notation $u_{it}$ to emphasize that this estimator is particular to a specific time period, and that a single $u_{it}$ is generated for each $i$. Usually, the average partial effect is taken over the distribution of the unobserved heterogeneity. This is done below. The computations required to obtain this average partial effect lead to the generation of multiple draws $u_{itr}$. In contrast to a single $u_{it}$ drawn for each $i$, multiple draws must be made for each $i$. The average partial effect when the expectation is taken over the joint distribution of $c_i$ and $e_{it}$, conditioning on their proxies $h_i$ and $u_{it}$ at the fixed values $x^0$ and $t^0$, is

$E_{h_i, u_{it}}\big[\partial E[y_{itl} \mid x^0, t^0, h_i, u_{it}] / \partial v^j_{it}\big] = \int_{\mathbb{R}} \int_{\mathbb{R}} \partial \Lambda(\xi_{tl} + x^0 \beta_l + t^0 \alpha_l + \gamma_l h_i \pi + \zeta_l u_{it}) / \partial v^j_{it}\, f(u_i \mid h_i) f(h_i)\, dh_i\, du_{it} = \Omega_{tl}(x^0, t^0)$

The corresponding estimator is

$\hat{\Omega}_{tl}(x^0, t^0) = N^{-1} R^{-1} \sum_{i=1}^{N} \sum_{r=1}^{R} \partial \Lambda(\hat{\xi}_{tl} + x^0 \hat{\beta}_l + t^0 \hat{\alpha}_l + \hat{\gamma}_l h_i \hat{\pi} + \hat{\zeta}_l u_{itr}) / \partial v^j_{it}$

Similarly, the APE in this case for the binary EEV is

$\Gamma_{tl}(x^0) = \int_{\mathbb{R}} \int_{\mathbb{R}} \Lambda(\xi_{tl} + x^0 \beta_l + \alpha_l + \gamma_l h_i \pi + \zeta_l u_{it})\, f(u_{it} \mid h_i) f(h_i)\, dh_i\, du_{it} - \int_{\mathbb{R}} \int_{\mathbb{R}} \Lambda(\xi_{tl} + x^0 \beta_l + \gamma_l h_i \pi + \zeta_l u_{it})\, f(u_{it} \mid h_i) f(h_i)\, dh_i\, du_{it}$

In this case, the corresponding estimator is

$\hat{\Gamma}_{tl}(x^0) = N^{-1} R^{-1} \sum_{i=1}^{N} \sum_{r=1}^{R} \big[\Lambda(\hat{\xi}_{tl} + x^0 \hat{\beta}_l + \hat{\alpha}_l + \hat{\gamma}_l h_i \hat{\pi} + \hat{\zeta}_l u_{itr}) - \Lambda(\hat{\xi}_{tl} + x^0 \hat{\beta}_l + \hat{\gamma}_l h_i \hat{\pi} + \hat{\zeta}_l u_{itr})\big]$

Next, I will present the asymptotic distribution of these APEs. First, I will apply a mean value expansion to $\sqrt{N}\, \hat{\Omega}_{tl}(x^0, t^0)$.
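The last estimator above averages the difference of the two conditional means (the binary variable set to one versus zero) over the empirical distribution of $h_i$ and over R simulated draws of $u_{it}$. A minimal Python sketch of that double average for a binary-outcome case (L = 2, scalar coefficients; all coefficient names and values are illustrative placeholders, not the chapter's estimates):

```python
import math
import random

def logit(v):
    return 1.0 / (1.0 + math.exp(-v))

def simulated_ape(x0, alpha, beta, gamma, pi, zeta, h, R=200, seed=7):
    """Simulated APE of the binary EEV: for each subject's time-average
    proxy h_i, draw R standard-normal u's, difference the logit mean with
    the binary variable switched on and off, and average over draws and
    subjects.  Coefficients (alpha, beta, gamma, pi, zeta) are scalars."""
    rng = random.Random(seed)
    total = 0.0
    for hi in h:
        for _ in range(R):
            u = rng.gauss(0.0, 1.0)
            base = x0 * beta + gamma * hi * pi + zeta * u
            total += logit(base + alpha) - logit(base)
    return total / (len(h) * R)
```

As the text notes, the simulation error is under the researcher's control: increasing R shrinks it regardless of the sample size N, since the distribution of $u_{it}$ is known.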
Let

$\sqrt{N}\, \hat{\Omega}_{tl}(x^0, t^0) = \frac{1}{\sqrt{N}} \sum_{i=1}^{N} \big[\tilde{f}_R(x^0, t^0, h_i; \theta_0) + \nabla_{\delta, \theta} \tilde{f}_R(x^0, t^0, h_i; \tilde{\theta})(\hat{\theta} - \theta_0)\big]$  (1.19)

where $\tilde{f}_R(x^0, t^0, h_i; \theta) = R^{-1} \sum_{r=1}^{R} \partial \Lambda(\xi_{tl} + x^0 \beta_l + t^0 \alpha_l + \gamma_l h_i \pi + \zeta^v u_{itr}) / \partial (\cdot)^j$ and $f(x^0, t^0, h_i; \theta) = E\big[\partial \Lambda(\xi_{tl} + x^0 \beta_l + t^0 \alpha_l + \gamma_l h_i \pi + \zeta^v u_{itr}) / \partial (\cdot)^j \mid h_i\big]$. Now, note that if $R \rightarrow \infty$ along with $N \rightarrow \infty$, then, writing $\eta = (\delta, \theta)$, by the consistency assumptions and $E\big(\sup_{(\theta, \delta) \in \Theta \times \Delta} \|\nabla_{\delta, \theta}\, \partial \Lambda(\xi_{tl} + x^0 \beta_l + t^0 \alpha_l + \gamma_l h_i \pi + \zeta^v u_{it}) / \partial (\cdot)^j\|\big) < \infty$,

$\frac{1}{\sqrt{N}} \sum_{i=1}^{N} \nabla_{\delta, \theta} \tilde{f}_R(x^0, t^0, h_i; \tilde{\theta})(\hat{\eta} - \eta_0) = E[\nabla_{\delta, \theta} \tilde{f}_R(x^0, t^0, h_i; \theta_0)] K_N + o_p(1)$  (1.20)

Then,

$\sqrt{N}\big(\hat{\Omega}_{tl}(x^0, t^0) - \Omega_{tl}(x^0, t^0)\big) = \frac{1}{\sqrt{N}} \sum_{i=1}^{N} \big[\tilde{f}_R(x^0, t^0, h_i; \theta_0) - \Omega_{tl}(x^0, t^0) + E[\nabla_{\delta, \theta} \tilde{f}_R(x^0, t^0, h_i; \theta_0)] K_i + o_p(1)\big]$

Now, note that $\frac{1}{\sqrt{N}} \sum_{i=1}^{N} \big[\tilde{f}_R(x^0, t^0, h_i; \theta_0) - \Omega_{tl}(x^0, t^0) + E[\nabla_{\delta, \theta} \tilde{f}_R(x^0, t^0, h_i; \theta_0)] K_i\big]$ has mean 0. If this is not immediately obvious, note that unlike in Hajivassiliou and Ruud (1994) the simulations are carried out directly upon the partial derivative of the multinomial logit function with respect to some explanatory variable for each $h_i$, as opposed to the simulations being carried out on the likelihood and then taking the gradient. Write $\frac{1}{\sqrt{N}} \sum_{i=1}^{N} \tilde{f}_R(x^0, t^0, h_i; \theta_0) = \frac{1}{\sqrt{N}} \sum_{i=1}^{N} f(x^0, t^0, h_i; \theta_0) + A_N + B_N$, where

$A_N = \frac{1}{\sqrt{N}} \sum_{i=1}^{N} \big[\tilde{f}_R(x^0, t^0, h_i; \theta_0) - E_{u|h}[\tilde{f}_R(x^0, t^0, h_i; \theta_0)]\big]$  (1.21)

$B_N = \frac{1}{\sqrt{N}} \sum_{i=1}^{N} \big[E_{u|h}[\tilde{f}_R(x^0, t^0, h_i; \theta_0)] - f(x^0, t^0, h_i; \theta_0)\big]$  (1.22)

By the definitions of $\tilde{f}_R(x^0, t^0, h_i; \theta_0)$ and $f(x^0, t^0, h_i; \theta_0)$, $A_N$ and $B_N$ both have zero expectation. Then, since $E[\tilde{f}_R(x^0, t^0, h_i; \theta_0)] = E[E[\tilde{f}_R(x^0, t^0, h_i; \theta_0) \mid h_i]] = \Omega_{tl}(x^0, t^0)$, the central limit theorem implies, as $N \rightarrow \infty$,

$\sqrt{N}\big(\hat{\Omega}_{tl}(x^0, t^0) - \Omega_{tl}(x^0, t^0)\big) \rightarrow N(0, J_{tl})$

$J_{tl} = E(J_{tli}^{\top} J_{tli})$

$J_{tli} = \tilde{f}(x^0, t^0, h_i; \theta_0) - \Omega_{tl}(x^0, t^0) + E[\nabla_{\delta, \theta} \tilde{f}(x^0, t^0, h_i; \theta_0)] K_i$

$\tilde{f}(x^0, t^0, h_i; \theta_0) = \partial \Lambda(\xi_{tl} + x^0 \beta_l + t^0 \alpha_l + \gamma_l h_i \pi + \zeta^v u_{itr}) / \partial (\cdot)^j$

Similarly, as $N \rightarrow \infty$,

$\sqrt{N}\big(\hat{\Gamma}_{tl}(x^0) - \Gamma_{tl}(x^0)\big) \rightarrow N(0, L_{tl})$

$L_{tl} = E(L_{tli}^{\top} L_{tli})$

$L_{tli} = f_{\mathrm{diff}}(x^0, h_i; \theta_0) - \Gamma_{tl}(x^0) + E[\nabla_{\delta, \theta} f_{\mathrm{diff}}(x^0, h_i; \theta_0)] K_i$

$f_{\mathrm{diff}}(x^0, h_i; \theta_0) = \Lambda(\xi_{tl} + x^0 \beta_l + \alpha_l + \gamma_l h_i \pi + \zeta^v u_{itr}) - \Lambda(\xi_{tl} + x^0 \beta_l + \gamma_l h_i \pi + \zeta^v u_{itr})$

It is important to note that even though it would appear that the estimators of the true APEs are limited by the sample size, this is not the case. Since the distribution of $u_{it}$ is known, a researcher could simply simulate enough draws of $u_{it}$ until the researcher feels that the estimators are sufficiently close to the true APEs. It is also important to note that these theorems hold for a correctly specified model, that is, when equation (1.9) holds, except for one special case, which will be explored in the simulation section.

1.5 Test for Endogeneity

The method described relies upon distributional assumptions that are separate from the correct choice of the functional form for the conditional mean function. The distribution of $u_{it}$ would have to be correctly chosen. Furthermore, the method itself is computationally intensive; however, a variable addition test (VAT) based upon Wooldridge (2014) can serve as a test of whether the variable $t_{it}$ is endogenous. Given the choice of the distribution of $u_{it}$, the test will utilize a generalized residual as proposed in Gourieroux et al. (1987). The test will rely upon $\zeta_l = 0$ for $l = 1, \ldots, L-1$. The VAT on the generalized residual will be shown in this section to be asymptotically equivalent to the Lagrange Multiplier test under the null hypothesis of no endogeneity. I will show that the Lagrange Multiplier (LM) test statistic has an asymptotic $\chi^2_{L-1}$ distribution under $H_0: \zeta_1 = \cdots = \zeta_{L-1} = 0$. Then, I will show that the VAT is asymptotically equivalent to the LM test, and therefore the VAT statistic also has a $\chi^2_{L-1}$ distribution under $H_0$.
It should be noted that the specific test here is based upon Lin and Wooldridge (2017), though the LM test is not infeasible, since there is no additional error arising from a continuous endogenous variable. Let $d^{vr}_{itl}$ denote the value of $d^v_{itl}$ when $\zeta_l = 0$, and let $\theta^r$ be the estimate of $\theta$ based upon the restricted model. Note that in the restricted model $t_{it}$ is exogenous. Let $\hat{gr}_{it}(\hat{\delta})$ denote the estimate of the generalized residual, which is to be used as a consistent estimator of $gr_{it} \equiv E(u_{it} \mid t_{it}, x_i, z_i)$. Note that I have written the estimate of the generalized residual as a function of $\hat{\delta}$ to emphasize that the estimates are a function of the estimated parameters from the latent variable equation for $t_{it}$. Consider the LM statistic, where the estimates of the restricted model will be plugged into the score from the unrestricted model,

$LM = \Big(\sum_{i=1}^{N} \tilde{S}_{i,\zeta}\Big)^{\top} \tilde{A}_{22} [\tilde{V}_{22}]^{-1} \tilde{A}_{22} \Big(\sum_{i=1}^{N} \tilde{S}_{i,\zeta}\Big) / N$  (1.23)

where

$\tilde{S}_{i,\zeta} \equiv \dfrac{\partial \ln L_i}{\partial \zeta}\Big|_{\theta = \theta^r,\ \zeta = 0}$

$\tilde{A} \equiv \dfrac{1}{N} \begin{pmatrix} \sum_{i=1}^{N} \hat{E}\big(\frac{\partial^2 \ln L_i}{\partial \theta\, \partial \theta^{\top}} \mid t_{it}, x_i, z_i\big) & \sum_{i=1}^{N} \hat{E}\big(\frac{\partial^2 \ln L_i}{\partial \theta\, \partial \zeta^{\top}} \mid t_{it}, x_i, z_i\big) \\ \sum_{i=1}^{N} \hat{E}\big(\frac{\partial^2 \ln L_i}{\partial \zeta\, \partial \theta^{\top}} \mid t_{it}, x_i, z_i\big) & \sum_{i=1}^{N} \hat{E}\big(\frac{\partial^2 \ln L_i}{\partial \zeta\, \partial \zeta^{\top}} \mid t_{it}, x_i, z_i\big) \end{pmatrix}\Bigg|_{\theta = \theta^r,\ \zeta = 0}$

$\tilde{A} = \begin{pmatrix} \tilde{A}_{11} & \tilde{A}_{12} \\ \tilde{A}_{21} & \tilde{A}_{22} \end{pmatrix}, \qquad \tilde{V} = \tilde{A}^{-1} \tilde{B} \tilde{A}^{-1} = \begin{pmatrix} \tilde{V}_{11} & \tilde{V}_{12} \\ \tilde{V}_{21} & \tilde{V}_{22} \end{pmatrix}, \qquad \tilde{B} \equiv \dfrac{1}{N} \sum_{i=1}^{N} \tilde{S}_{i,\zeta} \tilde{S}_{i,\zeta}^{\top}$

In $\tilde{A}$, the expectation is taken with respect to $u_{it}$. It is important to note the use of $\hat{E}(\cdot \mid t_{it}, x_i, z_i)$, which is a function of $\hat{gr}_{it}(\hat{\delta})$. While the distribution of $u_{it} \mid x_i, z_i$ is known, this does not imply knowledge of the distribution of $u_{it} \mid t_{it}, x_i, z_i$. If this distribution were known, then each element of $\tilde{A}$ would be a function of the conditional expectations themselves, as opposed to estimators that are functions of the generalized residuals. The summation represents the application of iterated expectations.
Applying the summation over i and dividing by N serves as a consistent estimator of the unconditional expected value of the second partial derivatives of the likelihood function. The log-likelihood function for an individual i when implementing the VAT is

\[
L_i = \sum_{t=1}^{T} \Bigg[ \sum_{l=1}^{L-1} y_{itl} \log \Lambda\big(d^{v}_{itl} + gr_{it}\tau_l\big) + \Big(1 - \sum_{l=1}^{L-1} y_{itl}\Big) \log\Big(1 - \sum_{l=1}^{L-1} \Lambda\big(d^{v}_{itl} + gr_{it}\tau_l\big)\Big) \Bigg].
\]

To implement the VAT, the procedure is as follows:

Procedure 5.1

1. Generate the generalized residuals \(\widehat{gr}_{it}\) from a first-stage pooled maximum likelihood estimation (e.g., probit estimation) of \(t_{it}\) on \(x_{it}\) and \(z_{it}\). For example, the generalized residuals from probit estimation are
\[
\widehat{gr}_{it} = \frac{\phi(x_{it}\hat{\gamma}_x + z_{it}\hat{\gamma}_z)\big[t_{it} - \Phi(x_{it}\hat{\gamma}_x + z_{it}\hat{\gamma}_z)\big]}{\Phi(x_{it}\hat{\gamma}_x + z_{it}\hat{\gamma}_z)\big(1 - \Phi(x_{it}\hat{\gamma}_x + z_{it}\hat{\gamma}_z)\big)}.
\]

2. Obtain the maximum likelihood estimates of the parameters using the individual log-likelihood function \(L_i\).

3. Obtain the Wald test statistic under \(H_0 : \tau_1 = \dots = \tau_{L-1} = 0\).

Under the null hypothesis, \(\tau_l = 0\) for \(l = 1, \dots, L-1\), so the estimation of \(\widehat{gr}_{it}\) does not affect the asymptotic distribution of the test statistic. The score vector is

\[
S_{i,\tau} = \begin{bmatrix} \dfrac{\partial L_i}{\partial \theta} \\[6pt] \dfrac{\partial L_i}{\partial \tau} \end{bmatrix}
= \big( s_{\xi L_i}^{\top},\; s_{\beta L_i}^{\top},\; s_{\alpha L_i}^{\top},\; s_{\lambda L_i}^{\top},\; s_{\pi L_i}^{\top},\; s_{\tau L_i}^{\top} \big)^{\top}.
\]

Now, since

\[
\sqrt{N}\begin{bmatrix}\hat{\theta}-\theta\\ \hat{\tau}-\tau\end{bmatrix} = N^{-1/2}\sum_{i=1}^{N} A^{-1} S_i + o_p(1).
\]

Under \(H_0 : \tau_1 = \dots\)
= \(\tau_{L-1} = 0\), the Wald test statistic is

\[
W = (\hat{\tau} - \tau)^{\top} (\hat{V}_{22}/N)^{-1} (\hat{\tau} - \tau) = \sqrt{N}(\hat{\tau} - \tau)^{\top}\, \hat{V}_{22}^{-1}\, \sqrt{N}(\hat{\tau} - \tau)
\]

where

\[
\hat{A} = \begin{bmatrix} \hat{A}_{11} & \hat{A}_{12} \\ \hat{A}_{21} & \hat{A}_{22} \end{bmatrix}, \qquad
\hat{V} = \hat{A}^{-1} \hat{B} \hat{A}^{-1} = \begin{bmatrix} \hat{V}_{11} & \hat{V}_{12} \\ \hat{V}_{21} & \hat{V}_{22} \end{bmatrix}, \qquad
\hat{B} = N^{-1} \sum_{i=1}^{N} \tilde{S}_{i,\tau} \tilde{S}_{i,\tau}^{\top},
\]
\[
\hat{A}^{-1} \xrightarrow{p} A^{-1} = \begin{bmatrix} A^{11} & A^{12} \\ A^{21} & A^{22} \end{bmatrix}, \qquad
\hat{p}_{itl} = \hat{\xi}_{tl} + x_{it}\hat{\beta}_l + t_{it}\hat{\alpha}_l + \hat{\lambda}_l h_i \hat{\pi} + \hat{\tau}_l \widehat{gr}_{it}.
\]

Then the Wald statistic is

\[
W = \Big(\sum_{i=1}^{N} S_{i,\tau}\Big)^{\top} A^{22}\, \hat{V}_{22}^{-1}\, A^{22} \Big(\sum_{i=1}^{N} S_{i,\tau}\Big) \Big/ N \tag{1.24}
\]

Under the null hypothesis that \(\tau = 0\) and \(\zeta = 0\), \((\hat{\tau} - \tau) \xrightarrow{p} 0\), and \(\sqrt{N}(\hat{\theta} - \theta)\) and \(\sqrt{N}(\tilde{\theta} - \theta)\) converge in distribution. Then \(LM - W \xrightarrow{p} 0\), which implies that the tests are asymptotically equivalent (see Sections 12.6.2 and 12.6.3 in Wooldridge (2010)). This result is almost identical to the result from Lin and Wooldridge (2017), but the form of the test statistic is considerably more complex: the multiple response setting leads to complicated forms of the score and Hessian matrices.

1.6 Simulations

I performed simulations to examine how the estimates of the APEs differ from the true values when equation (1.9) does not hold. The main difficulty in carrying out simulations for the estimator described in this chapter is approximating the integrals contained within the moment conditions that are part of each score function. Two broad classes of methods were considered for approximating the integrals. Monte Carlo integration could be used, but the number of draws needed to approximate the integral may be so large that computation time becomes unacceptably high. Gaussian quadrature was also considered; the specific method used to approximate the integrals was Gauss-Laguerre quadrature.

1.6.1 Data Generating Process

For each simulation, N = 1,000, L = 3, and T = 4. I used 500 replications.
1.6.1.1 Regressors

Within each replication, 1,000 observations at each time period t are generated of \(x_{it}\), \(z_{it}\), \(u_{it}\), and \(v_{it}\), where

\[
x_{it} \sim \mathrm{Normal}(0, 4), \qquad z_{it} \sim \mathrm{Uniform}(0, 1), \qquad u_{it} \sim \mathrm{Normal}(0, 1).
\]

The variance of \(x_{it}\) was chosen to be 4 in order to induce more variation in the data, so that when the minimization problem is performed, a local minimum will not be chosen as the solution over the global minimum. \(v_{it}\) is generated from either a \(\mathrm{Normal}(0,1)\), a \(\mathrm{Logistic}(0,1)\), or a \(\chi^2_1\) distribution. Furthermore,

\[
t_{it} = 1[z_{it}\gamma_z + x_{it}\gamma_x + u_{it} \ge 0], \qquad \gamma_z \in \{0.1, 0.5, 1\}, \qquad \gamma_x = 0,
\]

and

\[
r_{it1} = \zeta_1 u_{it} + v_{it}, \qquad r_{it2} = \zeta_2 u_{it} + v_{it},
\]

where \(\zeta_1 \in \{0.5, 1\}\) and \(\zeta_2 \in \{0.1, 1\}\).

1.6.1.2 The Fractional Responses

The structural mean function G is chosen to be the multinomial logit function, such that

\[
E[y_{itl} \mid x_{it}, t_{it}, h_i, z_{it}, u_{it}] = \frac{e^{\xi_{tl} + x_{it}\beta_l + t_{it}\alpha_l + \lambda_l h_i \pi + \zeta_l u_{it}}}{1 + \sum_{k=1}^{L-1} e^{\xi_{tk} + x_{it}\beta_k + t_{it}\alpha_k + \lambda_k h_i \pi + \zeta_k u_{it}}} \tag{1.25}
\]

\[
E[y_{itL} \mid x_{it}, t_{it}, h_i, z_{it}, u_{it}] = \frac{1}{1 + \sum_{k=1}^{L-1} e^{\xi_{tk} + x_{it}\beta_k + t_{it}\alpha_k + \lambda_k h_i \pi + \zeta_k u_{it}}} \tag{1.26}
\]

where \(L = 3\), \(\beta_1 = 1\), \(\beta_2 = 2\), \(\alpha_1 = 1\), \(\alpha_2 = 2\), \(\lambda_1 = 1\), \(\lambda_2 = 2\), \(\pi = (1, 1, 1)\), and \(\xi_{tl} = 0\) for all t and l. In order to generate the fractional response variables, the following procedure from Nam (2014) is used:

1. Calculate the response probabilities using (1.25), (1.26), and the aforementioned regressors.
2. Draw 100 multinomial outcomes at each i and t among choices 1, 2, and 3, based upon the response probabilities.
3. Count the frequencies at each i and t and obtain the proportion for each outcome.

When \(v_{it} \sim \mathrm{Normal}(0, 1)\), this is a special case in which equation (1.9) does not hold yet the estimates of the APEs will be consistent. This is because the choice of G as the multinomial logit function together with \(u_{it} \sim \mathrm{Normal}(0, 1)\) causes the QMLE problem to be set up as if the researcher were working directly with the structural mean function. In this situation, the estimates of the parameters are the estimates of the structural parameters.
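As a check on the description above, the data generating process for a single time period can be sketched as follows. This is a minimal sketch under simplifying assumptions: the heterogeneity term \(\lambda_l h_i \pi\) and the intercepts \(\xi_{tl}\) are set to zero for brevity, and one value from each parameter grid is picked arbitrarily.

```python
import numpy as np

# Minimal one-period sketch of the DGP: Normal(0, 4) regressor, uniform
# instrument, binary endogenous t_it, and multinomial-logit response
# probabilities as in equations (1.25)-(1.26). Heterogeneity terms are
# omitted here for brevity -- an assumption of this sketch.
rng = np.random.default_rng(42)
N, L = 1000, 3
beta = np.array([1.0, 2.0])
alpha = np.array([1.0, 2.0])
zeta = np.array([0.5, 0.1])
gamma_z, gamma_x = 0.5, 0.0

x = rng.normal(0.0, 2.0, N)          # Normal(0, 4): standard deviation 2
z = rng.uniform(0.0, 1.0, N)
u = rng.normal(0.0, 1.0, N)
t = (gamma_z * z + gamma_x * x + u >= 0).astype(float)

# Multinomial-logit response probabilities for the first L-1 choices,
# with the last choice as the residual category
idx = x[:, None] * beta + t[:, None] * alpha + u[:, None] * zeta
expidx = np.exp(idx)
denom = 1.0 + expidx.sum(axis=1)
probs = np.column_stack([expidx / denom[:, None], 1.0 / denom[:, None]])

# Nam (2014): draw 100 multinomial outcomes per unit and use proportions
y = np.vstack([rng.multinomial(100, p) / 100.0 for p in probs])
```

By construction each row of `y` is a vector of fractions summing to one, which is exactly the multiple fractional response structure the chapter studies.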
In this case, the simulations will serve as a check upon the consistency of the estimates.

1.6.2 Simulation Results

Each replication examines the data at the 25th, 50th, 75th, and 90th percentiles of the data, whereby the first time period is used to examine the APEs at the 25th percentile, the second time period is used for the 50th percentile, and so on. The tables include the results from the 25th, 50th, and 75th percentiles. Although there are no time-varying intercepts in these simulations, presumably a researcher would want to know the APEs at each time period in order to account for time-varying intercepts, and they might want to see how the percentile APEs change given these intercepts. The true APEs are constructed using, within each replication, the data at the aforementioned percentiles and the true values of the parameters. Then, across these replications, the mean value is obtained. The standard errors of the estimates are obtained by calculating the square root of the sample variance across the 500 replications. All simulation tables can be found in the appendix. Table B.1 displays the result that at the lower percentiles and when \(v_{it} \sim \mathrm{Normal}(0, 1)\), the point estimates sometimes underestimate the true APEs; however, at the 50th and 75th percentiles, the point estimates are accurate and precise. While the standard errors are somewhat large on the partial effect corresponding to the binary EEV, they are not so large as to be greater than the estimates. Furthermore, all of the APEs have the correct sign. Differences in the estimates are minor when adjusting the values of \(\gamma_z\), \(\zeta_1\), and \(\zeta_2\). Now, consider when \(v_{it} \sim \mathrm{Logistic}(0, 1)\). In this situation, equation (1.9) does not hold, yet we have a random variable with a probability density function that has heavier tails than the standard normal pdf, and logistic random variables have been used to approximate normal random variables. The results from this simulation are in Table B.2.
The results are similar to the normal case, though the Monte Carlo standard deviations are noticeably larger. Once again, differences in the estimates are minor as the parameter values noted in the tables change. When \(v_{it} \sim \chi^2_1\), the model misspecification is substantial: the distribution of \(v_{it}\) cannot be used to approximate the normal distribution, nor is the distribution symmetric. The results of this simulation are in Table B.3. In some sense, this is an improvement over the logistic case, as the standard errors are often smaller on the APEs. It is important to note that the estimates do not give the incorrect sign at any of the percentiles. This is in contrast to Nam (2014), in which she points out that when a linear control function approach is taken, the estimates of the APEs can take the wrong sign. The estimates of the APEs taken with respect to the joint expectation of \(c_i\) and \(e_{it}\) are also provided. In this case, \(v_{it} \sim \mathrm{Normal}(0, 1)\). The results are provided in Table B.4. The estimated APE of \(t_1\) at the 25th percentile gives a poor estimate of the true APE. It is suspicious that the estimate equals its standard error. There is no reason to think that the estimator would perform poorly at the 25th percentile, especially in light of the full results presented above. It is worth noting that the draws of \(u_{it}\) used to construct these estimates were not made from the correct distribution of \(u_{it} \mid x_{it}, z_{it}, t_{it}\). This was done for two reasons. First, the researcher would not know the distribution of \(u_{itr} \mid h_i\). Second, this would add another layer of complexity to the simulation, and it would only change the values of the percentile means and estimates of the average partial effects. The results would still demonstrate that the estimator is consistent.

1.7 Application

In order to demonstrate the practical usefulness of the estimator, I used data from the NLSY97 project. The project, undertaken by the U.S.
Bureau of Labor Statistics, gathered information on individuals born between 1980 and 1984. The survey data available are based upon eighteen rounds of questioning, from 1997 to 2018. Respondents were asked questions in the areas of employment, education, geography and household information, parental and childhood information, dating and marriage, health, income and assets, attitudes, expectations, activities, crime, and substance use. I constructed the dataset using a balanced panel of respondents from the years 2007, 2009, 2010, and 2011. These were years for which there was data on how many hours each respondent slept on an average night. I multiplied this by the number of days in the year and divided by the total number of hours in that year in order to obtain the fraction of time devoted to sleep in that year. Respondents were also asked each week of each year how many hours they had worked that week. I then multiplied by the number of weeks in the year and divided by the total number of hours in the year in order to obtain the fraction of time in that year devoted to work. The sum of these two fractions is then subtracted from 1, which gives the fraction of time in that year devoted to leisure. The binary endogenous variable is the marital status of the respondent. In the survey, marital status has six categories: neither married nor cohabitating, not married but cohabitating, married and cohabitating, legally separated, divorced, and widowed. This is collapsed into a binary variable indicating whether the respondent is legally married; the respondent is counted as married if they are married and cohabitating or legally separated. The marital status of each respondent is updated each month throughout the survey. Since the unit of time for the panel is a year, marital status is recorded as the status at the end of the year.
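The time-share construction described above amounts to a simple unit conversion. The sketch below uses hypothetical hour values, not NLSY97 data, and assumes a non-leap 365-day year, so the figures are illustrative only.

```python
# Back-of-the-envelope sketch of the time-share construction: nightly
# sleep hours and weekly work hours are converted into fractions of the
# total hours in a (non-leap) year, and leisure is the residual share.
# The input values below are hypothetical, not taken from the data.
HOURS_PER_YEAR = 365 * 24            # 8760 hours in a non-leap year

sleep_hours_per_night = 7.5          # hypothetical survey response
work_hours_per_week = 40.0           # hypothetical survey response

sleep_share = sleep_hours_per_night * 365 / HOURS_PER_YEAR
work_share = work_hours_per_week * 52 / HOURS_PER_YEAR
leisure_share = 1.0 - sleep_share - work_share   # residual category
```

The three shares sum to one by construction, matching the constraint on the multiple fractional responses used throughout the chapter.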
The instrument that I chose for marital status gives insight into working with fractional time variables in the context of time-use data. I created a variable for the number of times that a respondent engaged in sexual intercourse within each year, using data across three survey questions for each year. In one question, respondents are asked if they engaged in sexual intercourse. Respondents are then asked how often they engaged in intercourse. If they respond that they do not know, respondents are then asked to give an estimate of the number of times that they engaged in intercourse. I combined the answers to these survey questions into a single variable equal to the number of times that a respondent engaged in sexual intercourse. The results from the first-stage probit are given in Table C.1. Additional covariates include the level of education of each respondent, whether they live in an urban area, the number of biological children at their residence, and their household income in thousands of US dollars. Health controls are added, such as the number of times that a respondent was treated by a doctor or a nurse during the year, the number of times that a respondent was sick but did not seek treatment from a doctor or a nurse, whether their health affected the amount of work that they engaged in, and whether they have a chronic condition. The z-score on the instrument is approximately 6.00 for the entire sample, and approximately 4 for the male and female subsamples. Table C.2 gives the estimates from assuming a linear probability model for the binary marriage variable. This model was not used to obtain the estimates underlying the APEs, but instead to display the strength of the instrument. The conventional wisdom, based upon Staiger and Stock (1997), is that a sufficiently strong instrument yields a first-stage F-statistic of at least 10.
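The first-stage strength check just described can be sketched as follows, using simulated data rather than the NLSY97 sample: a linear probability model of the binary endogenous variable on an exogenous control and the instrument, with the F-statistic for excluding the single instrument compared against the rule-of-thumb threshold of 10. All variable names and values here are illustrative assumptions.

```python
import numpy as np

# Sketch of a first-stage strength check with one excluded instrument,
# on simulated data. F is the restricted-vs-unrestricted SSR comparison.
rng = np.random.default_rng(1)
n = 1000
x = rng.normal(size=n)               # exogenous control (hypothetical)
z = rng.normal(size=n)               # instrument (hypothetical)
t = ((0.5 * z + 0.2 * x + rng.normal(size=n)) >= 0).astype(float)

def ssr(y, X):
    """Sum of squared residuals from an OLS fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return resid @ resid

ones = np.ones(n)
X_unrestricted = np.column_stack([ones, x, z])
X_restricted = np.column_stack([ones, x])

ssr_u = ssr(t, X_unrestricted)
ssr_r = ssr(t, X_restricted)
q = 1                                 # one excluded instrument
df = n - X_unrestricted.shape[1]
F = ((ssr_r - ssr_u) / q) / (ssr_u / df)
strong = F >= 10                      # Staiger-Stock rule of thumb
```

With a single excluded instrument this F-statistic is just the square of the first-stage t-statistic, which is why the z-score and F-statistic reported in Tables C.1 and C.2 carry the same information about instrument strength.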
In the full sample the first-stage F-statistic for marriage is approximately 36, while in each subsample the F-statistic is approximately 16. The reasons for this are partly due to the issue of finding a new partner. Levinger and Moles (1979) note that cohabitating couples consider the frequency of sex as well as the ease of finding other partners before dissolving the relationship. Before marriage, the costs of finding a new partner and dissolving the relationship are low. As Oppenheimer (1988) notes, there are considerable search costs to finding a new partner in order to replace lost sexual activity from a relationship. In the absence of children and the legal barriers that must be overcome to end a marriage, these search costs should be lower for unmarried couples. Couples then enter into marriage in order to secure the benefits of the current relationship and facilitate continued emotional investment into their relationships (see Yabiku and Gager (2009)). For married couples, in addition to the higher costs of ending a marriage, the issue of infertility can play a role in ending the relationship. Sexual intercourse can act as the channel through which infertility acts. As Andrews, Abbey, and Halman (1991) note, infertility can lead to lower sexual self-esteem, which in turn leads to a lower frequency of sexual intercourse among married couples. Furthermore, just as for unmarried couples, the frequency of sexual intercourse plays a role in determining whether to end the relationship. In fact, a past national survey revealed that it was the second greatest issue of concern for young married couples (see Yabiku and Gager (2009)). The count of the number of times that each respondent had sex in each year also satisfies the excludability condition, for the following reasons. First, the information given by the variable itself does not communicate the number of hours or fraction of time in a year that a respondent spent engaging in intercourse.
Second, even if this information were known, there is nothing to suggest that the time used on intercourse would not still be counted amongst the broadly defined leisure fraction. In other words, if respondents engaged in less intercourse, there is no reason to think that they would not devote that time instead to other activities that are neither work nor sleep, particularly for the population and time period under study, given the controls. Liu (2000) provides some evidence for this among married couples, in the sense that declines in sexual intercourse amongst this group are due to substitutions away from intercourse towards other goods, services, and activities. Contrast this variable with the marriage variable or the number of children at the respondent's residence. These variables change how much of their time an individual allocates to sleep, leisure, and work, as opposed to merely the activities within those categories. There seems to be a specific relationship between the frequency of sexual intercourse and the raising of young children. Based upon a panel of German households, Schröder and Schmiedeberg (2015) note that sexual frequency declines until a child reaches approximately six years old, and then tends to increase. The inclusion of the number of biological children in the respondent's household is meant to control for some relationship between sexual intercourse and the raising of children. The health variables are included in order to control for possible correlations between sex and health outcomes, which in turn could have an effect on sleeping patterns or hours worked in a year. After all, the frequency of sexual intercourse is correlated with the health of individuals (see, for example, Walfisch, Maoz, and Antonovsky (1984)).
It is important to control for these specific effects, since they are noted in the health and marriage literature to be correlated with sex and would seem to have an effect on how individuals allocate their time. The validity of the instrument might be affected if too many categories of activities are included. As more activities are separated out from the leisure share into additional fractional response variables, it becomes more likely that the instrument will cause a shift in the additional shares. Identification relies in part on a coarse outside option, so care needs to be taken with respect to determining which APEs should be considered. Tables C.3 and C.4 display the APE estimates of marriage over the distribution of the unobserved heterogeneity across respondents. The APE at the 25th percentile evaluates the APE at the 25th percentile of the data in the first time period, the APE at the 50th percentile evaluates the APE at the 50th percentile, and so on, as was done in the simulations. The standard errors are generated using 50 bootstrap replications with a sample size of 100. I chose the value of R to be 300. The intercepts are not time-varying, though they do vary with each fractional response. Within each table I have included estimates using only the male respondents and only the female respondents, in order to determine the effect of marriage upon the use of time by men and women separately. Across the entire sample and the subsamples, the APEs of marriage are significant at α = .05, though the effect upon sleep is not significant at every percentile level. These effects differ depending upon the subsample. Using the entire sample, it would seem that marriage is not associated with a significant effect upon the fraction of time devoted to sleep, but it is associated with a significant negative effect, at each percentile across all time periods, on the fraction of time devoted to work.
For the subsample of men, the APEs are significant and negative for sleep, but positive and significant for work. For women, marriage suggests declines in the fraction of time devoted to both work and sleep, which leads to an increase in what I broadly consider to be leisure, though activities which might have more time devoted to their completion may not be considered "leisurely." The estimated average partial effects of Theorems 1 and 2 are not included in this chapter, though they are included in the code accompanying this chapter. Table C.3 displays the results from applying the estimator of the parameters that I have introduced and the technique of obtaining the APEs that is necessary when a binary covariate is endogenous. Table C.4 displays the results from applying the Mundlak-Chamberlain device, but without integrating out the additional source of endogeneity. Both the magnitudes and signs of the average partial effects differ across the tables. The estimated effect of marriage on the fraction of time devoted to work and the fraction of time devoted to sleep both seem negligible. This statement also applies to the estimates that are based upon the subsample of men. The effects are larger for women, though it would seem that marriage does not lead to declines in the fraction of time devoted to sleep. This suggests that there is a significant source of endogeneity that arises from the individual- and time-specific errors.

1.8 Conclusion

For the number of observations N and the number of draws R, I have provided a \(\sqrt{N}\)-consistent estimator of the APEs for the multiple fractional response setting while allowing for panel data, unobserved heterogeneity, and a binary EEV. An advantage of this approach is that the constraint upon the conditional mean, \(\sum_{l=1}^{L} E(y_{itl} \mid h_i, x_{it}, z_{it}, t_{it}, u_{it}) = 1\), is satisfied, which will not always hold for every choice of the conditional mean function.
For example, a probit conditional mean specification is not guaranteed to satisfy the constraint that \(\sum_{l=1}^{L} E(y_{itl} \mid h_i, x_{it}, z_{it}, t_{it}, u_{it}) = 1\), and assuming a linear model is not any better. At best, a linear model could allow for a system regression to provide linear approximations of the average partial effects, and the estimator would be expensive if differences are taken to eliminate the unobserved heterogeneity. At worst, the APEs could be such poor estimates that they not only fail to reflect the relationship between the dependent variable and the relevant covariate, but also fail to appropriately consider the relationship amongst the dependent variables. If the multinomial logit conditional mean specification (or any specification that satisfies the aforementioned constraint) is chosen, then there is still some choice of estimators. Additional moment conditions could be added to the GMM estimator in order to increase efficiency. A working correlation matrix could be used to obtain a GEE estimator. Both estimators would bring efficiency gains over the estimator I have proposed, but they would be more computationally burdensome. Even in a setting where L = 3 and T = 2, such estimators would increase efficiency but may significantly increase computation time. A separate issue is whether to rely upon this method or some QMLE estimator derived from a multinomial likelihood problem without integrating out the unobserved heterogeneity. A failure to integrate out a source of endogeneity that is persistent across subjects and time periods can lead to insignificant average partial effects. In order to determine whether the full method and estimator in this chapter are necessary, the variable addition test for endogeneity should be applied.

CHAPTER 2
DOUBLY-ROBUST QUANTILE TREATMENT EFFECT ESTIMATION

2.1 Introduction

There has been a recent reevaluation of the effectiveness of difference-in-difference estimators.
Estimates of the average treatment effect on the treated (ATT) can be sensitive to nuisance functions that are based upon the probability of treatment given covariates, known as the propensity score, or upon the conditional mean of the difference in untreated outcomes before and after treatment for the untreated subpopulation. Identification of the ATT relies upon the parallel trends assumption and the overlap assumption. A relaxation of these assumptions involves conditioning on covariates that may affect either the expected treatment status or the change in untreated outcomes. These separate approaches can be combined to obtain a doubly-robust estimator of the ATT. If we are willing to strengthen these assumptions, then we can go further than estimation of the ATT: the QTT can be estimated at any desired quantile. How useful this is depends upon the topic under study. If a researcher is concerned with the median outcome of a variable post-intervention, or with outcomes at the tails of a variable's distribution, then an estimator of the QTT is desired. Observing which quantiles experience a significant treatment effect could then alter policy responses. The assumptions necessary for identification go beyond the parallel trends and overlap assumptions. The parallel trends assumption is strengthened from an assumption of conditional mean independence to an assumption of independence conditional on covariates, and more assumptions are required on the outcome variables across time. In this chapter, I demonstrate that the strength of the assumptions from Callaway and Li (2019) purchases more than the authors may have realized. Using their assumptions, it is possible to generate a doubly-robust estimator of the QTT. This estimator then allows for a relaxation of the assumptions placed upon the nuisance functions. In this case, the nuisance functions are a propensity score and a conditional cdf of the difference in untreated outcomes from before and after treatment.
The double-robustness property allows for a reduced rate of convergence of both nuisance functions. If the nuisance functions are estimated nonparametrically, then depending upon the type of nonparametric estimator that is used, the double-robustness result has another beneficial property: subject to minimal smoothness conditions on the nuisance functions, the rate of convergence itself may not matter. First, in this chapter I provide the identification result that guarantees double-robustness. This result is an extension of the key identification result found in Callaway and Li (2019). The propensity score and the relevant conditional cdf can be misspecified, but not simultaneously. The properties of this estimator are studied, broken up into subcases according to whether the nuisance parameters are estimated nonparametrically or otherwise. As a prerequisite to studying the properties of the QTT estimator, I derive the efficient influence function of the doubly-robust portion of my estimator. The portion that I am referring to is the cdf of the difference in untreated outcomes from the pre- and post-treatment periods for the treated subpopulation, evaluated at an arbitrary real number. It is upon the estimation of this parameter that the double-robustness result is applied. The efficient influence function is considered in the panel data setting. With the efficient influence function in hand, I then demonstrate that the doubly-robust estimator achieves the semiparametric efficiency bound in the panel data setting. This is shown through two separate cases: in the first case the nuisance functions are estimated parametrically; in the second case they are estimated nonparametrically, with the propensity score estimated using the sieve logit estimator and the conditional cdf estimated using a kernel estimator.
Both estimators could be chosen to be sieve or kernel estimators, and this would only change the restrictions on the smoothness of the nuisance functions. Then, I demonstrate through simulations that without the double-robustness correction to the Callaway and Li (2019) estimator, the root mean square error is very large, owing to the large standard error of the estimator. My doubly-robust estimator has root mean square errors that are less than one-third of those of the Callaway and Li (2019) estimator, even if the nuisance functions are misspecified. I also show that when the empirical bootstrap is used for inference, the double-robustness property which allows for the weakening of the assumption that the nuisance functions converge to the truth at the rate of \(o_p(n^{-1/4})\) no longer holds. For this reason, I maintain that the nuisance functions should converge at the rate of \(o_p(n^{-1/4})\). I investigate when this assumption upon the rate of convergence holds for the relevant nuisance functions. Finally, I apply my estimator to county-level unemployment data in order to compare it to the Callaway and Li (2019) estimator. I use the same dataset that is applied in Callaway and Li (2019), and I compare point estimates and confidence bounds between my estimator and theirs.

2.1.1 Literature Review

This chapter draws directly from Callaway and Li (2019) in order to establish the identification result and the form of the QTT estimator, but the idea of a doubly-robust estimator in this context was inspired by other papers in the treatment effects and missing data literature. Sant'Anna and Zhao (2020) examine the properties of the doubly-robust ATT estimator, and Callaway and Sant'Anna (2021) extend these results to the staggered treatment setting while also examining the asymptotic properties of various weighted ATT estimators.
The doubly-robust ATT estimator has existed in the econometrics literature as an example of a doubly-robust estimator in Rothe and Firpo (2013), along with most of its properties in the single treatment period, panel data setting. The doubly-robust estimator combines the regression approach of Heckman, Ichimura, and Todd (1998) and Heckman et al. (1998) with the propensity score matching approach of Abadie (2005), which itself is based upon Horvitz and Thompson (1952). My estimator is similar in that it combines two approaches that are analogous to separate regression and propensity score approaches. All of these approaches take place within the difference-in-difference framework popularized by Card (1990) and Card and Krueger (1994). The literature on quantile treatment effects, when considering either selection on observables or a panel data setting, prominently includes Firpo (2007), Athey and Imbens (2006), Bonhomme and Sauder (2011), and Chernozhukov et al. (2013). These approaches do not consider double-robustness, and their identification assumptions are stronger than the identification assumptions of this chapter. For example, Firpo (2007) sets up an M-estimation problem with weights based upon propensity score matching, but identification depends upon a strong ignorability assumption. The nonparametric logit sieve estimator of Hirano, Imbens, and Ridder (2003) is applied, but the conditions needed for asymptotic normality are considerably restrictive. A result similar to Firpo (2007) in the missing data setting is found in Wooldridge (2007). The double-robustness properties of my estimator are based upon more than the aforementioned doubly-robust estimators of the ATT. A general weighting result for treatment effects is presented in Słoczyński and Wooldridge (2018), and the basis for constructing doubly-robust moment conditions is outlined in Chernozhukov et al. (2016).
The latter work is closely related to the doubly-robust estimator in the missing data setting of Muris (2020). The construction of my doubly-robust estimator is partly based upon the estimators in Sued, Valdora, and Yohai (2020), though they consider a missing data setting and do not discuss the issue of inference. Rothe and Firpo (2019) and Rothe and Firpo (2013) consider the asymptotic properties of doubly-robust estimators when the nuisance functions are estimated using kernel density methods. The work of Fan et al. (2016) is particularly important for this chapter, since it considers doubly-robust estimators with nuisance parameters that are estimated via sieves. I also make one final point about a doubly-robust QTT estimator that exists in the literature. Caracciolo and Furno (2017) propose an estimator of the quantile treatment effect (QTE) that involves taking a quantile of a random variable that is a function of the propensity score and fitted values; however, this estimator only identifies the quantile of interest in a very restrictive case, using unstated assumptions. Their approach partly builds off of Machado and Mata (2005), but that approach involves obtaining unconditional quantiles directly through a random sample over conditional quantiles.

2.1.2 Structure of the Chapter

The chapter is structured as follows. Section 2.2 presents the framework, assumptions, and identification result. Section 2.3 presents the estimator and considers estimation of the nuisance parameters. Section 2.4 considers the large-sample properties of the estimator. Section 2.5 examines the validity of the empirical bootstrap when applying this estimator. Section 2.6 contains a Monte Carlo study that examines the small-sample properties of my estimator at various quantiles, under misspecification of the nuisance functions and in comparison to another estimator.
Section 2.7 includes the application of my estimator to estimating quantile treatment effects on the treated using unemployment data. Section 2.8 concludes the chapter. The mathematical proofs are contained in the appendix, as well as figures and tables.

2.2 Assumptions and Setting

2.2.1 Setting

The setting that I am considering is the panel data setting. As in Callaway and Li (2019), I assume that the data consist of at least three periods, with treatment period $t$ and pre-treatment periods $t-1$ and $t-2$. No unit receives the binary treatment before time $t$. $D = 1$ for unit $i$ if treated at time $t$; $D = 0$ if a unit is never treated. The outcomes $Y_t$, $Y_{t-1}$, and $Y_{t-2}$ are observed, along with covariates $X$. Each unit $i$ has the potential outcomes $Y_{0t}$ and $Y_{1t}$, but these outcomes are random variables whose realizations cannot be observed simultaneously for each unit $i$. The observed outcome $Y_t$ is then expressed as
$$Y_t = D Y_{1t} + (1 - D) Y_{0t}.$$
Let $q_\tau$ denote the $\tau$-quantile of a random variable $Z$, where
$$q_\tau = F_Z^{-1}(\tau) := \inf\{z : F_Z(z) \ge \tau\}$$
and $F_Z(z)$ is the cumulative distribution function (cdf) of $Z$. $F_{Y_{1t}|D=1}$ and $F_{Y_{0t}|D=1}$ denote the cdfs of the treated and untreated outcomes for the treated subpopulation, respectively. The QTT is then defined as
$$QTT(\tau) = F^{-1}_{Y_{1t}|D=1}(\tau) - F^{-1}_{Y_{0t}|D=1}(\tau).$$
Interest in the QTT stems from the ability to identify the effect of an intervention on a treated group, compared to the counterfactual outcome. For example, suppose that a portion of a population receives a Covid-19 vaccine, with the outcome variable being gross income. While we may wish to know the treatment effect at the median ($\tau = 0.5$) for the entire population, the identification assumptions may be too strong to identify such a parameter.¹ With weaker assumptions, we can identify $QTT(0.5)$; that is to say, we can identify the median effect of Covid-19 vaccination on gross income for the treated subpopulation.
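To fix ideas, the generalized inverse and the QTT definition above can be computed directly from two samples. The following is an illustrative Python sketch, not part of the chapter's formal development; the function names are my own, and the "counterfactual" sample stands in for draws from $F_{Y_{0t}|D=1}$, which in practice must be identified as described below.

```python
import numpy as np

def ecdf(sample):
    """Empirical cdf F(z) = n^{-1} * sum 1{Z_i <= z}, returned as a callable."""
    sorted_s = np.sort(sample)
    return lambda z: np.searchsorted(sorted_s, z, side="right") / len(sorted_s)

def quantile_inf(sample, tau):
    """Generalized inverse q_tau = inf{z : F(z) >= tau} over the sample."""
    sorted_s = np.sort(sample)
    n = len(sorted_s)
    k = int(np.ceil(tau * n)) - 1   # smallest index with (k+1)/n >= tau
    return sorted_s[max(k, 0)]

def qtt(treated_sample, counterfactual_sample, tau):
    """QTT(tau) = F^{-1}_{Y1t|D=1}(tau) - F^{-1}_{Y0t|D=1}(tau)."""
    return quantile_inf(treated_sample, tau) - quantile_inf(counterfactual_sample, tau)
```

The generalized inverse (rather than a midpoint or interpolated quantile) matches the $\inf$ definition used throughout the chapter.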
2.2.2 Identification Assumptions

These assumptions are taken directly from Callaway and Li (2019). When presenting these assumptions, I will note how they are comparatively mild when compared to other assumptions in the econometrics literature. Now, let $\Delta Y_t = Y_t - Y_{t-1}$. Then, consider the following assumptions.

Assumption ID.1. The observed data $\{Y_{it}, Y_{it-1}, Y_{it-2}, X_i, D_i\}_{i=1}^n$ are independent and identically distributed draws from the joint distribution $F_{Y_t, Y_{t-1}, Y_{t-2}, X, D}$. In addition, $Y_{it} = D_i Y_{1it} + (1 - D_i) Y_{0it}$, $Y_{it-1} = Y_{0it-1}$, and $Y_{it-2} = Y_{0it-2}$.

Assumption ID.2. Each of the random variables $\Delta Y_t$, $\Delta Y_{t-1}$, $Y_{t-1}$, and $Y_{t-2}$ for the treated group is continuously distributed on its support, with a density that is uniformly bounded from above and bounded away from 0.

Assumption ID.2 ensures the uniqueness of the copula by restricting the outcomes to be continuous. Assumption ID.1 restricts the setting to the panel data setting with a single treatment period. If the copula is not unique, then even with the Copula Stability Assumption we may not be able to identify the cdf of $Y_{0t}|D=1$. The next assumption is the final assumption that I use for identification. Let the support of $X$ be denoted by $\mathcal{X}$.

Assumption ID.3. $p := P(D=1) > 0$ and, for all $x \in \mathcal{X}$, $p(x) := P(D=1|X=x) < 1$.

The first part of the assumption ensures that there is some positive probability of treatment. The second part is the "overlap" assumption that is common in the difference-in-differences literature. This guarantees that for any value of $X$ in $\mathcal{X}$ there is a positive probability of that value appearing in both the control and treatment groups. Without this assumption, the QTT cannot be identified, since $F_{\Delta Y_{0t}|D=1}(y)$ could not be identified for a population that contains those values of $X$ for which the assumption is violated.

¹ In the case of the average treatment effect vs. the average treatment effect on the treated, see Wooldridge (2010).
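In applied work, the overlap condition is often probed by inspecting fitted propensity scores. The following is a minimal illustrative diagnostic in Python; it is my own hypothetical helper, not part of the chapter's formal toolkit, and the trimming threshold `eps` is an arbitrary choice.

```python
import numpy as np

def overlap_diagnostic(pscores, eps=0.01):
    """Summarize fitted propensity scores: values very close to 0 or 1
    signal near-violations of overlap, which make the inverse-probability
    weights used below unstable."""
    pscores = np.asarray(pscores, dtype=float)
    return {
        "min": float(pscores.min()),
        "max": float(pscores.max()),
        "n_flagged": int(np.sum((pscores < eps) | (pscores > 1 - eps))),
    }
```

A nonzero `n_flagged` count suggests trimming or a richer propensity specification before weighting.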
Note that the overlap assumption as stated here is not enough when the estimation of the propensity score is nonparametric. In that case, the propensity score requires sharp upper and lower bounds away from 0 and 1. The next assumption is directly exploited to obtain double-robustness of the estimator.

Assumption ID.4. $\Delta Y_{0t} \perp\!\!\!\perp D \mid X$.

This assumption takes the parallel trends assumption of the difference-in-differences literature,
$$E[\Delta Y_{0t} \mid D=1, X] = E[\Delta Y_{0t} \mid D=0, X],$$
and strengthens it over the entire distribution. The assumption states that the distribution of the change in untreated outcomes is unaffected by the treatment, conditional on the covariates. This assumption is necessary to obtain the estimator of $F_{\Delta Y_{0t}|D=1}(y)$. This assumption, as I will show, also makes the doubly-robust estimator of $F_{\Delta Y_{0t}|D=1}(y)$ the most efficient estimator in the panel data setting. Furthermore, as strong as this assumption is, it is weaker than the strong ignorability assumption of Rosenbaum and Rubin (1983), where $(Y_{0t}, Y_{1t}) \perp\!\!\!\perp D \mid X$, which is applied in Firpo (2007).

The last assumption is known as the "Copula Stability Assumption."

Assumption ID.5 (Copula Stability Assumption). $C_{\Delta Y_{0t},\, Y_{0t-1} \mid D=1, X}(\cdot, \cdot) = C_{\Delta Y_{0t-1},\, Y_{0t-2} \mid D=1, X}(\cdot, \cdot)$.

This assumption is the most controversial assumption used for identification. As explained in Callaway and Li (2019), this assumption requires both the panel data setting and data over three time periods. This assumption does not place any restrictions on the marginal distributions of the variables involved. Instead, by placing restrictions on the copula, we restrict the joint distribution of the variables in question based upon their joint distribution in prior periods. We cannot observe the untreated outcome for the treated subpopulation, but we can jointly observe the outcomes in the periods before treatment for the treated subpopulation.
By an application of Sklar's Theorem, and by writing $Y_{0t} \mid D=1$ as $\Delta Y_{0t} + Y_{0t-1} \mid D=1$, the cdf of $Y_{0t}|D=1$ can be identified using the joint distribution of $Y_{0t-1}|D=1$ and $Y_{0t-2}|D=1$. This assumption is similar to, and perhaps even weaker than, the assumption of stationarity in the time-series setting. No claim is being made that a sequence of random variables has a constant joint distribution over some shift in time. All that is being claimed is that a feature of the joint distribution, the joint dependence between the random variables, is fixed over a limited time period. A form of this assumption has been applied in the measurement error literature. In Cameron et al. (2004), the copula is used to model the difference in count variables, where each count variable represents a different measurement of the same outcome.

2.2.3 Identification

With the identification assumptions in hand, I now present the identification result. In the following notation, $\pi(X)$ denotes the choice of a propensity score, while $p(X)$ denotes the true propensity score. For example, $\pi(X)$ might be chosen to be the standard normal cumulative distribution function when $p(X)$ has a logit functional form. Similarly, $P(\Delta Y_{0t} \le y \mid X)$ is the true conditional cdf of $\Delta Y_{0t}$, while $\tilde{P}(\Delta Y_{0t} \le y \mid X)$ denotes a choice of a conditional cdf. For example, $\tilde{P}(\Delta Y_{0t} \le y \mid X)$ might be chosen to be a conditional Logistic(0,1) cdf when $P(\Delta Y_{0t} \le y \mid X)$ is the standard normal cdf.

Theorem 3. Under Assumptions ID.1-ID.5, and assuming that $\pi(X) = p(X)$ or $\tilde{P}(\Delta Y_{0t} \le y \mid X) = P(\Delta Y_{0t} \le y \mid X)$,
$$F_{Y_{0t}|D=1}(y) = E\left[\mathbf{1}\left\{F^{-1}_{\Delta Y_{0t}|D=1}\big(F_{\Delta Y_{t-1}|D=1}(\Delta Y_{t-1})\big) \le y - F^{-1}_{Y_{t-1}|D=1}\big(F_{Y_{t-2}|D=1}(Y_{t-2})\big)\right\} \,\Big|\, D=1\right]$$
where
$$F_{\Delta Y_{0t}|D=1}(y) = E\left[\frac{1-D}{p}\frac{\pi(X)}{1-\pi(X)}\mathbf{1}\{\Delta Y_t \le y\} - \left(\frac{1-D}{p}\frac{\pi(X)}{1-\pi(X)} - \frac{D}{p}\right)\tilde{P}(\Delta Y_{0t} \le y \mid X)\right] \quad (2.1)$$
if $\pi(X) = p(X)$ almost surely, or $\tilde{P}(\Delta Y_{0t} \le y \mid X) = P(\Delta Y_{0t} \le y \mid X)$ almost surely. Then,
$$QTT(\tau) = F^{-1}_{Y_{1t}|D=1}(\tau) - F^{-1}_{Y_{0t}|D=1}(\tau).$$
The proof of the first part of this result is provided in Callaway and Li (2019).
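To make the doubly-robust moment in (2.1) concrete, its sample analogue can be written in a few lines. The sketch below is illustrative Python, not the chapter's exact estimator: the function name is mine, and the unknown $p$ is replaced by the treated share, as is standard.

```python
import numpy as np

def dr_cdf_untreated_change(d, dy, pscore, cond_cdf_at_y, y):
    """Sample analogue of the doubly-robust moment (2.1) for
    F_{dY0t|D=1}(y): an inverse-propensity term over the untreated plus a
    regression-adjustment term, so either nuisance may be misspecified.
    d: treatment indicators; dy: observed first differences;
    pscore: fitted pi(x_i); cond_cdf_at_y: fitted P(dY0t <= y | x_i)."""
    d = np.asarray(d, dtype=float)
    p_hat = d.mean()                                       # estimate of P(D=1)
    w0 = (1 - d) * pscore / (p_hat * (1 - pscore))         # untreated weight
    w1 = d / p_hat                                         # treated weight
    return np.mean(w0 * ((dy <= y) - cond_cdf_at_y) + w1 * cond_cdf_at_y)
```

Evaluating this function on a grid of $y$ values traces out the estimated cdf, which is then inverted as in Theorem 3.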
For the sake of making this chapter as self-contained as possible, I will outline their argument. First, note that
$$F_{Y_{0t}|D=1}(y) = E\left[\mathbf{1}\{Y_{0t} \le y\} \mid D=1\right] = E\left[\mathbf{1}\{\Delta Y_{0t} + Y_{0t-1} \le y\} \mid D=1\right].$$
This expectation is over the joint distribution of the random variables $\Delta Y_{0t}|D=1$ and $Y_{0t-1}|D=1$, and this joint distribution can be written in terms of the copula and the marginal distributions of $\Delta Y_{0t-1}$ and $Y_{0t-2}$ by Assumption ID.1, the Copula Stability Assumption, and Sklar's Theorem. The result then follows from a change of variables.

The second part of the theorem is the basis for the double-robustness property of the estimator. Either the propensity score or the conditional cdf of $\Delta Y_{0t}$ needs to be correctly specified so that $F_{\Delta Y_{0t}|D=1}(\cdot)$ will be correctly identified; $F_{\Delta Y_{0t}|D=1}(\cdot)$ is in turn needed to identify $F_{Y_{0t}|D=1}(y)$. The intuition behind the double-robustness result is that if the propensity score is correctly specified, then the information provided by the conditional cdf of $\Delta Y_{0t}$ becomes redundant, at least for identification. If $\tilde{P}(\Delta Y_{0t} \le y|X)$ is correctly specified, then the weight that is applied to this conditional cdf filters out the incorrect information that is left over from misspecification of the propensity score.²

2.3 Estimation

This section presents the estimators that can be used to obtain the QTT under the identification assumptions. I present two different estimators whose difference is asymptotically negligible; they differ in how they calculate the weights for the estimation of $F_{\Delta Y_{0t}|D=1}(y)$. The first estimator is
$$\widehat{QTT}(\tau) = \hat{F}^{-1}_{Y_{1t}|D=1}(\tau) - \hat{F}^{-1}_{Y_{0t}|D=1}(\tau)$$
where
$$\hat{F}^{-1}_{Y_{1t}|D=1}(\tau) = \inf\{y : \hat{F}_{Y_{1t}|D=1}(y) \ge \tau\}, \qquad \hat{F}^{-1}_{Y_{0t}|D=1}(\tau) = \inf\{y : \hat{F}_{Y_{0t}|D=1}(y) \ge \tau\}.$$

² Unless otherwise noted, from this point onwards the nuisance functions will be assumed to be correctly specified, so $\pi(x) = p(x)$ and $\tilde{P}(\Delta Y_{0t} \le y|x) = P(\Delta Y_{0t} \le y|x)$.
$$\hat{F}_{Y_{0t}|D=1}(y) = n_D^{-1} \sum_{i \in D} \mathbf{1}\left\{\hat{F}^{-1}_{\Delta Y_{0t}|D=1}\big(\hat{F}_{\Delta Y_{t-1}|D=1}(\Delta Y_{it-1})\big) \le y - \hat{F}^{-1}_{Y_{t-1}|D=1}\big(\hat{F}_{Y_{t-2}|D=1}(Y_{it-2})\big)\right\}$$
where $n_D$ denotes the number of treated observations and
$$\hat{F}_{\Delta Y_{0t}|D=1}(y) = n^{-1}\sum_{i=1}^n\left[\frac{1-D_i}{n^{-1}\sum_{k=1}^n D_k}\frac{\hat\pi(x_i)}{1-\hat\pi(x_i)}\mathbf{1}\{\Delta Y_{it}\le y\} - \left(\frac{1-D_i}{n^{-1}\sum_{k=1}^n D_k}\frac{\hat\pi(x_i)}{1-\hat\pi(x_i)} - \frac{D_i}{n^{-1}\sum_{k=1}^n D_k}\right)\hat{P}(\Delta Y_{0t}\le y \mid x_i)\right].$$
An alternate estimator of $F_{\Delta Y_{0t}|D=1}(y)$ is
$$\hat{F}_{\Delta Y_{0t}|D=1}(y) = n^{-1}\sum_{i=1}^n\left[\frac{1-D_i}{\hat{l}_0}\frac{\hat\pi(x_i)}{1-\hat\pi(x_i)}\mathbf{1}\{\Delta Y_{it}\le y\} - \left(\frac{1-D_i}{\hat{l}_0}\frac{\hat\pi(x_i)}{1-\hat\pi(x_i)} - \frac{D_i}{n^{-1}\sum_{k=1}^n D_k}\right)\hat{P}(\Delta Y_{0t}\le y \mid x_i)\right]$$
where $\hat{l}_0 = n^{-1}\sum_{i=1}^n \frac{\hat\pi(x_i)(1-D_i)}{1-\hat\pi(x_i)}$.

The interesting point here is the estimation of the nuisance functions. If the nuisance functions are estimated parametrically, then under standard assumptions given in the appendix the nuisance functions will converge rapidly enough in probability to guarantee asymptotic normality, since the parameters that index the functions will converge at a sufficiently fast rate. The issue here is that it is unlikely for the nuisance functions to be correctly specified. If the nuisance functions are estimated nonparametrically, then estimation depends upon exactly how they are estimated. For example, suppose that both nuisance functions are estimated using kernel-based methods. The advantage of this, as seen in Rothe and Firpo (2013), is that kernel estimation permits the estimator to be decomposed into a bias term, a first-order stochastic term, and a second-order stochastic term. Depending upon how the bandwidth is chosen, the estimator can converge in probability at a fast enough rate to ensure asymptotic normality, yet at a slower rate than would otherwise be necessary, due to the double-robustness property.

The double-robustness property here does not only mean that the identifying moment is doubly-robust. An implication of this is that the higher-order derivatives are also doubly-robust.
This implies that, like the identifying moment, the derivatives also equal 0 if at least one of the nuisance parameters is correctly specified. This is useful for both sieve and kernel estimation. In the case of kernel estimation, this implies that the bias term in the kernel decomposition equals 0. In the case of sieve estimation, the usefulness of this is that terms in the asymptotic expansion of the doubly-robust estimator can then be bounded by the bracketing integral with respect to the $L_2$ norm over the function space.

2.4 Asymptotic Properties

The key asymptotic properties of the estimator revolve around the asymptotic behavior of the estimator of $F_{\Delta Y_{0t}|D=1}(y)$, in addition to the asymptotic behavior of $\hat\pi(x)$ and $\hat{P}(\Delta Y_{0t} \le y \mid X)$. The limiting behavior of the QTT estimator is unchanged by the doubly-robust estimator. Before the asymptotic behavior of the estimator can be discussed, the following assumption will be introduced:

Assumption NP.1.
$$\sup_{x\in\mathcal{X}}\left|\hat\pi(x) - \pi(x)\right| = o_p(n^{-1/4})$$
$$\sup_{x\in\mathcal{X}}\left\|\hat{P}(\Delta Y_{0t}\le y \mid x) - P(\Delta Y_{0t}\le y \mid x)\right\| = o_p(n^{-1/4})$$

This is the conventional uniform convergence assumption in the literature on nonparametric rates of convergence. It is not necessary when the estimator is doubly-robust, but it is sufficient.³ Parametric assumptions that imply Assumption NP.1 can be found in Appendix A. Nonparametric estimators for each of the nuisance functions, as well as the assumptions necessary to imply Assumption NP.1, can also be found in Appendix A.

Before I establish the asymptotic properties of the estimator, the efficient influence function needs to be found. Besides establishing that the estimator of $F_{\Delta Y_{0t}|D=1}(y)$ is doubly-robust, the efficient influence function will allow us to determine whether the estimator is the most efficient estimator of $F_{\Delta Y_{0t}|D=1}(y)$. Note that this claim of efficiency is only being made under the identification assumptions. If these assumptions do not hold, then the efficiency result fails.
The efficient influence function should also be the identifying moment condition used to estimate $F_{\Delta Y_{0t}|D=1}(y)$ at a fixed value $y$. When the nuisance functions are nonparametrically estimated, asymptotic normality will depend upon a Taylor expansion of the efficient influence function. The efficient influence function is presented in the following theorem:

³ This assumption is mentioned in a footnote of Callaway and Sant'Anna (2021) when the nuisance parameters are estimated nonparametrically, even though their estimator is doubly-robust. It is not necessary.

Theorem 4. Under Assumptions ID.1-ID.4, the efficient influence function is
$$F_\tau(W) = \frac{(1-D)p(X)}{p(1-p(X))}\mathbf{1}\{\Delta Y_t \le y\} - \frac{(1-D)p(X)}{p(1-p(X))}P(\Delta Y_{0t}\le y \mid X) + \frac{D}{p}P(\Delta Y_{0t}\le y \mid X) - \frac{D}{p}F_{\Delta Y_{0t}|D=1}(y),$$
where $p = E[D]$.

Note that if we were to take the expected value of this function, then since $E[D]/p = 1$, the efficient influence function reduces in expectation to the identifying moment condition of $F_{\Delta Y_{0t}|D=1}(y)$ induced by $\hat{F}_{\Delta Y_{0t}|D=1}(y)$. With the efficient influence function in hand, we can proceed to describe the asymptotic behavior of $\hat{F}_{\Delta Y_{0t}|D=1}(y)$; before that takes place, it should be noted how exactly each of the nuisance functions is estimated. I now proceed to the first major distributional result, proving the consistency and asymptotic normality of $\hat{F}_{\Delta Y_{0t}|D=1}(y)$.

Theorem 5. Suppose that Assumptions ID.1-ID.5 and NP.1 hold. Then $\hat{F}_{\Delta Y_{0t}|D=1}(y) \xrightarrow{p} F_{\Delta Y_{0t}|D=1}(y)$, and
$$\sqrt{n}\left(\hat{F}_{\Delta Y_{0t}|D=1}(y) - F_{\Delta Y_{0t}|D=1}(y)\right) \xrightarrow{d} N\!\left(0,\; E\!\left[\psi(D, p(X), P(\Delta Y_{0t}\le y|X); p)^2\right]\right),$$
where
$$\psi(D, p(X), P(\Delta Y_{0t}\le y|X); p) = w_0(D_i, X_i; \theta^*)\,\mathbf{1}\{\Delta Y_{it}\le y\} - \big(w_0(D_i, X_i;\theta^*) - w_1(D_i)\big)\,P(\Delta Y_{0it}\le y \mid X_i;\beta^*) - w_1(D_i)\,F_{\Delta Y_{0t}|D=1}(y)$$
and
$$w_0(D, X;\theta^*) = \frac{1-D}{p}\frac{\pi(X;\theta^*)}{1-\pi(X;\theta^*)}, \qquad w_1(D) = \frac{D}{p}.$$

Note that the result shows that the estimator attains the semiparametric efficiency lower bound, since the asymptotic variance equals the second moment of the efficient influence function of $F_{\Delta Y_{0t}|D=1}(\cdot)$.
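A plug-in estimate of the asymptotic variance in Theorem 5 replaces $\psi$ by its sample analogue, with fitted nuisances in place of the true ones. The following illustrative Python sketch uses my own function name and replaces the unknown $p$ by the treated share; it is a sketch under those assumptions, not the chapter's exact implementation.

```python
import numpy as np

def influence_function_variance(d, dy, pscore, cond_cdf_at_y, y):
    """Sample analogue of E[psi^2] from Theorem 5: build w0 and w1, the
    plug-in cdf estimate, then the empirical influence function psi_i."""
    d = np.asarray(d, dtype=float)
    p_hat = d.mean()
    w0 = ((1 - d) / p_hat) * (pscore / (1 - pscore))
    w1 = d / p_hat
    # doubly-robust plug-in estimate of F_{dY0t|D=1}(y)
    f_hat = np.mean(w0 * ((dy <= y) - cond_cdf_at_y) + w1 * cond_cdf_at_y)
    psi = w0 * (dy <= y) - (w0 - w1) * cond_cdf_at_y - w1 * f_hat
    return np.mean(psi ** 2)
```

Dividing the result by $n$ gives a pointwise variance estimate for $\hat{F}_{\Delta Y_{0t}|D=1}(y)$.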
Now, I will present a central limit theorem result, based upon a similar result in Callaway and Li (2019), which I will later use to establish the limiting behavior of the QTT estimator.

Proposition 1. Suppose Assumptions ID.1-ID.5 and Assumption NP.1 hold. Then
$$\left(\hat{G}_{\Delta Y_{0t}|D=1},\; \hat{G}_{\Delta Y_{t-1}|D=1},\; \tilde{G}_{Y_{0t}|D=1},\; \hat{G}_{Y_t|D=1},\; \hat{G}_{Y_{t-1}|D=1},\; \hat{G}_{Y_{t-2}|D=1}\right) \xrightarrow{d} \left(\mathbb{G}_{\Delta 1}, \mathbb{G}_{\Delta 2}, \tilde{\mathbb{G}}_0, \mathbb{G}_1, \mathbb{G}_3, \mathbb{G}_4\right)$$
in the space
$$\mathcal{S} = \ell^\infty(\mathcal{Y}_{\Delta Y_{0t}|D=1}) \times \ell^\infty(\mathcal{Y}_{\Delta Y_{t-1}|D=1}) \times \ell^\infty(\mathcal{Y}_{Y_{0t}|D=1}) \times \ell^\infty(\mathcal{Y}_{Y_t|D=1}) \times \ell^\infty(\mathcal{Y}_{Y_{t-1}|D=1}) \times \ell^\infty(\mathcal{Y}_{Y_{t-2}|D=1}),$$
where the limit is a tight Gaussian process with mean 0 and covariance $V(y', y) = E[\eta(y')\,\eta(y)']$ for evaluation points $y = (y_1, y_2, y_3, y_4, y_5, y_6)$, with $\eta(y)$ given by
$$\eta(y) = \begin{pmatrix} \psi(D, p(X), P(\Delta Y_{0t}\le y_1|X); p) \\[2pt] \frac{D}{p}\left(\mathbf{1}\{\Delta Y_{t-1}\le y_2\} - F_{\Delta Y_{t-1}|D=1}(y_2)\right) \\[2pt] \frac{D}{p}\left(\mathbf{1}\{\tilde{Y}_t\le y_3\} - F_{Y_{0t}|D=1}(y_3)\right) \\[2pt] \frac{D}{p}\left(\mathbf{1}\{Y_t\le y_4\} - F_{Y_t|D=1}(y_4)\right) \\[2pt] \frac{D}{p}\left(\mathbf{1}\{Y_{t-1}\le y_5\} - F_{Y_{t-1}|D=1}(y_5)\right) \\[2pt] \frac{D}{p}\left(\mathbf{1}\{Y_{t-2}\le y_6\} - F_{Y_{t-2}|D=1}(y_6)\right) \end{pmatrix}$$
where
$$\hat{G}_{\Delta Y_{0t}|D=1}(\cdot) = \sqrt{n}\left(\hat{F}_{\Delta Y_{0t}|D=1}(\cdot) - F_{\Delta Y_{0t}|D=1}(\cdot)\right),$$
$$\tilde{Y}_{it} = F^{-1}_{\Delta Y_{0t}|D=1}\big(F_{\Delta Y_{t-1}|D=1}(\Delta Y_{it-1})\big) + F^{-1}_{Y_{t-1}|D=1}\big(F_{Y_{t-2}|D=1}(Y_{it-2})\big),$$
$$\tilde{F}_{Y_{0t}|D=1}(y) = \frac{1}{n_D}\sum_{i\in D}\mathbf{1}\{\tilde{Y}_{it}\le y\}, \qquad \tilde{G}_{Y_{0t}|D=1}(y) = \sqrt{n}\left(\tilde{F}_{Y_{0t}|D=1}(y) - F_{Y_{0t}|D=1}(y)\right).$$

Proposition SA2 and Theorem SA1 of Callaway and Li (2019) still hold. I reproduce them here in order to account for the change in notation and numbering of the assumptions, and for the technical fact that the estimator of $F_{\Delta Y_{0t}|D=1}(\cdot)$ has changed. Proposition 1 is used to establish the result in Proposition 2.

Proposition 2. Let $\hat{G}_0(y) = \sqrt{n}\big(\hat{F}_{Y_{0t}|D=1}(y) - F_{Y_{0t}|D=1}(y)\big)$ and let $\hat{G}_1(y) = \sqrt{n}\big(\hat{F}_{Y_t|D=1}(y) - F_{Y_t|D=1}(y)\big)$. Suppose Assumptions ID.1-ID.5 and Assumption NP.1 hold. Then
$$(\hat{G}_0, \hat{G}_1) \xrightarrow{d} (G_0, G_1)$$
where $G_0$ and $G_1$ are tight Gaussian processes with mean 0 and almost surely uniformly continuous paths on the space $\mathcal{Y}_{Y_{0t}|D=1} \times \mathcal{Y}_{Y_t|D=1}$, given by $G_1 = \mathbb{G}_1$ and
$$G_0(y) = \tilde{\mathbb{G}}_0(y) + \int\Bigg(\big(\mathbb{G}_{\Delta 1}(K_2(y,v)) - \mathbb{G}_{\Delta 2}(K_3(y,v))\big)\,\frac{f_{\Delta Y_{0t}|D=1}(K_2(y,v))}{f_{\Delta Y_{t-1}|D=1}(K_3(y,v))} + \big(\mathbb{G}_4(v) - \mathbb{G}_3(K_1(v))\big)\,\frac{f_{\Delta Y_{0t}|D=1}(K_2(y,v))\; f_{\Delta Y_{t-1}|Y_{t-2},D=1}(K_3(y,v)\mid v)}{f_{Y_{t-1}|D=1}(K_1(v))\; f_{\Delta Y_{t-1}|D=1}(K_3(y,v))}\Bigg)\, dF_{Y_{t-2}|D=1}(v),$$
where $K_1(v) := F^{-1}_{Y_{t-1}|D=1}\big(F_{Y_{t-2}|D=1}(v)\big)$, $K_2(y,v) := y - K_1(v)$, and $K_3(y,v) := F^{-1}_{\Delta Y_{t-1}|D=1}\big(F_{\Delta Y_{0t}|D=1}(K_2(y,v))\big)$.

The above proposition is then used to establish the limiting behavior of the QTT estimator.

Theorem 6. Suppose $F_{Y_{0t}|D=1}$ admits a positive continuous density $f_{Y_{0t}|D=1}$ on an interval $[a, b]$ containing an $\epsilon$-enlargement of the set $\{F^{-1}_{Y_{0t}|D=1}(\tau) : \tau \in \mathcal{T}\}$. Suppose Assumptions ID.1-ID.5 and Assumption NP.1 hold. Then
$$\sqrt{n}\left(\widehat{QTT}(\tau) - QTT(\tau)\right) \xrightarrow{d} \bar{G}_1(\tau) - \bar{G}_0(\tau)$$
where $(\bar{G}_0(\tau), \bar{G}_1(\tau))$ is a stochastic process in the metric space $(\ell^\infty(\mathcal{T}))^2$ with
$$\bar{G}_0(\tau) = \frac{G_0\big(F^{-1}_{Y_{0t}|D=1}(\tau)\big)}{f_{Y_{0t}|D=1}\big(F^{-1}_{Y_{0t}|D=1}(\tau)\big)}, \qquad \bar{G}_1(\tau) = \frac{G_1\big(F^{-1}_{Y_t|D=1}(\tau)\big)}{f_{Y_t|D=1}\big(F^{-1}_{Y_t|D=1}(\tau)\big)}.$$

The theorem is unchanged from Callaway and Li (2019), since the asymptotic distributions of $\hat{G}_0(y)$ and $\hat{G}_1(y)$ are unchanged by the doubly-robust estimator of $F_{\Delta Y_{0t}|D=1}(y)$. This is analogous to a doubly-robust estimator having the same distribution as other semiparametric two-step estimators.⁴

2.5 The Bootstrap

The standard errors in the application are based upon the empirical bootstrap procedure. I assume that for the bootstrapped nuisance functions, denoted by $*$, the following assumption holds.

Assumption B.1.
$$\sup_{x\in\mathcal{X}}\left|\hat\pi^*(x) - \hat\pi(x)\right| = o_{p^*}(n^{-1/4})$$
$$\sup_{x\in\mathcal{X}}\left\|\hat{P}^*(\Delta Y_{0t}\le \cdot \mid x) - \hat{P}(\Delta Y_{0t}\le \cdot \mid x)\right\| = o_{p^*}(n^{-1/4})$$

This assumption gives the minimum rate of convergence of the difference between the bootstrapped estimator and the original estimator as they tend to zero. By an argument outlined in the appendix, the double-robustness property does not reduce the necessary rate of convergence to achieve asymptotic normality when applying the empirical bootstrap.
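The empirical bootstrap referenced here resamples units (rows) with replacement and re-runs the full estimation on each resample. A generic illustrative sketch follows; the function name and interface are my own, and in practice `statistic` would wrap the entire two-step QTT estimation, including re-fitting the nuisance functions on each draw.

```python
import numpy as np

def empirical_bootstrap(data, statistic, n_boot=200, seed=0):
    """Nonparametric (empirical) bootstrap: draw n indices with
    replacement and recompute the statistic on each resampled dataset."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    n = len(data)
    stats = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)   # resample rows with replacement
        stats[b] = statistic(data[idx])
    return stats
```

The resulting draws feed the interquartile-range standard errors and sup-t critical values used in the application below.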
The argument relies upon an expansion of the efficient influence function of the bootstrapped estimate of $F_{\Delta Y_{0t}|D=1}(y)$ for the bootstrapped sample around the original estimates of the nuisance functions. When the expansion using the full sample is around the true functions, the expected values of the pathwise derivatives with respect to the nuisance functions equal zero; this no longer holds at estimates of the true functions. This is analogous to the elimination of the bias term with kernel estimates of nuisance functions when considering the asymptotic behavior of doubly-robust estimators, as shown in Rothe and Firpo (2019). This argument establishes the following proposition.

⁴ See the introduction of Rothe and Firpo (2019).

Proposition 3. Suppose Assumptions ID.1-ID.5 and Assumptions NP.1 and B.1 hold. Then
$$\left(\hat{G}^*_{\Delta Y_{0t}|D=1},\; \hat{G}^*_{\Delta Y_{t-1}|D=1},\; \tilde{G}^*_{Y_{0t}|D=1},\; \hat{G}^*_{Y_t|D=1},\; \hat{G}^*_{Y_{t-1}|D=1},\; \hat{G}^*_{Y_{t-2}|D=1}\right) \xrightarrow{d} \left(\mathbb{G}_{\Delta 1}, \mathbb{G}_{\Delta 2}, \tilde{\mathbb{G}}_0, \mathbb{G}_1, \mathbb{G}_3, \mathbb{G}_4\right)$$
in the space $\mathcal{S}$ of Proposition 1, where $*$ denotes the bootstrap analogue and the limit is the same tight mean-zero Gaussian process as in Proposition 1, with covariance $V(y', y) = E[\eta(y')\,\eta(y)']$ and $\eta(y)$ as given there.

Using the above proposition, I then obtain the asymptotic behavior of the bootstrapped process.

Theorem 7. Under Assumptions ID.1-ID.5, B.1-B.2, and either Assumptions P.1-P.3, or Assumptions NP.1-NP.7 and C.1-C.4,
$$\sqrt{n}\left(\widehat{QTT}(\tau)^* - \widehat{QTT}(\tau)\right) \xrightarrow{d}$$
$$\bar{G}_1(\tau) - \bar{G}_0(\tau),$$
where $(\bar{G}_0(\tau), \bar{G}_1(\tau))$ is a stochastic process in the metric space $(\ell^\infty(\mathcal{T}))^2$ with
$$\bar{G}_0(\tau) = \frac{G_0\big(F^{-1}_{Y_{0t}|D=1}(\tau)\big)}{f_{Y_{0t}|D=1}\big(F^{-1}_{Y_{0t}|D=1}(\tau)\big)}, \qquad \bar{G}_1(\tau) = \frac{G_1\big(F^{-1}_{Y_t|D=1}(\tau)\big)}{f_{Y_t|D=1}\big(F^{-1}_{Y_t|D=1}(\tau)\big)},$$
and $(G_0(\tau), G_1(\tau))$ are as in Proposition 2.

2.6 Simulations

In this section I present the estimation of the QTT at $\tau \in [0.2, 0.8]$, in increments of 0.02. The goal is to demonstrate not only how my estimator performs in small samples, but also how it compares to the estimator of Callaway and Li (2019). I generate the following data generating process with $N = 1000$ and $T = 3$ for 200 iterations:
$$v \sim \text{Normal}(0, 1)$$
$$\eta \mid D=0 \sim \text{Normal}(0, 1), \qquad \eta \mid D=1 \sim \text{Normal}(1, 1)$$
$$X_1 \sim \text{Uniform}(0, 1), \quad X_2 \sim \text{Uniform}(-1, 0), \quad X_3 \sim \text{Uniform}(-2, -1), \quad X_4 \sim \text{Uniform}(-1, 0)$$
$$Y_{t-2} = 0.25X_1 + 0.5X_2 + 0.75X_3 + X_4 + \eta + v_{t-2}$$
$$Y_{t-1} = 1 + 0.5X_1 + 0.75X_2 + X_3 + 1.5X_4 + \eta + v_{t-1}$$
$$Y_{0t} = 2 + 0.25X_1 + 0.5X_2 + 0.75X_3 + X_4 + \eta + v_t$$
$$Y_{1t} = 1.5X_1 + X_2 + 1.5X_3 + X_4 + \eta + v_t$$
$$Y_t = D \times Y_{1t} + (1-D) \times Y_{0t}$$
$$p(X, \delta) = \frac{e^{X\delta}}{1 + e^{X\delta}}, \qquad \delta = (-0.25,\ 0.5,\ -0.75,\ 1)$$

The data generating process is based upon Example 3 in Callaway and Li (2019). The distribution of the covariates is chosen so that, for the given values of the parameters, the probability of treatment and the conditional cdf of $\Delta Y_{0t}$ do not produce fitted values that are too close to 0 and 1, which can cause numerical issues when inverting the estimators. The parameters $\delta$ of the propensity score are estimated via maximum likelihood. The parameters $\beta$ of the conditional cdf of $\Delta Y_{0t}$ are estimated via an OLS regression of $\Delta Y_t$ on $X$ over the untreated subsample, and $\sigma$, the standard deviation of $v_t$, is estimated by the standard deviation of the vector of residuals generated from the OLS regression. The following estimator is applied:
$$\hat{F}_{\Delta Y_{0t}|D=1}(y) = \left[\sum_{i=1}^n \frac{p(x_i, \hat\delta)(1-D_i)}{1-p(x_i,\hat\delta)}\right]^{-1}\sum_{i=1}^n \frac{p(x_i,\hat\delta)(1-D_i)}{1-p(x_i,\hat\delta)}\left[\mathbf{1}\{\Delta Y_i \le y\} - \Phi\!\left(\frac{y - x_i\hat\beta}{\hat\sigma}\right)\right] + n_D^{-1}\sum_{i=1}^n D_i\,\Phi\!\left(\frac{y - x_i\hat\beta}{\hat\sigma}\right)$$
where $\Phi(\cdot)$ is the standard normal cdf, and $\hat\delta$, $\hat\beta$, and $\hat\sigma$ are the aforementioned estimates of the nuisance parameters. Note that this estimator normalizes the weights of the first term, creating a normalized augmented inverse probability weighting estimator. This adds an asymptotically negligible normalization constant while improving the small-sample performance of the estimator.⁵

Misspecification of the propensity score is modeled by choosing the propensity score to be the standard normal cdf. Misspecification of the cdf nuisance function is modeled by choosing the function to be the Logistic(0,1) cdf. In either case, the chosen function resembles the true function over the support of the underlying random variable, but the misspecification is most pronounced in the tails. This misspecification is not far from what a researcher might plausibly fit based upon their data.

The estimator that I propose in this chapter outperforms the Callaway and Li estimator, at least under the data generating process that I used. Figure F.1 shows that under all of the scenarios involving misspecification of the nuisance functions that I considered, there is a similar average absolute bias across the quantile estimates; however, as shown in Figure F.2, under these same scenarios the Callaway and Li estimator has a root-mean-square error (RMSE) of approximately 1.5, regardless of the quantile. Note that these figures are divided into four scenarios, comparing the various forms of misspecification under the doubly-robust estimator to the Callaway and Li estimator.

2.7 Application

In this section, I use my method to study the effect of state-level changes in the minimum wage on county-level unemployment rates.
The purpose of this application is to demonstrate how the standard errors of the estimates using my estimator compare to the standard errors of the estimates using the estimator of Callaway and Li (2019); the application is based upon the application within that paper. Variation in state-level minimum wage laws is exploited alongside variation in county-level observable characteristics, such as differences in population and median income. The goal is to examine the change in the distribution of county-level unemployment rates due to an increase in the minimum wage, and to compare that to the distribution of unemployment rates had there been no change in the minimum wage.

The dataset that is used is taken from the replication materials of Callaway and Li (2019).⁶ They examine a period during which there was variation in state-level minimum wages, but the U.S. federal minimum wage remained flat until the end of the period. The outcome variable is the county-level unemployment rate, which they obtain from the Local Area Unemployment Statistics Database of the Bureau of Labor Statistics. County-level unemployment rates are available monthly, and they choose to use the February unemployment rates from 2005-2007, a month that they felt to be sufficiently far from the federal wage change in July 2007. They merge in county characteristics, the 1997 county median income and the 2000 county population, from the 2000 County Data Book. The treatment group consists of counties in the 11 states (excluding states in the northeast) that increased their minimum wage by the first quarter of 2007.

⁵ For a discussion of the importance of normalization of inverse probability estimators, though not in the context of double-robustness, see Słoczyński, Uysal, and Wooldridge (2022). The main benefit of this normalization is a reduction in the small-sample bias of the estimator.
Counties in 20 states that did not increase their minimum wage by July 2007 form the control group.

The nuisance parameters are estimated parametrically. I assumed a logit specification for the propensity score, with the covariates chosen to be the natural log of county population, the natural log of median county income, the squares of these terms, their interaction, factor variables for the South and West census regions, and the interactions of the factor variables with the other covariates. I assumed a probit specification for the conditional cdf of $\Delta Y_{0t}$, with the parameters estimated using ordinary least squares and assuming homoskedastic errors.

Due to the simultaneous estimation of parameters and construction of confidence intervals, I construct the confidence intervals as in Callaway and Li (2019). I outline the steps as an algorithm below:

1. For each $\tau \in \mathcal{T}$, calculate
$$\hat\Sigma(\tau)^{1/2} = \big(q_{0.75}(\tau) - q_{0.25}(\tau)\big)\big/\big(z_{0.75} - z_{0.25}\big).$$
This is the bootstrap interquartile range divided by the interquartile range of a standard normal random variable, where $\hat\Sigma(\tau)$ is an estimate of the asymptotic variance of $\widehat{QTT}(\tau)$.

2. For bootstrap iterations $b = 1, \ldots, B$, calculate
$$I_b = \sup_{\tau\in\mathcal{T}} \hat\Sigma(\tau)^{-1/2}\left|\sqrt{n}\big(\widehat{QTT}(\tau)_b - \widehat{QTT}(\tau)\big)\right|.$$

3. Calculate $c^B_{1-\alpha}$, the $(1-\alpha)$ quantile of $\{I_b\}_{b=1}^B$.

4. Calculate $\widehat{QTT}(\tau) \pm c^B_{1-\alpha}\,\hat\Sigma(\tau)^{1/2}/\sqrt{n}$.

⁶ The replication materials can be found at https://onlinelibrary.wiley.com/doi/full/10.3982/QE935 .

Figure F.3 shows that when comparing the estimates, there is little difference between my estimator and the Callaway and Li (2019) estimator. The point estimates are close, except at the 90th percentile, and the confidence intervals based upon my estimator are only slightly narrower. This is more revealing than it may seem. My simulations would suggest a sharp reduction in the standard error of the estimates when applying my estimator, but only when the nuisance function estimates are sufficiently close to the truth.
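The four steps above can be sketched compactly. The following is illustrative Python with my own function name; `qtt_boot` is assumed to hold bootstrap draws of the QTT over the grid of quantiles, and the normal-IQR constant $z_{0.75} - z_{0.25}$ is hard-coded to avoid a dependency.

```python
import numpy as np

Z_IQR = 1.3489795003921634   # z_{0.75} - z_{0.25} for a standard normal

def uniform_confidence_band(qtt_hat, qtt_boot, n, alpha=0.05):
    """Steps 1-4 of the algorithm: robust scale from the bootstrap IQR,
    sup-t statistic over tau, then symmetric uniform bands.
    qtt_hat: (T,) point estimates; qtt_boot: (B, T) bootstrap draws."""
    q75 = np.quantile(qtt_boot, 0.75, axis=0)
    q25 = np.quantile(qtt_boot, 0.25, axis=0)
    sigma_half = (q75 - q25) / Z_IQR                           # step 1
    t_b = np.max(np.abs(np.sqrt(n) * (qtt_boot - qtt_hat)) / sigma_half,
                 axis=1)                                        # step 2
    c = np.quantile(t_b, 1 - alpha)                             # step 3
    half = c * sigma_half / np.sqrt(n)                          # step 4
    return qtt_hat - half, qtt_hat + half
```

Because the critical value is a sup over $\tau$, the resulting band covers the whole QTT curve uniformly, not just pointwise.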
My estimator suggests that there is a strong misspecification of the conditional cdf of $\Delta Y_{0t}$.

2.8 Conclusion

I have provided a doubly-robust estimator of the quantile treatment effect on the treated. This estimator relaxes the assumptions on the nuisance functions, allowing for a slower rate of convergence while still achieving the limiting distribution of the QTT estimator. This makes nonparametric estimation of each of the nuisance functions much more viable, since nonparametric estimation requires assumptions upon the differentiability of the nuisance functions, which in turn affects the rate of convergence. As my simulations demonstrate, this leads to a lower RMSE in small samples, particularly when estimating the QTT at the median. Without the double-robustness property, confidence intervals could be so large that the QTT is not statistically different from 0 except at the extremes of the distribution of the difference in treated and untreated outcomes for the treated subpopulation.

It is important to recognize what this estimator is not. It is not a substitute for the doubly-robust estimator of the ATT that is presented in Sant'Anna and Zhao (2020). The assumptions necessary for identification are relaxed: there is no conditional copula assumption, and the parallel trends assumption is weaker than the conditional independence assumption on the difference in untreated outcomes. Instead, the two estimators should be used to complement each other. The ATT should be estimated along with quantile treatment effects on the treated at a variety of quantiles. If the estimate of the ATT is inconsistent with the results presented across the information summarized by the QTTs, then perhaps either the conditional copula assumption or the conditional independence assumption does not hold.
What this estimator should be seen as is part of a middle ground between some of the more nonparametric estimators and estimators that rely entirely upon propensity score matching. In particular, the optimal transport methods of Gunsilius and Xu (2021) and Torous, Gunsilius, and Rigollet (2021) avoid the curse of dimensionality that is common with nonparametric estimation of the propensity score when estimating treatment effects; however, a doubly-robust estimator will relax the smoothness assumptions on the propensity score function in relation to the dimension of the covariate matrix. When supplemented with other doubly-robust estimators in the causal inference literature, my QTT estimator becomes part of a battery of doubly-robust estimators that increase the feasibility of propensity score matching.

CHAPTER 3

APPLICATION OF ADDITIONAL MOMENTS TO QUANTILE TREATMENT EFFECT ESTIMATION: A SIMULATION

3.1 Introduction

In Chapter 2, I presented a doubly-robust estimator of the quantile treatment effect on the treated (QTT). This estimator is robust to misspecification of either the propensity score for the probability of treatment or the conditional cdf of $\Delta Y_{0t}$. In this chapter, I run a simulation to compare the doubly-robust estimator of Chapter 2 with an estimator that relies upon an overidentified system of moment equations to estimate the parameters of the nuisance functions. In other words, though the estimator in Chapter 2 is doubly-robust, it might not lead to vastly improved performance over an estimator that relies upon additional moments to estimate the QTT. An additional moment condition could be applied to estimate the parameters of the nuisance functions, and then either choice of nuisance function could be plugged in to estimate the QTT.

3.1.1 Structure of the Chapter

This chapter is structured as follows. Section 3.2 lays out the assumptions that were contained in Chapter 2.
In particular, I note that the estimator in this chapter relies upon Assumption ID.4 of Chapter 2 to create an overidentified GMM estimator. I also explain how adding additional moments can eventually lead to a QTT estimator that approximates the performance of the estimator in Chapter 2. Section 3.3 contains simulations and interpretations of those simulations. Section 3.4 concludes the chapter.

3.2 Identification

Recall the following assumptions from Chapter 2.

Assumption ID.1. The observed data {ΔY_{it}, Y_{it-1}, Y_{it-2}, X_i, D_i}_{i=1}^n are independent and identically distributed draws from the joint distribution F_{ΔY_t, Y_{t-1}, Y_{t-2}, X, D}. In addition, Y_{it} = D_i Y_{1it} + (1 - D_i) Y_{0it}, Y_{it-1} = Y_{0it-1}, and Y_{it-2} = Y_{0it-2}.

Assumption ID.2. Each of the random variables ΔY_t for the treated group and ΔY_{t-1}, ΔY_{t-2} for the treated group is continuously distributed on its support, with a density that is uniformly bounded from above and bounded away from 0.

Assumption ID.3. p ≔ P(D = 1) > 0 and, for all x ∈ X, p(x) ≔ P(D = 1 | X = x) < 1.

Assumption ID.4. ΔY_{0t} ⊥⊥ D | X.

The last assumption is known as the "Copula Stability Assumption":

Assumption ID.5 (Copula Stability Assumption). C_{ΔY_{0t}, Y_{0t-1} | D=1, X}(·,·) = C_{ΔY_{0t-1}, Y_{0t-2} | D=1, X}(·,·).

As shown in Chapter 2, these assumptions identify QTT(τ). Now, consider the following moment condition:

E[ ((1 - D)/p) (p(X)/(1 - p(X))) 1{ΔY_t ≤ y} - (D/p) P(ΔY_{0t} ≤ y | X) ] = 0.    (3.1)

Note that, unlike in the previous chapter, I am assuming correct specification of the nuisance functions p(X) and P(ΔY_{0t} ≤ y | X). In other words, I am using information from the treated and untreated observations, but not for the purposes of robustness. This information will be used to create an overidentified generalized method of moments (GMM) estimator, which will then be used to obtain QTT(τ). The following system of moments is used to estimate F_{ΔY_{0t}|D=1}(y); this estimator is then used to obtain QTT(τ) as in Chapter 2.
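As a numeric illustration, the sample analogue of moment condition (3.1) should be close to zero when Assumption ID.4 holds and both nuisance functions are correctly specified. The following sketch checks this on a hypothetical data generating process (the propensity score, outcome equation, and sample size are illustrative choices, not taken from the chapter):

```python
import numpy as np
from math import erf

# Hypothetical design: Delta Y_0t depends on X but is independent of D given X,
# so Assumption ID.4 holds by construction.
rng = np.random.default_rng(0)
n = 200_000
X = rng.uniform(0, 1, n)
p_x = 1.0 / (1.0 + np.exp(-(X - 0.5)))   # true propensity score p(X)
D = rng.binomial(1, p_x)                 # treatment indicator
dY = X + rng.normal(0, 1, n)             # Delta Y_t = Delta Y_0t for untreated units
p = D.mean()                             # unconditional treatment probability

y = 0.5
# True conditional cdf P(Delta Y_0t <= y | X) = Phi(y - X) under the normal error.
Phi = lambda z: 0.5 * (1 + np.vectorize(erf)(z / np.sqrt(2)))
cdf_x = Phi(y - X)

# Sample analogue of moment condition (3.1); should be near zero in large samples.
moment = np.mean((1 - D) / p * p_x / (1 - p_x) * (dY <= y) - D / p * cdf_x)
print(round(float(moment), 3))
```

Both terms estimate F_{ΔY_{0t}|D=1}(y): the first reweights untreated observations toward the treated covariate distribution, while the second averages the conditional cdf over treated units.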
The system of moment conditions is

( ψ_i(p(X;θ), P(ΔY_{0t} ≤ y|X;γ)) ; s_{p(x)i}(θ) ; s_{P(ΔY_{0t}|X)i}(γ) )',    (3.2)

where

ψ_i(p(X;θ), P(ΔY_{0t} ≤ y|X;γ)) = ((1 - D_i)/p)(p(X_i;θ)/(1 - p(X_i;θ))) 1{ΔY_{it} ≤ y} - (D_i/p) P(ΔY_{0t} ≤ y|X_i;γ),    (3.3)

and s_{p(x)i}(θ) and s_{P(ΔY_{0t}|X)i}(γ) denote the first-order conditions used to estimate the parameters of the nuisance functions. The estimator Q̂TT(τ) is then

Q̂TT(τ) = F̂^{-1}_{Y_{1t}|D=1}(τ) - F̂^{-1}_{Y_{0t}|D=1}(τ),

where

F̂^{-1}_{Y_{1t}|D=1}(τ) = inf{y : F̂_{Y_{1t}|D=1}(y) ≥ τ},
F̂^{-1}_{Y_{0t}|D=1}(τ) = inf{y : F̂_{Y_{0t}|D=1}(y) ≥ τ},
F̂_{Y_{0t}|D=1}(y) = n_D^{-1} Σ_{i∈D} 1{ F̂^{-1}_{ΔY_{0t}|D=1}(F̂_{ΔY_{t-1}|D=1}(ΔY_{it-1})) + F̂^{-1}_{Y_{t-1}|D=1}(F̂_{Y_{t-2}|D=1}(Y_{it-2})) ≤ y },

where n_D denotes the number of treated observations and

F̂_{ΔY_{0t}|D=1}(y) = n^{-1} Σ_{i=1}^n [ ((1 - D_i)/(n^{-1} Σ_{k=1}^n D_k)) (π̂(x_i)/(1 - π̂(x_i))) 1{ΔY_{it} ≤ y} ].    (3.4)

In invoking this moment condition, I am assuming that Assumption ID.4 holds. It is this assumption that allows for reweighting with either the propensity score or the cdf nuisance function. The assumption not only ensures that the moment condition holds; it is also essential for inverting F_{ΔY_{0t}|D=1}(y). For example, suppose that

E[ ((1 - D)/p)(p(X)/(1 - p(X))) 1{ΔY_t ≤ 0.5} - (D/p) P(ΔY_{0t} ≤ 0.5|X) ] = 0

but

E[ ((1 - D)/p)(p(X)/(1 - p(X))) 1{ΔY_t ≤ y} - (D/p) P(ΔY_{0t} ≤ y|X) ] ≠ 0

for y ≠ 0.5. Then the inversion F^{-1}_{ΔY_{0t}|D=1}(F̂_{ΔY_{t-1}|D=1}(ΔY_{it-1})) would be incorrect, by the proof of Theorem 1 in Chapter 2.

The inclusion of this additional moment condition should not be expected to improve the performance of the estimator over the estimator in Chapter 2. What can be expected is that, as more moment conditions are added, the performance of the estimator gets closer to that of the estimator in Chapter 2. This is because all of the information that is communicated through the double-robustness property in Chapter 2 is only partially communicated by additional moment functions.
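The final inversion step above can be sketched in a few lines: given estimated distribution functions for the treated-group potential outcomes, QTT(τ) is a difference of generalized inverses. The draws below are synthetic placeholders standing in for the fitted distributions, not the chapter's estimator:

```python
import numpy as np

rng = np.random.default_rng(1)
y1 = rng.normal(1.0, 1.0, 50_000)   # stand-in draws for Y_1t | D = 1
y0 = rng.normal(0.0, 1.0, 50_000)   # stand-in draws for Y_0t | D = 1

def cdf_inverse(sample, tau):
    """Generalized inverse inf{y : F_hat(y) >= tau} of the empirical cdf."""
    s = np.sort(sample)
    k = int(np.ceil(tau * len(s))) - 1
    return s[max(k, 0)]

def qtt(tau):
    return cdf_inverse(y1, tau) - cdf_inverse(y0, tau)

# Under a pure location shift, the QTT is roughly 1 at every quantile.
print(round(qtt(0.5), 2))
```

With heterogeneous treatment effects, qtt(τ) would instead vary across τ, which is what the chapter's grid of quantiles is designed to detect.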
To see this, recall that the doubly-robust moment representation of F_{ΔY_{0t}|D=1}(y) is

F_{ΔY_{0t}|D=1}(y) = E[ ((1 - D)/p)(π(X)/(1 - π(X))) 1{ΔY_t ≤ y} - ( ((1 - D)/p)(π(X)/(1 - π(X))) - D/p ) P̃(ΔY_{0t} ≤ y|X) ].    (3.5)

Now suppose that we have the following system of moments:

( ψ_i(p(X;θ), P(ΔY_{0t} ≤ y_1|X;γ)) ; ψ_i(p(X;θ), P(ΔY_{0t} ≤ y_2|X;γ)) ; s_{p(x)i}(θ) ; s_{P(ΔY_{0t}|X)i}(γ) )',    (3.6)

where y_1 ≠ y_2. A GMM estimator that uses these moments is the most efficient estimator under these moment conditions; however, the first two moment conditions do not communicate more information about Assumption ID.4 than is communicated through (3.5). As more moment conditions are added, the information communicated through all conditions of the form (3.3) approaches the information communicated through (3.5).

3.3 Simulations

In this section I present the estimation of the QTT at τ ∈ [0.2, 0.8], in increments of 0.02. The goal is to demonstrate not only how my estimator performs in small samples, but also how it compares to the estimator of Callaway and Li (2019). I generate the following data generating process with N = 1000 and T = 3 for 200 iterations:

v ∼ Normal(0, 1)
η | D = 0 ∼ Normal(0, 1)
η | D = 1 ∼ Normal(1, 1)
X1 ∼ Uniform(0, 1)
X2 ∼ Uniform(-1, 0)
X3 ∼ Uniform(-2, 1)
X4 ∼ Uniform(-1, 0)
Y_{t-2} = 0.25 X1 + 0.5 X2 + 0.75 X3 + X4 + η + v_{t-2}
Y_{t-1} = 1 + 0.5 X1 + 0.75 X2 + X3 + 1.5 X4 + η + v_{t-1}
Y_{0t} = 2 + 0.25 X1 + 0.5 X2 + 0.75 X3 + X4 + η + v_t
Y_{1t} = 1.5 X1 + X2 + 1.5 X3 + X4 + η + v_t
Y_t = D × Y_{1t} + (1 - D) × Y_{0t}
p(X, γ) = e^{Xγ} / (1 + e^{Xγ}),  γ = [-0.25, 0.5, 0.75, 1]

The data generating process is based upon Example 3 in Callaway and Li (2019). The distribution of the covariates is chosen so that, for the given values of the parameters, the probability of treatment and the conditional cdf of ΔY_{0t} do not produce observed values that are too close to 0 and 1.
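One iteration of a design with this structure can be sketched as follows. Some signs and coefficient values are hard to read in the printed text, so the numbers below are illustrative (covariates are drawn on (0, 1) for simplicity), and only the structure of the design is taken from the chapter:

```python
import numpy as np

# One simulated panel, patterned on the chapter's design: logit treatment
# assignment, a fixed effect eta whose mean shifts with D, and three periods.
rng = np.random.default_rng(2)
N = 1000
X = rng.uniform(0, 1, (N, 4))                   # simplified covariate supports
gamma = np.array([-0.25, 0.5, 0.75, 1.0])       # propensity index coefficients
p_x = 1.0 / (1.0 + np.exp(-X @ gamma))
D = rng.binomial(1, p_x)
eta = rng.normal(D.astype(float), 1.0)          # eta | D=1 ~ N(1,1), else N(0,1)
v = rng.normal(0, 1, (N, 3))                    # v_{t-2}, v_{t-1}, v_t

beta = np.array([0.25, 0.5, 0.75, 1.0])
y_tm2 = X @ beta + eta + v[:, 0]
y_tm1 = 1.0 + X @ np.array([0.5, 0.75, 1.0, 1.5]) + eta + v[:, 1]
y0_t = 2.0 + X @ beta + eta + v[:, 2]
y1_t = X @ np.array([1.5, 1.0, 1.5, 1.0]) + eta + v[:, 2]
y_t = D * y1_t + (1 - D) * y0_t                 # observed outcome in period t

print(y_t.shape, round(float(D.mean()), 2))
```

Running this inside a loop of 200 iterations, and re-estimating the nuisance parameters each time, mirrors the Monte Carlo design described above.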
Values too close to 0 and 1 can cause numerical issues when inverting the estimators. All parameters are estimated using an overidentified GMM estimator. The moment conditions include the score functions taken from a logit MLE; these correspond to estimation of the propensity score. OLS moment conditions are also included and correspond to estimation of the parameters of ΔY_{0t}|X, β, and the standard deviation of v_{0t}, σ. In addition, moment condition (3.1) is included at the point y = 0.5. The following estimator is applied:

F̂_{ΔY_{0t}|D=1}(δ) = [ Σ_{i=1}^n p(x_i, θ̂)(1 - D_i)/(1 - p(x_i, θ̂)) ]^{-1} Σ_{i=1}^n [ p(x_i, θ̂)(1 - D_i)/(1 - p(x_i, θ̂)) ] 1{ΔY_i ≤ δ}.

As in Chapter 2, misspecification of the propensity score means the propensity score is chosen to be the standard normal cdf. Misspecification of the cdf nuisance function is considered when that function is chosen to be the Logistic(0, 1) cdf. The results of the simulation are presented in Figures G.1 and G.2 in the appendix. Figures G.1 and G.2 are divided into three scenarios. In the first scenario, I compare the estimator when all nuisance functions are correctly specified to the estimator when only the cdf nuisance function is correctly specified. In the second scenario, I compare the estimator when the propensity score is correctly specified, but the cdf nuisance function is misspecified, to the estimator when both nuisance functions are misspecified. In the third scenario, I compare the estimator when only the cdf nuisance function is correctly specified to the estimator when neither nuisance function is correctly specified. It is important to note the following. First, as in Chapter 2, the bias of the estimates is small relative to the true values of the quantile treatment effects.
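The reweighted cdf estimator displayed above can be sketched directly: untreated observations are weighted by p̂(x)/(1 - p̂(x)) and the weights normalized. The data generating process below is a hypothetical one chosen only to verify that, with the true propensity score plugged in, the estimator tracks the treated-group distribution of ΔY_{0t}:

```python
import numpy as np

def weighted_cdf(y, dY, D, p_hat):
    """Reweighted cdf estimator of F_{dY0t|D=1}(y) using untreated units only."""
    w = (1 - D) * p_hat / (1 - p_hat)
    return np.sum(w * (dY <= y)) / np.sum(w)

rng = np.random.default_rng(3)
n = 100_000
X = rng.uniform(-1, 1, n)
p_x = 1.0 / (1.0 + np.exp(-X))        # true propensity score
D = rng.binomial(1, p_x)
dY0 = X + rng.normal(0, 1, n)         # Delta Y_0t, independent of D given X

est = weighted_cdf(0.0, dY0, D, p_x)
# Infeasible benchmark: in the simulation we also observe dY0 for treated units.
direct = np.mean(dY0[D == 1] <= 0.0)
print(round(float(est - direct), 3))
```

Swapping `p_x` for a misspecified fit (e.g., a probit when the truth is logit) is how the misspecification scenarios above are implemented in spirit.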
Second, if a comparison is made across figures to Figure F.2, the root mean square error (RMSE) is considerably smaller than that of the Callaway and Li estimator, but larger than that of the doubly-robust estimators in Chapter 2. Assuming correct specification of all of the moments, the RMSE is larger than that of any doubly-robust estimator under the misspecification considered in Chapter 2. Even allowing for misspecification, the GMM-based estimates in this chapter yield an RMSE that is slightly greater than half the size of the RMSE of the Callaway and Li estimators in Chapter 2.

3.4 Conclusion

The GMM-based estimator of this chapter does not seem to perform as well as the doubly-robust estimator of Chapter 2, but its performance under the mild misspecification that is considered is favorable compared to the Callaway and Li estimator. It is not surprising that a lower RMSE than the estimator in Chapter 2 is not reached: the doubly-robust estimator of Chapter 2 takes advantage of the semiparametrically efficient estimator of F(ΔY_{0t}|D = 1). Still, the simulation performance of the GMM-based estimator is an improvement upon the Callaway and Li estimator.

BIBLIOGRAPHY

Abadie, Alberto (2005). "Semiparametric difference-in-differences estimators". In: The Review of Economic Studies 72.1, pp. 1–19.
Andrews, Frank M, Antonia Abbey, and L Jill Halman (1991). "Stress from infertility, marriage factors, and subjective well-being of wives and husbands". In: Journal of Health and Social Behavior, pp. 238–253.
Athey, Susan and Guido W Imbens (2006). "Identification and inference in nonlinear difference-in-differences models". In: Econometrica 74.2, pp. 431–497.
Blundell, Richard and James L Powell (2001). "Endogeneity in nonparametric and semiparametric regression models".
Bonhomme, Stéphane and Ulrich Sauder (2011). "Recovering distributions in difference-in-differences models: A comparison of selective and comprehensive schooling".
In: Review of Economics and Statistics 93.2, pp. 479–494.
Callaway, Brantly and Tong Li (2019). "Quantile treatment effects in difference in differences models with panel data". In: Quantitative Economics 10.4, pp. 1579–1618.
Callaway, Brantly and Pedro HC Sant'Anna (2021). "Difference-in-differences with multiple time periods". In: Journal of Econometrics 225.2, pp. 200–230.
Cameron, A Colin, Tong Li, Pravin K Trivedi, and David M Zimmer (2004). "Modelling the differences in counted outcomes using bivariate copula models with application to mismeasured counts". In: The Econometrics Journal 7.2, pp. 566–584.
Caracciolo, Francesco and Marilena Furno (2017). "Quantile treatment effect and double robust estimators: an appraisal on the Italian labor market". In: Journal of Economic Studies.
Card, David (1990). "The impact of the Mariel boatlift on the Miami labor market". In: ILR Review 43.2, pp. 245–257.
Card, David and Alan Krueger (1994). Minimum Wages and Employment: A Case Study of the Fast-Food Industry in New Jersey and Pennsylvania.
Chamberlain, Gary (1980). "Analysis of Covariance with Qualitative Data". In: The Review of Economic Studies 47.1, pp. 225–238.
Chen, Xiaohong (2007). "Large sample sieve estimation of semi-nonparametric models". In: Handbook of Econometrics 6, pp. 5549–5632.
Chen, Xiaohong and Xiaotong Shen (1998). "Sieve extremum estimates for weakly dependent data". In: Econometrica, pp. 289–314.
Chernozhukov, Victor, Iván Fernández-Val, Jinyong Hahn, and Whitney Newey (2013). "Average and quantile effects in nonseparable panel models". In: Econometrica 81.2, pp. 535–580.
Chernozhukov, Victor, Juan Carlos Escanciano, Hidehiko Ichimura, Whitney K Newey, and James M Robins (2016). "Locally robust semiparametric estimation". In: arXiv preprint arXiv:1608.00033.
Fan, Jianqing, Kosuke Imai, Han Liu, Yang Ning, and Xiaolin Yang (2016). Improving covariate balancing propensity score: A doubly robust and efficient approach. Tech. rep.
Technical report, Princeton University.
Firpo, Sergio (2007). "Efficient semiparametric estimation of quantile treatment effects". In: Econometrica 75.1, pp. 259–276.
Gourieroux, Christian, Alain Monfort, Eric Renault, and Alain Trognon (1987). "Generalised residuals". In: Journal of Econometrics 34.1-2, pp. 5–32.
Gunsilius, Florian and Yuliang Xu (2021). "Matching for causal effects via multimarginal optimal transport". In: arXiv preprint arXiv:2112.04398.
Hajivassiliou, Vassilis A and Paul A Ruud (1994). "Classical estimation methods for LDV models using simulation". In: Handbook of Econometrics 4, pp. 2383–2441.
Heckman, James J (1979). "Sample selection bias as a specification error". In: Econometrica: Journal of the Econometric Society, pp. 153–161.
Heckman, James J, Hidehiko Ichimura, and Petra Todd (1998). "Matching as an econometric evaluation estimator". In: The Review of Economic Studies 65.2, pp. 261–294.
Heckman, James J, Hidehiko Ichimura, Jeffrey A Smith, and Petra E Todd (1998). Characterizing selection bias using experimental data.
Hirano, Keisuke, Guido W Imbens, and Geert Ridder (2003). "Efficient estimation of average treatment effects using the estimated propensity score". In: Econometrica 71.4, pp. 1161–1189.
Horvitz, Daniel G and Donovan J Thompson (1952). "A generalization of sampling without replacement from a finite universe". In: Journal of the American Statistical Association 47.260, pp. 663–685.
Levinger, George Klaus, Oliver C Moles, et al. (1979). Divorce and separation. Basic Books.
Li, Qi and Jeffrey S Racine (2008). "Nonparametric estimation of conditional CDF and quantile functions with mixed categorical and continuous data". In: Journal of Business & Economic Statistics 26.4, pp. 423–434.
Li, Qi and Jeffrey Scott Racine (2007). Nonparametric econometrics: theory and practice. Princeton University Press.
Lin, Wei and Jeffrey M Wooldridge (2017).
Binary and fractional response models with continuous and binary endogenous explanatory variables. Tech. rep. Working paper, available at http://www.weilinmetrics.com/uploads/5/1/4/0 ?
Liu, Chien (2000). "A theory of marital sexual life". In: Journal of Marriage and Family 62.2, pp. 363–374.
Lorentz, GG (1966). Approximation of Functions, Athena Series. Holt, Rinehart and Winston, New York.
Machado, José AF and José Mata (2005). "Counterfactual decomposition of changes in wage distributions using quantile regression". In: Journal of Applied Econometrics 20.4, pp. 445–465.
Masry, Elias (1996). "Multivariate local polynomial regression for time series: uniform strong consistency and rates". In: Journal of Time Series Analysis 17.6, pp. 571–599.
Mullahy, John (2015). "Multivariate fractional regression estimation of econometric share models". In: Journal of Econometric Methods 4.1, pp. 71–100.
Muris, Chris (2020). "Efficient GMM estimation with incomplete data". In: Review of Economics and Statistics 102.3, pp. 518–530.
Nam, Suhyeon (2014). "Essays in multiple fractional responses with endogenous explanatory variables". PhD thesis. Michigan State University.
Newey, Whitney K and Daniel McFadden (1994). "Large sample estimation and hypothesis testing". In: Handbook of Econometrics, IV, edited by RF Engle and DL McFadden, pp. 2112–2245.
Newey, Whitney K (1990). "Semiparametric efficiency bounds". In: Journal of Applied Econometrics 5.2, pp. 99–135.
Oppenheimer, Valerie Kincade (1988). "A theory of marriage timing". In: American Journal of Sociology 94.3, pp. 563–591.
Papke, Leslie E and Jeffrey M Wooldridge (2008). "Panel data methods for fractional response variables with an application to test pass rates". In: Journal of Econometrics 145.1-2, pp. 121–133.
Petrin, Amil and Kenneth Train (2010). "A control function approach to endogeneity in consumer choice models". In: Journal of Marketing Research 47.1, pp. 3–13.
Rivers, Douglas and Quang Vuong (2002).
"Model selection tests for nonlinear dynamic models". In: The Econometrics Journal 5.1, pp. 1–39.
Rosenbaum, Paul R and Donald B Rubin (1983). "The central role of the propensity score in observational studies for causal effects". In: Biometrika 70.1, pp. 41–55.
Rothe, Christoph and Sergio Firpo (2013). "Semiparametric estimation and inference using doubly robust moment conditions".
Rothe, Christoph and Sergio Firpo (2019). "Properties of doubly robust estimators when nuisance functions are estimated nonparametrically". In: Econometric Theory 35.5, pp. 1048–1087.
Sant'Anna, Pedro HC and Jun Zhao (2020). "Doubly robust difference-in-differences estimators". In: Journal of Econometrics 219.1, pp. 101–122.
Schröder, Jette and Claudia Schmiedeberg (2015). "Effects of relationship duration, cohabitation, and marriage on the frequency of intercourse in couples: Findings from German panel data". In: Social Science Research 52, pp. 72–82.
Słoczyński, Tymon, S Derya Uysal, and Jeffrey M Wooldridge (2022). "Abadie's Kappa and Weighting Estimators of the Local Average Treatment Effect". In: arXiv preprint arXiv:2204.07672.
Słoczyński, Tymon and Jeffrey M Wooldridge (2018). "A general double robustness result for estimating average treatment effects". In: Econometric Theory 34.1, pp. 112–133.
Staiger, Douglas and James H Stock (1997). "Instrumental Variables Regression with Weak Instruments". In: Econometrica 65.3, pp. 557–586.
Sued, Mariela, Marina Valdora, and Víctor Yohai (2020). "Robust doubly protected estimators for quantiles with missing data". In: TEST 29.3, pp. 819–843.
Torous, William, Florian Gunsilius, and Philippe Rigollet (2021). "An Optimal Transport Approach to Causal Inference". In: arXiv preprint arXiv:2108.05858.
van der Vaart, Aad W and Jon A Wellner (1996). "Weak convergence". In: Weak Convergence and Empirical Processes. Springer, pp. 16–28.
van der Vaart, Aad W (2000). Asymptotic Statistics. Vol. 3. Cambridge University Press.
Walfisch, S, B Maoz, and H Antonovsky (1984). "Sexual satisfaction among middle-aged couples: correlation with frequency of intercourse and health status". In: Maturitas 6.3, pp. 285–296.
Woodland, Alan D (1979). "Stochastic specification and the estimation of share equations". In: Journal of Econometrics 10.3, pp. 361–383.
Wooldridge, Jeffrey M (2007). "Inverse probability weighted estimation for general missing data problems". In: Journal of Econometrics 141.2, pp. 1281–1301.
Wooldridge, Jeffrey M (2010). Econometric Analysis of Cross Section and Panel Data. MIT Press.
Wooldridge, Jeffrey M (2014). "Quasi-maximum likelihood estimation and testing for nonlinear models with endogenous explanatory variables". In: Journal of Econometrics 182.1, pp. 226–234.
Yabiku, Scott T and Constance T Gager (2009). "Sexual frequency and the stability of marital and cohabiting unions". In: Journal of Marriage and Family 71.4, pp. 983–1000.
Zeger, Scott L, Kung-Yee Liang, and Paul S Albert (1988). "Models for longitudinal data: a generalized estimating equation approach". In: Biometrics, pp. 1049–1060.

APPENDIX A
DERIVING THE AVERAGE PARTIAL EFFECTS

Proof of Theorem 1

√N(η̂ - η) = √N (γ̂ - γ ; θ̂ - θ)' = K_N + o_p(1),    (A.1)

where K_N = B_0^{-1} G_0' N^{-1/2} Σ_{i=1}^N ψ_i(θ_0, γ_0). Now, apply a mean value expansion of Ξ̂ around η. Then,

√N Ξ̂_tl(x_0, h_0, t_0) = N^{-1/2} Σ_{i=1}^N [ ∂Λ(d_tl^{v0} + ζ^v u_it)/∂γ_(.) + ∇_{γ,θ} ∂Λ(d̃_tl^v + ζ^v u_it)/∂γ_(.) (η̂ - η) ],    (A.2)

where d̃_tl^v is d_tl evaluated at some θ̃ between θ̂ and θ_0. By (46) and the Weak Law of Large Numbers,

N^{-1} Σ_{i=1}^N ∇_{γ,θ} ∂Λ(d̃_tl^v + ζ^v u_it)/∂γ_(.) · √N(η̂ - η) = E[∇_{γ,θ} ∂Λ(d_tl^{v0} + ζ^v u_it)/∂γ_(.)] K_N + o_p(1).

Then, after subtracting √N Ξ_tl(x_0, h_0, t_0) from √N Ξ̂_tl(x_0, h_0, t_0), the following is obtained:

√N(Ξ̂_tl(x_0, h_0, t_0) - Ξ_tl(x_0, h_0, t_0)) = N^{-1/2} Σ_{i=1}^N [ ∂Λ(d_tl^{v0} + ζ^v u_it)/∂γ_(.)
- Ξ_tl(x_0, h_0, t_0) + E[∇_{γ,θ} ∂Λ(d_tl^{v0} + ζ^v u_it)/∂γ_(.)] K_i ] + o_p(1),

where K_i = B_0^{-1} G_0' ψ_i(θ_0, γ_0). Since

E[ ∂Λ(d_tl^{v0} + ζ^v u_it)/∂γ_(.) - Ξ_tl(x_0, h_0, t_0) + E[∇_{γ,θ} ∂Λ(d_tl^{v0} + ζ^v u_it)/∂γ_(.)] K_i ] = 0,

then by the Central Limit Theorem,

√N(Ξ̂_tl(x_0, h_0, t_0) - Ξ_tl(x_0, h_0, t_0)) → N(0, V_tl),    (A.3)

where V_tl = E[V_tli' V_tli] and V_tli = ∂Λ(d_tl^{v0} + ζ^v u_it)/∂γ_(.) - Ξ_tl(x_0, h_0, t_0) + E[∇_{γ,θ} ∂Λ(d_tl^{v0} + ζ^v u_it)/∂γ_(.)] K_i.

The proof of Theorem 2 is similar.

APPENDIX B
SIMULATION TABLES FOR CHAPTER 1

Table B.1: APE: v_it ~ Normal(0, 1)

z = 1, ζ1 = 1, ζ2 = 1
             True APE (percentile)           Mean estimate (s.e.)
Covariate    25        50        75          25               50               75
t1           0.0952    -0.0933   -0.0023     0.0685 (.0204)   -0.0815 (.0323)  -0.0020 (.0011)
t2           0.0352    0.1549    0.0023      0.0281 (.0113)   0.1316 (.0522)   0.0021 (.0011)
x1           0.1365    0.0695    0.0013      0.1207 (.0105)   0.0624 (.0110)   0.0012 (.0003)
x2           0.0820    0.1629    0.0027      0.0694 (.0214)   0.1638 (.0320)   0.0026 (.0006)

z = 0.5, ζ1 = 1, ζ2 = 1
t1           0.0952    -0.0933   -0.0023     0.0734 (.0149)   -0.0815 (.0238)  -0.0024 (.0008)
t2           0.0352    0.1549    0.0023      0.0306 (.0110)   0.1504 (.0409)   0.0024 (.0009)
x1           0.1365    0.0695    0.0013      0.1198 (.0104)   0.0599 (.0102)   0.0012 (.0003)
x2           0.0820    0.1629    0.0027      0.0698 (.0219)   0.1573 (.0298)   0.0026 (.0006)

z = 0.1, ζ1 = 1, ζ2 = 1
t1           0.0952    -0.0933   -0.0023     0.0708 (.0159)   -0.0934 (.0258)  -0.0024 (.0009)
t2           0.0352    0.1549    0.0302      0.1489 (.0114)   0.1504 (.0430)   0.0024 (.0009)
x1           0.1365    0.0695    0.0013      0.1190 (.0105)   0.0601 (.0108)   0.0012 (.0003)
x2           0.0820    0.1629    0.0027      0.0698 (.0220)   0.148 (.0317)    0.0025 (.0006)

Standard errors in parentheses

Table B.2: APE: v_it ~ Logistic(0, 1)

z = 1, ζ1 = 1, ζ2 = 1
             True APE (percentile)           Mean estimate (s.e.)
Covariate    25        50        75          25               50               75
t1           0.0881    -0.0886   -0.0023     0.0299 (.0465)   -0.0721 (.0932)  -0.0031 (.0104)
t2           0.0385    0.1666    0.0024      0.0065 (.0727)   0.1278 (.1819)   0.0036 (.0168)
x1           0.1333    0.0684    0.0013      0.0985 (.0104)   0.0534 (.0185)   0.0014 (.0026)
x2           0.0915    0.1758    0.0027      0.0694 (.0252)   0.1752 (.0404)   0.0037 (.0057)

z = 0.5, ζ1 = 1, ζ2 = 1
t1           0.0881    -0.0886   -0.0023     0.0175 (.0262)   -0.0790 (.0437)  -0.0024 (.0025)
t2           0.0385    0.1666    0.0024      0.0252 (.0175)   0.1289 (.0768)   0.0025 (.0027)
x1           0.1333    0.0684    0.0013      0.0943 (.0069)   0.0479 (.0096)   0.0010 (.0002)
x2           0.0915    0.1758    0.0027      0.0666 (.0214)   0.1711 (.0365)   0.0027 (.0007)

z = 0.1, ζ1 = 1, ζ2 = 1
t1           0.0881    -0.0886   -0.0023     0.0062 (.0230)   -0.0688 (.0441)  -0.0019 (.0015)
t2           0.0385    0.1666    0.0024      0.0205 (.0151)   0.1091 (.0759)   0.0020 (.0016)
x1           0.1333    0.0684    0.0013      0.0921 (.0069)   0.0498 (.0109)   0.0010 (.0003)
x2           0.0915    0.1758    0.0027      0.0633 (.0188)   0.1802 (.0397)   0.0028 (.0007)

Standard errors in parentheses

Table B.3: APE: v_it ~ χ²(1)

z = 1, ζ1 = 1, ζ2 = 1
             True APE (percentile)           Mean estimate (s.e.)
Covariate    25        50        75          25               50               75
t1           0.1116    -0.1004   -0.0023     0.1836 (.0475)   -0.0265 (.0533)  -0.0006 (.0021)
t2           0.0504    0.1300    0.0023      0.0334 (.0473)   0.0547 (.0768)   0.0006 (.0023)
x1           0.1712    0.0701    0.0013      0.1625 (.0104)   0.0576 (.0185)   0.0010 (.0026)
x2           0.1200    0.1503    0.0027      0.1086 (.0308)   0.1508 (.0329)   0.0025 (.0027)

z = 0.5, ζ1 = 1, ζ2 = 1
t1           0.1116    -0.1004   -0.0023     0.2763 (.0268)   -0.0895 (.0246)  -0.0020 (.0007)
t2           0.0504    0.1300    0.0023      0.0724 (.0231)   0.1414 (.0415)   0.0021 (.0008)
x1           0.1712    0.0701    0.0013      0.1730 (.0068)   0.0489 (.0089)   0.0009 (.0002)
x2           0.1200    0.1503    0.0027      0.1311 (.0359)   0.1250 (.0236)   0.0021 (.0005)

z = 0.1, ζ1 = 1, ζ2 = 1
t1           0.1116    -0.1004   -0.0023     0.3166 (.0290)   -0.0919 (.0254)  -0.0021 (.0008)
t2           0.0504    0.1300    0.0023      0.0779 (.0255)   0.1463 (.0422)   0.0021 (.0008)
x1           0.1712    0.0701    0.0013      0.1773 (.0062)   0.0497 (.0096)   0.0009 (.0002)
x2           0.1200    0.1503    0.0027      0.1382 (.0383)   0.1259 (.0251)   0.0021 (.0005)

Standard errors in parentheses

Table B.4: APE: v_it ~ Normal(0, 1)

z = 1, ζ1 = 1, ζ2 = 1
             True APE (percentile)           Mean estimate (s.e.)
Covariate    25        50        75          25               50               75
t1           0.0484    -0.0412   -0.0284     0.0077 (.0077)   -0.0466 (.0082)  -0.0285 (.0058)
t2           0.1474    0.1470    0.0477      0.1256 (.0202)   0.1422 (.0232)   0.0496 (.0104)
x1           0.1338    0.0950    0.0265      0.1173 (.0035)   0.0817 (.0057)   0.0226 (.0033)
x2           0.2112    0.2017    0.0604      0.1979 (.0101)   0.1897 (.0099)   0.0569 (.0081)

Standard errors in parentheses

APPENDIX C
APPLICATION TABLES FOR CHAPTER 1

Table C.1: Probit marriage coefficient estimates

                          (all)              (male only)        (female only)
education                 0.0474 (.0127)     0.0561 (.0176)     0.0418 (.0196)
household income          0.0085 (.0012)     0.0074 (.0016)     0.0069 (.0021)
children at residence     0.5223 (.0368)     0.8566 (.0703)     0.3462 (.0470)
urban                     -0.2565 (.0800)    -0.1025 (.1163)    -0.3888 (.1138)
sexual intercourse        0.0010 (.00017)    0.0008 (.0002)     0.0011 (.0003)
ill w/ treatment          0.0008 (.0227)     0.008 (.0392)      -0.0110 (.0275)
ill w/o treatment         0.0106 (.0189)     0.0292 (.0292)     0.0152 (.0248)
chronic                   0.0936 (.0718)     0.1234 (.1045)     0.1008 (.0984)
work limited by illness   -0.3571 (.1568)    -0.3608 (.2407)    -0.2585 (.1914)
_const                    -1.487 (.2012)     -1.811 (.2725)     -1.118 (.3211)
N                         1388               734                654
NT                        5552               2936               2616

Standard errors in parentheses

Table C.2: Linear marriage coefficient estimates

                          (all)              (male only)        (female only)
education                 0.0155 (.0041)     0.0149 (.0050)     0.0152 (.0070)
household income          0.0029 (.0004)     0.0023 (.0005)     0.0026 (.0008)
children at residence     0.1831 (.0109)     0.2591 (.0140)     0.1280 (.0162)
urban                     -0.0857 (.0270)    -0.0362 (.0350)    -0.1396 (.0411)
sexual intercourse        0.0003 (.000055)   0.0002 (.00006)    0.0004 (.0001)
ill w/ treatment          0.00002 (.0078)    0.0043 (.0122)     -0.0042 (.0100)
ill w/o treatment         0.0031 (.0063)     0.0003 (.0086)     0.0055 (.0090)
chronic                   0.0299 (.0242)     0.0280 (.0319)     0.0356 (.0356)
work limited by illness   -0.1086 (.0458)    -0.0881 (.0536)    -0.0922 (.0650)
_const                    -0.0061 (.0638)    -0.0407 (.0747)    0.0884 (.1146)
N                         1391               736                655
NT                        5564               3944               2620

Standard errors in parentheses

Table C.3: APE Estimates accounting for correlated random effects and a binary EEV

All data                Work                                          Sleep
percentile              25        50        75        90              25             50        75        90
marital status          -0.0121   -0.0173   -0.0260   -0.0390        -0.1244        -0.1311   -0.1346   -0.1421
                        (.0579)   (.0138)   (.0198)   (.0496)        (.0301)        (.0087)   (.0102)   (.0272)
Men only                Work                                          Sleep
marital status          -0.2511   -0.2121   -0.1799   -0.1374        5.38 × 10^-7   -0.0015   -0.0058   -0.0230
                        (.0645)   (.0182)   (.0225)   (.0578)        (.0339)        (.0109)   (.0122)   (.0307)
Women only              Work                                          Sleep
marital status          -0.3194   -0.0952   -0.0214   -0.0014        -0.0159        -0.0388   -0.0344   -0.0146
                        (.0422)   (.0111)   (.0149)   (.0381)        (.0498)        (.0145)   (.0179)   (.0454)

Standard errors in parentheses

Table C.4: APE Estimates w/o integrating out Endogenous Error

All data        Work                                                              Sleep
percentile      25            50            75            90                      25            50            75            90
marital status  2.99 × 10^-4  1.95 × 10^-4  1.27 × 10^-4  1.03 × 10^-4            7.79 × 10^-4  8.92 × 10^-4  2.51 × 10^-4  3.62 × 10^-4
                (.0195)       (.0039)       (.0115)       (.0118)                 (.0067)       (.0013)       (.0028)       (.0034)
Men only        Work                                                              Sleep
marital status  -0.0011       7.33 × 10^-4  5.00 × 10^-4  4.01 × 10^-4            2.55 × 10^-17 1.31 × 10^-35 1.65 × 10^-52 1.93 × 10^-77
                (.0126)       (.0067)       (.0078)       (.0102)                 (.0045)       (.0011)       (.0017)       (.0017)
Women only      Work                                                              Sleep
marital status  2.35 × 10^-4  8.81 × 10^-5  4.54 × 10^-5  4.29 × 10^-5            0.0032        3.11 × 10^-4  1.86 × 10^-5  1.75 × 10^-6
                (.0047)       (.0023)       (.0022)       (.0044)                 (.0322)       (.0095)       (.0134)       (.0134)

Standard errors in parentheses

APPENDIX D
HIGH-LEVEL ASSUMPTIONS AND PROPOSITIONS FOR NUISANCE FUNCTION ESTIMATION IN CHAPTER 2

Assumptions P.1-P.3 are the parametric assumptions that are sufficient to imply Assumption NP.1.

Assumption P.1. (i) G(x; θ) is a parametric model for p(x), where θ ∈ Θ ⊂ R^M and G(x, θ) > 0 for all x ∈ X and θ ∈ Θ, where Θ is compact. (ii) There exists θ_0 ∈ Θ such that p(x) = G(x, θ_0), θ_0 ∈ int(Θ). (iii) G(X; θ) is a.s. twice continuously differentiable in a neighborhood of θ_0, Θ* ⊂ Θ. (iv) θ̂ is a consistent estimator of θ_0 and n^{1/2}(θ̂ - θ_0) = n^{-1/2} Σ_{i=1}^n l_θ(W_i; θ_0) + o_p(1), where W_i = (ΔY_{01}, ΔY_{i1}, D_i, X_i), E[l_θ(W_i; θ_0)] = 0, E[l_θ(W_i; θ_0) l_θ(W_i; θ_0)'] exists and is positive definite, and lim_{δ→0} E[sup_{θ∈Θ*, ||θ-θ_0||≤δ} ||l_θ(W_i; θ) - l_θ(W_i; θ_0)||²] = 0. (v) For some ε > 0, 0 < G(x; θ) ≤ 1 - ε a.s. for all θ ∈ int(Θ).

Assumption P.2.
(i) g(x) = g(x; β) is a parametric model for the conditional mean of ΔY_{0t}, where β ∈ Θ_β ⊂ R^k, Θ_β being compact; (ii) g(X, β) is a.s. continuous at each β ∈ Θ_β; (iii) there exists a unique pseudo-true parameter β* ∈ int(Θ_β); (iv) g(X, β) is a.s. twice continuously differentiable in a neighborhood of β*, Θ_β* ⊂ Θ_β; (v) the estimator β̂ is strongly consistent for β* and satisfies the following linear expansion:

√n(β̂ - β*) = n^{-1/2} Σ_{i=1}^n l_β(W_i; β*) + o_p(1),

where l_β(·; β) is such that E[l_β(W; β*)] = 0, E[l_β(W; β*) l_β(W; β*)'] exists and is positive definite, and lim_{a→0} E[sup_{β∈Θ_β*, ||β-β*||≤a} ||l_β(W; β) - l_β(W; β*)||] = 0.

Assumption P.3. E[||h(W; θ, β)||²] < ∞ and E[sup_{β∈Θ^s, θ∈Θ^s} |ḣ(W; θ, β)|] < ∞, where Θ^s denotes a small neighborhood of θ*, β*, and

h(W; θ, β) = w_0(D, X; θ) 1{ΔY_{ti} ≤ δ} - (w_0(D, X; θ) - w_1(D)) P(ΔY_{0t} ≤ δ, X; β).

These are the standard assumptions found in the literature, such as in Sant'Anna and Zhao (2020). Assumptions P.1 and P.2 imply that the parameters which index p(x; θ) and P(ΔY_{0t} ≤ δ | x; β) are sufficiently smooth and are √n-asymptotically linear. Assumption P.3 is an integrability condition. Assumptions P.1 and P.2 are stronger than Assumption NP.1, while Assumption P.3 is necessary to apply the Weak Law of Large Numbers along with the Central Limit Theorem.

I consider as a nonparametric estimator of the propensity score the sieve logit estimator of Hirano, Imbens, and Ridder (2003), though the proof and the assumptions that I place on that estimator are different, and in some sense relaxed, compared to the conditions in Hirano, Imbens, and Ridder (2003) that are used to prove that the estimator converges uniformly to the true function at o_p(n^{-1/4}).
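The sieve logit idea just introduced can be sketched concretely: approximate the log-odds function by a finite polynomial basis and fit by logit maximum likelihood. The basis order, data generating process, and Newton routine below are illustrative choices, not the dissertation's implementation:

```python
import numpy as np

# Sieve logit sketch: approximate the log-odds m(x) with a polynomial basis
# and estimate the basis coefficients by logit MLE via Newton-Raphson.
rng = np.random.default_rng(4)
n = 20_000
x = rng.uniform(-1, 1, n)
m0 = np.sin(2 * x)                            # true, smooth log-odds function
D = rng.binomial(1, 1 / (1 + np.exp(-m0)))

K = 5
R = np.vander(x, K + 1, increasing=True)      # basis 1, x, ..., x^K

beta = np.zeros(K + 1)
for _ in range(25):                           # Newton steps (concave log-lik.)
    p = 1 / (1 + np.exp(-R @ beta))
    grad = R.T @ (D - p)                      # score of the logit likelihood
    H = R.T @ (R * (p * (1 - p))[:, None])    # negative Hessian
    beta += np.linalg.solve(H, grad)

p_hat = 1 / (1 + np.exp(-R @ beta))
p_true = 1 / (1 + np.exp(-m0))
print(round(float(np.max(np.abs(p_hat - p_true))), 3))
```

Letting the basis order K grow with the sample size, subject to the smoothness and entropy conditions below, is what delivers the o_p(n^{-1/4}) uniform rate.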
When using this estimator, the goal is to approximate p(x) using a series approximation such that m(x) ≈ r̃^K̃(x)' π_K̃, where r̃^K̃(x) = (r_{1K̃}(x), ..., r_{K̃K̃}(x))' and K = K̃ + 1. The estimator is given by

m* = argmax_{m∈H_n} n^{-1} Σ_{i=1}^n [D_i log L(m(x_i)) + (1 - D_i) log(1 - L(m(x_i)))],

where L(a) = exp(a)/(1 + exp(a)) and H_n denotes the sieve space. Let the sieve space be over the s-smooth class of functions, which I denote by

H = Λ_c^s(X) = { m ∈ C^s(X) : sup_{[α]≤s} sup_{x∈X} |D^α m(x)| ≤ c, sup_{[α]=s} sup_{x,y∈X, x≠y} |D^α m(x) - D^α m(y)| / |x - y|_e ≤ c },

where C^s(X) denotes the space of all s-times continuously differentiable functions on X and |·|_e denotes the Euclidean norm. Furthermore, let H_n = { m ∈ H : m(x) = r̃^K̃(x)' π_K̃, |m|_s ≤ c }. I let ||m - m_0||_∞ = sup_{x∈X} |m(x) - m_0(x)| and ℓ(m, x_i) = D_i log L(m(x_i)) + (1 - D_i) log(1 - L(m(x_i))). Let H(w, F_n, ||·||_r) ≔ log(N(w, F_n, ||·||_r)), where N(w, F_n, ||·||_r) is the minimal number of w-balls that cover F_n under ||·||_r, and F_n = { ℓ(m, x_i) - ℓ(m_0, x_i) : ||m - m_0|| ≤ δ, m ∈ H_n }. In addition,

δ_n = inf{ δ ∈ (0, 1) : (1/(√n δ²)) ∫_{bδ²}^{δ} H^{1/2}(w, F_n, ||·||_r) dw ≤ const. },

where b > 0 is a constant.

The assumptions below are sufficient for the consistency of the sieve logit estimator and to satisfy Assumption NP.1. They are based upon conditions in Chen (2007):

Assumption NP.2. (i) E[D log L(m_0(x)) + (1 - D) log(1 - L(m_0(x)))] > -∞, and if E[D log L(m_0(x)) + (1 - D) log(1 - L(m_0(x)))] = -∞, then E[D log L(m(x)) + (1 - D) log(1 - L(m(x)))] < ∞ for all m ∈ H_k \ {m_0}, for all k ≥ 1. (ii) There are functions d(·) and t(·), where d(·) is a non-increasing positive function and t(·) is a positive function, such that for all ε > 0 and for all k ≥ 1, E[D log L(m_0(x)) + (1 - D) log(1 - L(m_0(x)))] - sup_{m∈H_n: ||m-m_0||_∞ ≥ ε} E[D log L(m(x)) + (1 - D) log(1 - L(m(x)))] ≥ d(k) t(ε) > 0.

Assumption NP.3. H_k ⊆ H_{k+1} ⊆ H for all k ≥ 1, and there exists a sequence π_k m_0 ∈ H_k such that ||π_k m_0 - m_0||_∞ → 0 as k → ∞.

Assumption NP.4.
(i) The sieve spaces H_k are compact under ||m_1 - m_2||_∞, where m_1, m_2 ∈ H_k. (ii) lim inf_{k(n)→∞} d(k(n)) > 0, E[ℓ(m, x_i)] is continuous at m = m_0, and E[sup_{m∈H_n} |ℓ(m, x_i)|] is bounded. (iii) E[||x_i||] < ∞.

Assumption NP.5. log(N(δ, H_n, ||·||)) = o(n) for all δ > 0.

Assumption NP.6. There exist p̲ and p̄ such that 0 < p̲ ≤ p(x) ≤ p̄ < 1.

Assumption NP.7. 2a²/(2a + d)² > 1/4, where d is the dimension of X and a = s + α, where m_0(x) is s-times continuously differentiable and |m_0(x) - m_0(y)| ≤ ||x - y||_e^α for x, y ∈ X under the Euclidean norm ||·||_e, for 0 < α ≤ 1.

Assumption NP.2 consists of regularity conditions on the objective function. Assumption NP.3 implies that, on subsets of the entire function space, there exists a sequence of functions whose approximation error converges uniformly to 0 as the subspaces grow in size. Assumption NP.4 implies the existence of a solution at which the objective function is maximized. Assumption NP.5 ensures that the function space does not grow too fast as the sample size increases. Assumption NP.6 strengthens Assumption ID.3 so that the propensity score has upper and lower bounds away from 0 and 1; this is necessary so that the log odds ratio is finite for all x ∈ X. Assumption NP.7 is a restriction on the differentiability and smoothness of the propensity score relative to the dimension of x. This is a weakening of the smoothness assumption in Hirano, Imbens, and Ridder (2003). Using the previous assumptions, I obtain the following result:

Proposition 4. Under Assumptions NP.2-NP.7, ||m̂ - m_0||_∞ = o_p(n^{-1/4}).

The next proof establishes the results for the nonparametric logit sieve estimator as in Hirano, Imbens, and Ridder (2003), but starting from different assumptions. The proof is broken into two parts: first, under a set of regularity conditions, I prove that the estimator is consistent; then, I prove that it achieves the desired rate of convergence. First, I prove the following lemma.
Lemma 1. Suppose $b, c$ are arbitrary constants such that $b, c > 0$ and $b \ne c$. Then
$$\operatorname{sign}\Big(\log\Big(\frac{b}{1+b}\Big) - \log\Big(\frac{c}{1+c}\Big)\Big) \ne \operatorname{sign}\Big(\log\Big(\frac{1}{1+b}\Big) - \log\Big(\frac{1}{1+c}\Big)\Big).$$

Proof. Suppose $b > c$. Then $\log(\tfrac{1}{1+b}) - \log(\tfrac{1}{1+c}) = \log(\tfrac{1+c}{1+b})$. Since $b > c > 0$, then $0 < \tfrac{1+c}{1+b} < 1$, so $\log(\tfrac{1+c}{1+b}) < 0$. Now, suppose that $\log(\tfrac{b}{1+b}) - \log(\tfrac{c}{1+c}) < 0$. Then $\tfrac{b}{1+b} < \tfrac{c}{1+c}$, which implies that $b < c$. This is a contradiction, so $\log(\tfrac{b}{1+b}) - \log(\tfrac{c}{1+c}) > 0$. Now, suppose $c > b$. Then $\log(\tfrac{1+c}{1+b}) > 0$. If $\log(\tfrac{b}{1+b}) - \log(\tfrac{c}{1+c}) > 0$, then $b > c$. This is a contradiction, so $\log(\tfrac{b}{1+b}) - \log(\tfrac{c}{1+c}) < 0$. $\square$

Proof of Proposition 4:

Proof. Note that
$$|\ell(m, x_i) - \ell(m_0, x_i)| = \Big| D_i\Big[\log\frac{\exp(m(x_i))}{1+\exp(m(x_i))} - \log\frac{\exp(m_0(x_i))}{1+\exp(m_0(x_i))}\Big] + (1-D_i)\Big[\log\frac{1}{1+\exp(m(x_i))} - \log\frac{1}{1+\exp(m_0(x_i))}\Big] \Big|.$$
By the preceding lemma, the two bracketed terms have opposite signs, so
$$\Big| D_i[\cdots] + (1-D_i)[\cdots] \Big| \le \Big| \Big[\log\frac{\exp(m(x_i))}{1+\exp(m(x_i))} - \log\frac{\exp(m_0(x_i))}{1+\exp(m_0(x_i))}\Big] - \Big[\log\frac{1}{1+\exp(m(x_i))} - \log\frac{1}{1+\exp(m_0(x_i))}\Big] \Big| = |m(x_i) - m_0(x_i)|.$$
Then $\sup_{m, m_0 \in \mathcal{H} : \|m - m_0\|_\infty \le \delta} |\ell(m, x_i) - \ell(m_0, x_i)| \le \delta$. Hence, Condition (ii) of Theorem 3.5M in Chen (2007) is satisfied. Then, by Theorem 3.5M, $\hat m_n \overset{p}{\to} m_0$ under $\|\cdot\|_\infty$.

Now, to prove the second part of the theorem, consider the $L_2$ metric defined by $\|m - m_0\|_{p,2} = \sqrt{E[(m(x_i) - m_0(x_i))^2]}$. This metric will be used to take advantage of inequalities relating it to $\|\cdot\|_\infty$, to ultimately find the desired rate of uniform convergence. Suppose $\|m - m_0\|_{p,2} \le \epsilon^2$. Note that by the mean value theorem, $\ell(m, x_i) - \ell(m_0, x_i) = \frac{d\ell(\tilde m, x_i)}{dm}[m - m_0]$, where $\tilde m$ lies between $m$ and $m_0$. Condition 3.7 of Chen (2007) is satisfied by the preceding inequality in the first part of this proof.
Lemma 2 in Chen and Shen (1998) implies that $\|m - m_0\|_\infty \le C_1\|m - m_0\|_{p,2}^{2a/(2a+d)}$, where $C_1 > 0$ is a constant. Then Condition 3.8 of Chen (2007) is satisfied with $\sup_{\|m - m_0\| \le \delta}|\ell(m, x_i) - \ell(m_0, x_i)| \le C_1\delta^{2a/(2a+d)}$. Then by Theorem 3.2 in Chen (2007), $\|\hat m - m_0\|_{p,2} = O_p(\epsilon_n)$, with $\epsilon_n = \max\{\delta_n, \|\pi_k m_0 - m_0\|_\infty\}$. Let $u_m = \sup_{m \in \mathcal{H}_n}\|m\|_\infty$, where $\|m\|_\infty = \sup_{x_i \in \mathcal{X}}|m(x_i)|$. Then for all $0 < \epsilon < 1$, $\log N(C_2\epsilon, \mathcal{H}_n, \|\cdot\|_\infty) \le \text{const}\cdot k_n\cdot\log(1 + \tfrac{4u_m}{\epsilon})$ by Lemma 2.5 in van de Geer (2000), where $k_n \uparrow \infty$ as $n \to \infty$ but $\tfrac{k_n}{n} \to 0$. Then,
$$\frac{1}{\sqrt n\,\delta_n^2}\int_{b\delta_n^2}^{\delta_n}\sqrt{H^*(\epsilon, \mathcal{F}_n, \|\cdot\|)}\,d\epsilon \le \frac{1}{\sqrt n\,\delta_n^2}\int_{b\delta_n^2}^{\delta_n}\sqrt{k_n\log\Big(1 + \frac{4u_m}{\epsilon}\Big)}\,d\epsilon \lesssim \frac{1}{\sqrt n\,\delta_n}\sqrt{k_n} \le \text{const}.$$
Then $\delta_n \asymp \sqrt{k_n/n}$, and $\|\pi_k m_0 - m_0\|_\infty = O(k_n^{-a/d})$ by Lorentz (1966). Let $\delta_n \asymp \|\pi_k m_0 - m_0\|_\infty$. Then the optimal rate obtains with $k_n \asymp n^{d/(2a+d)}$, and $\|\hat m_n - m_0\|_{p,2} = O_p(n^{-a/(2a+d)})$. Now, note that $\|\hat m_n - m_0\|_{p,2}^{2a/(2a+d)} = O_p(n^{-2a^2/(2a+d)^2})$. Since $\|m - m_0\|_\infty \le C_1\|m - m_0\|_{p,2}^{2a/(2a+d)}$ and $\|\hat m - m_0\|_\infty = o_p(1)$, then $\|\hat m - m_0\|_\infty = O_p(n^{-2a^2/(2a+d)^2}) = o_p(n^{-1/4})$ by Assumption NP.7. $\square$

Then I can also show, as in Hirano, Imbens, and Ridder (2003), that
$$\sup_{x \in \mathcal{X}}|\hat\pi(x) - p(x)| = \sup_{x \in \mathcal{X}}|L(\hat m(x)) - L(m_0(x))| \lesssim \sup_{x \in \mathcal{X}}|\hat m(x) - m_0(x)| = o_p(n^{-1/4}).$$

The next proof will tackle the case of nonparametric estimation of the conditional CDF, and concerns the asymptotic behavior of the estimator $\hat P(\Delta Y_{0t} \le y|X)$. This estimator is a kernel estimator, though a sieve estimator could also be chosen. The estimator that I have chosen is based upon the estimator of Li and Racine (2008). The assumptions that are needed include (from Li and Racine (2008)):

Assumption C.1. Both $\mu(x)$ and $F(y|x)$ have continuous second-order partial derivatives with respect to $x^c$, where $x^c$ denotes the vector of continuous random variables. For fixed values of $y$ and $x$, $\mu(x) > 0$ and $0 < F(y|x) < 1$.

Assumption C.2.
$w(\cdot)$ is a symmetric, bounded, and compactly supported density function, and $w(\cdot)$ is a Lipschitz function on the compact set $D$.

Assumption C.3. As $n \to \infty$, $h_s \to 0$ for $s = 1, \ldots, q$, $\lambda_s \to 0$ for $s = 1, \ldots, r$, $nh_1\cdots h_q \to \infty$, and $h_0 \to 0$.

Assumption C.4. $F(y|x)$ is twice continuously differentiable in $(y, x^c)$.

Let $|h| = \sum_{s=1}^q h_s$ and $|\lambda| = \sum_{s=1}^r \lambda_s$, where $0 \le \lambda_s \le 1$, and let $W_h(X_i^c, x^c) = \prod_{s=1}^q h_s^{-1}w((X_{is}^c - x_s^c)/h_s)$. Let $\hat\mu(x) = n^{-1}\sum_{i=1}^n W_h(X_i^c, x^c)$ and
$$\tilde F(y|x^c) = \frac{n^{-1}\sum_{i=1}^n G\big(\tfrac{y - \Delta Y_i}{h_0}\big)W_h(X_i^c, x^c)}{\hat\mu(x)}.$$
$G(\cdot)$ is the distribution function with corresponding density function $w(\cdot)$; $h_s$ is the bandwidth associated with the continuous variable $x_s^c$, and $h_0$ is the bandwidth associated with $\Delta Y_i$.¹ We then have the following result:

Proposition 5. Suppose Assumptions C.1–C.4 hold and $h_0 = h$. Then
$$\sup_{x \in D}|\tilde F(y|x^c) - F(y|x^c)| = O_p\Big(\frac{\ln(n)^{1/2}}{(nh^q)^{1/2}}\Big) + O_p(h^2).$$

The restriction that $h_0 = h$ will be used to achieve the rate of convergence of $o_p(n^{-1/4})$. I need to establish four lemmas before proving Proposition 5. The following lemma is similar to Lemma 1 in Li and Racine (2008).

Lemma 2. Under Assumptions C.1–C.3, $E[\hat\mu(x)] = \mu(x) + O(|h|^2)$.

Proof.
$$E[\hat\mu(x)] = \int \mu(x_i^c)\,W\Big(\frac{x_i^c - x^c}{h}\Big)\prod_{s=1}^q h_s^{-1}\,dx_i^c = \int \mu(x^c + hv)\,k(v)\,dv$$
$$= \int \Big[\mu(x^c) + \sum_{s=1}^q \mu_s(x^c)h_sv_s + \frac{1}{2}\sum_{s=1}^q\sum_{\ell=1}^q \mu_{s\ell}(x^c)h_sh_\ell v_sv_\ell\Big]k(v)\,dv + O(|h|^3)$$
$$= \mu(x^c) + \frac{\kappa}{2}\sum_{s=1}^q \mu_{ss}(x^c)h_s^2 + O(|h|^3) = \mu(x^c) + O(|h|^2),$$
where $\kappa = \int v^2k(v)\,dv$. $\square$

¹ In principle, the estimator here could be the estimator of Li and Racine (2008), where the covariates can be discrete and ordered; however, in order to cite particular theorems from Rothe and Firpo (2013), I am only considering an estimator that allows for continuous covariates, though as noted by Rothe and Firpo (2019), the results could be modified to allow for discrete covariates.

Lemma 3.
Under Assumptions C.1–C.4,
$$E[\hat\mu(x)\tilde F(y|x^c)] = \mu(x)F(y|x^c) + \mu(x)\sum_{s=1}^q h_s^2B_s(y, x) + o(|h|^2) + o(h_0^2).$$

Proof. See Theorem 6.2(i) in Li and Racine (2007). $\square$

Now, the rate of uniform convergence proof largely follows Masry (1996). Furthermore, Li and Racine (2008) show that the optimum (minimizing the integrated mean square error) occurs when $h_1, \ldots, h_q$ all converge to 0 at the same rate. I will denote this common $h$ by $h_{\min}$.

Lemma 4. Under Assumptions C.1–C.4, $\sup_{x \in D}|\hat\mu(x) - \mu(x)| = O_p\big(\tfrac{\ln(n)^{1/2}}{(nh^q)^{1/2}}\big) + O_p(|h|^2)$.

Proof. Note that by the triangle inequality,
$$|\hat\mu(x) - \mu(x)| = |\hat\mu(x) - \mu(x) - E[\hat\mu(x)] + E[\hat\mu(x)]| \le |\mu(x) - E[\hat\mu(x)]| + |E[\hat\mu(x)] - \hat\mu(x)|.$$
By Lemma 2, $|\mu(x) - E[\hat\mu(x)]| = O(|h|^2)$. Then it is sufficient to find the rate for $|E[\hat\mu(x)] - \hat\mu(x)|$. Since $D$ is compact, it can be covered by a finite number $L = L(n)$ of cubes $I_k = I_{n,k}$ with centers $x_{k,n}$ having sides of length $\ell_n$ for $k = 1, \ldots, L(n)$. Clearly $\ell_n = \text{const}/L^{1/d}(n)$. Since $D$ is compact, write
$$\sup_{x \in D}|E[\hat\mu(x)] - \hat\mu(x)| \le \max_{1 \le k \le L_n}\sup_{x \in D\cap I_k}|\hat\mu(x) - \hat\mu(x_{k,n})| + \max_{1 \le k \le L_n}|E[\hat\mu(x_{k,n})] - \hat\mu(x_{k,n})| + \max_{1 \le k \le L_n}\sup_{x \in D\cap I_k}|E[\hat\mu(x_{k,n})] - E[\hat\mu(x)]| \equiv Q_1 + Q_2 + Q_3.$$
Since each kernel is Lipschitz, and the product of bounded Lipschitz functions is a Lipschitz function,
$$Q_1 \le |\hat\mu(x) - \hat\mu(x_{k,n})| \le |W_h(X_i^c, x^c) - W_h(X_i^c, x_{k,n}^c)| \le (C_2/h_{\min}^{q+1})\sup_{x \in D\cap I_k}|x - x_{k,n}| \le C_2\ell_n/h_{\min}^{q+1}.$$
Let $\ell_n = (\ln(n))^{1/2}h^{(q+2)/2}/n^{1/2}$. Then $Q_1 = O((\ln(n)/(nh^q))^{1/2})$. Similarly, $Q_3 = O((\ln(n)/(nh^q))^{1/2})$. Now let $W_n(x) = \hat\mu(x) - E[\hat\mu(x)] = \sum_{i=1}^n Z_{n,i}$, where
$$Z_{n,i} = (nh_{\min}^q)^{-1}\big[W_h(X_i^c, x^c) - E[W_h(X_i^c, x^c)]\big].$$
For $\eta > 0$, we have
$$P[Q_2 > \eta] \le P\big[\max_{1 \le k \le L_n}|W_n(x_{k,n})| > \eta\big] \le P\big[|W_n(x_{1,n})| > \eta \text{ or } \ldots \text{ or } |W_n(x_{L(n),n})| > \eta\big] \le L(n)\sup_{x \in D}P[|W_n(x)| > \eta].$$
Since $\hat\mu(\cdot)$ is bounded, and letting $A = \sup_{x \in D}|\hat\mu(x)|$, we have $|Z_{n,i}| \le 2A/(nh_{\min}^q)$ for all $i = 1, \ldots, n$.
Define $\lambda_n = (nh_{\min}^q\ln(n))^{1/2}$. Then $\lambda_n|Z_{n,i}| \le 2A(\ln(n)/(nh_{\min}^q))^{1/2} \le 1/2$ for all $i = 1, \ldots, n$, for $n$ sufficiently large. Using the inequality $e^x \le 1 + x + x^2$ for $|x| \le 1/2$, we have $e^{\lambda_nZ_{n,i}} \le 1 + \lambda_nZ_{n,i} + \lambda_n^2Z_{n,i}^2$. Hence, $E[e^{\lambda_nZ_{n,i}}] \le 1 + \lambda_n^2E[Z_{n,i}^2] \le e^{\lambda_n^2E[Z_{n,i}^2]}$. Then,
$$P[|W_n(x)| > \eta] = P\Big[\Big|\sum_{i=1}^n Z_{n,i}\Big| > \eta\Big] = P\Big[\sum_{i=1}^n Z_{n,i} > \eta\Big] + P\Big[\sum_{i=1}^n Z_{n,i} < -\eta\Big] \le P\Big[\sum_{i=1}^n Z_{n,i} > \eta\Big] + P\Big[-\sum_{i=1}^n Z_{n,i} > \eta\Big]$$
$$\le e^{-\lambda_n\eta}\Big(E\big[e^{\lambda_n\sum_{i=1}^nZ_{n,i}}\big] + E\big[e^{-\lambda_n\sum_{i=1}^nZ_{n,i}}\big]\Big) \le 2e^{-\lambda_n\eta}e^{\lambda_n^2\sum_{i=1}^nE(Z_{n,i}^2)} \le 2e^{-\lambda_n\eta + A\lambda_n^2/(nh_{\min}^q)}.$$
Then $\sup_{x \in D}P[|W_n(x)| > \eta] \le 2e^{-\lambda_n\eta + A\lambda_n^2/(nh_{\min}^q)}$. Let $\lambda_n\eta = C_3\ln(n)$ and choose $\lambda_n = (nh_{\min}^q\ln(n))^{1/2}$. Then $\lambda_n\eta - A\lambda_n^2/(nh_{\min}^q) = C_3\ln(n) - A\ln(n) = \alpha\ln(n)$, where $\alpha = C_3 - A$. Since $\sup_{x \in D}P[|W_n(x)| > \eta] \le 2e^{-\lambda_n\eta + A\lambda_n^2/(nh_{\min}^q)}$ and $P[Q_2 > \eta] \le L(n)\sup_{x \in D}P[|W_n(x)| > \eta]$, then $P[Q_2 > \eta_n] \le 2L(n)/n^\alpha$. Choose $C_3$ sufficiently large and $L(n)$ such that $\sum_{n=1}^\infty P[|Q_2/\eta_n| > 1] \le 4\sum_{n=1}^\infty L(n)/n^\alpha < \infty$. Then by the Borel–Cantelli lemma, $Q_2 = O_p((\ln(n))^{1/2}/(nh_{\min}^q)^{1/2})$. $\square$

Similarly, by Lemma 3 and by a result analogous to Lemma 4,
$$\sup_{x \in D}|\hat\mu(x)\tilde F(y|x^c) - \mu(x)F(y|x^c)| = O_p\Big(\frac{\ln(n)^{1/2}}{(nh^q)^{1/2}}\Big) + O_p(h_0^2) + O_p(|h|^2).$$
Then I have the following result.

Proof of Proposition 5:

Proof. Note that $\tilde F(y|x^c) = \frac{\hat\mu(x)\tilde F(y|x^c)}{\hat\mu(x)} = \frac{\hat\mu(x)\tilde F(y|x^c)/\mu(x)}{\hat\mu(x)/\mu(x)}$.
By Lemma 4,
$$\sup_{x \in D}|\hat\mu(x) - \mu(x)| = O_p\Big(\frac{\ln(n)^{1/2}}{(nh^q)^{1/2}}\Big) + O_p(|h|^2).$$
Then, by Lemma 4,
$$\sup_{x \in D}\Big|\frac{\hat\mu(x)}{\mu(x)} - 1\Big| = \sup_{x \in D}\Big|\frac{\hat\mu(x) - \mu(x)}{\mu(x)}\Big| \le \frac{O_p\big(\tfrac{\ln(n)^{1/2}}{(nh^q)^{1/2}}\big) + O_p(|h|^2)}{\inf_{x \in D}\mu(x)} = O_p\Big(\frac{\ln(n)^{1/2}}{(nh^q)^{1/2}}\Big) + O_p(|h|^2).$$
Similarly, since
$$\sup_{x \in D}|\hat\mu(x)\tilde F(y|x^c) - \mu(x)F(y|x^c)| = O_p\Big(\frac{\ln(n)^{1/2}}{(nh^q)^{1/2}}\Big) + O_p(h_0^2) + O_p(|h|^2),$$
then
$$\sup_{x \in D}\Big|\frac{\hat\mu(x)\tilde F(y|x^c)}{\mu(x)} - F(y|x^c)\Big| \le \frac{O_p\big(\tfrac{\ln(n)^{1/2}}{(nh^q)^{1/2}}\big) + O_p(h_0^2) + O_p(|h|^2)}{\inf_{x \in D}\mu(x)} = O_p\Big(\frac{\ln(n)^{1/2}}{(nh^q)^{1/2}}\Big) + O_p(h_0^2) + O_p(|h|^2).$$
Then,
$$\tilde F(y|x^c) = \frac{\hat\mu(x)\tilde F(y|x^c)/\mu(x)}{\hat\mu(x)/\mu(x)} = \frac{F(y|x^c) + O_p\big(\tfrac{\ln(n)^{1/2}}{(nh^q)^{1/2}}\big) + O_p(h_0^2) + O_p(|h|^2)}{1 + O_p\big(\tfrac{\ln(n)^{1/2}}{(nh^q)^{1/2}}\big) + O_p(|h|^2)} = F(y|x^c) + O_p\Big(\frac{\ln(n)^{1/2}}{(nh^q)^{1/2}}\Big) + O_p(h^2),$$
using $h_0 = h$. $\square$

APPENDIX E

PROOFS OF MAJOR THEOREMS AND PROPOSITIONS FOR CHAPTER 2

The first proof is of the identification result in Theorem 3.

Proof of Theorem 3:

Proof. Note that by Theorem 1 in Callaway and Li (2019), the first portion of the result is proven. All that remains is to show that
$$F_{\Delta Y_{0t}|D=1}(y) = E\Big[\frac{1-D}{p}\frac{\pi(X)}{1-\pi(X)}\,1\{\Delta Y_t \le y\} - \Big(\frac{1-D}{p}\frac{\pi(X)}{1-\pi(X)} - \frac{D}{p}\Big)\tilde P(\Delta Y_{0t} \le y|X)\Big]$$
if $\pi(X) = p(X)$ a.c., or $\tilde P(\Delta Y_{0t} \le y|X) = P(\Delta Y_{0t} \le y|X)$ a.c. Suppose $\pi(X) = p(X)$ a.c. Then,
$$E\Big[\frac{(1-D)p(X)}{p(1-p(X))}\,1\{\Delta Y_t \le y\} - \Big(\frac{(1-D)p(X)}{p(1-p(X))} - \frac{D}{p}\Big)\tilde P(\Delta Y_{0t} \le y|X)\Big]$$
$$= E\Big[\frac{p(X)P(\Delta Y_t \le y|X, D=0)}{p}\Big] - E\Big[\frac{p(X)\tilde P(\Delta Y_{0t} \le y|X)}{p}\Big] + E\Big[\frac{p(X)\tilde P(\Delta Y_{0t} \le y|X)}{p}\Big]$$
$$= E\Big[\frac{p(X)P(\Delta Y_{0t} \le y|X, D=0)}{p}\Big] = E\Big[\frac{p(X)P(\Delta Y_{0t} \le y|X, D=1)}{p}\Big] = E\Big[\frac{P(\Delta Y_{0t} \le y, D=1|X)}{p}\Big] = P(\Delta Y_{0t} \le y|D=1) = F_{\Delta Y_{0t}|D=1}(y).$$
Now, suppose $\pi(X) \ne p(X)$ a.c. and $\tilde P(\Delta Y_{0t} \le y|X) = P(\Delta Y_{0t} \le y|X)$ a.c. Then,
$$E\Big[\frac{(1-D)\pi(X)}{p(1-\pi(X))}\,1\{\Delta Y_t \le y\} - \Big(\frac{(1-D)\pi(X)}{p(1-\pi(X))} - \frac{D}{p}\Big)P(\Delta Y_{0t} \le y|X)\Big]$$
$$= E\Big[\frac{p(X)P(\Delta Y_{0t} \le y|X, D=1)}{p}\Big] + E\Big[\frac{(1-p(X))\pi(X)P(\Delta Y_{0t} \le y|X, D=0)}{p(1-\pi(X))}\Big] - E\Big[\frac{(1-p(X))\pi(X)P(\Delta Y_{0t} \le y|X, D=0)}{p(1-\pi(X))}\Big]$$
$$= E\Big[\frac{P(\Delta Y_{0t} \le y, D=1|X)}{p}\Big] = P(\Delta Y_{0t} \le y|D=1) = F_{\Delta Y_{0t}|D=1}(y). \quad\square$$

The next proof holds in either the parametric or nonparametric subcase, though this proof is for a parametric submodel; the nonparametric submodel proceeds similarly. For the purpose of estimation of $F_{\Delta Y_{0t}|D=1}(y)$, only the period of treatment and the period prior need to be considered; if there are additional pre-treatment and post-treatment periods, they are not relevant to the density of the data that is used to estimate $F_{\Delta Y_{0t}|D=1}(y)$. The proof itself is similar to a result in Sant'Anna and Zhao (2020).

Proof of Theorem 4:

Proof. The density of $(y_t(1), y_t(0), y_{t-1}(0), d, x)$ with respect to some sigma-finite measure on $\mathbb{R}^3 \times \{0,1\} \times \mathbb{R}^k$ is given by
$$\bar f(y_t(1), y_t(0), y_{t-1}(0), d, x) = \bar f(y_t(1), y_t(0), y_{t-1}(0)|D=1, x)^d\,p(x)^d\,\bar f(y_t(1), y_t(0), y_{t-1}(0)|D=0, x)^{1-d}(1-p(x))^{1-d}f(x).$$
The density of the observed data is
$$f(y_t, y_{t-1}, d, x) = f_1(y_t, y_{t-1}|D=1, x)^d\,p(x)^d\,f_0(y_t, y_{t-1}|D=0, x)^{1-d}(1-p(x))^{1-d}f(x),$$
where
$$f_1(\cdot, \cdot|D=1, x) = \int \bar f(\cdot, y_t(0), \cdot|D=1, x)\,dy_t(0), \qquad f_0(\cdot, \cdot|D=0, x) = \int \bar f(y_t(1), \cdot, \cdot|D=0, x)\,dy_t(1).$$
Consider a parametric submodel indexed by a parameter $\theta$,
$$f_\theta(y_t, y_{t-1}, d, x) = f_{1,\theta}(y_t, y_{t-1}|D=1, x)^d\,p_\theta(x)^d\,f_{0,\theta}(y_t, y_{t-1}|D=0, x)^{1-d}(1-p_\theta(x))^{1-d}f_\theta(x),$$
which equals $f(y_t, y_{t-1}, d, x)$ when $\theta = \theta_0$. The score is
$$s_\theta(y_t, y_{t-1}, d, x) = d\,s_{1\theta}(y_t, y_{t-1}|D=1, x) + (1-d)s_{0\theta}(y_t, y_{t-1}|D=0, x) + \dot p_\theta(x)\frac{d - p_\theta(x)}{p_\theta(x)(1-p_\theta(x))} + t_\theta(x),$$
where, for $d = 0, 1$,
$$s_{d\theta}(y_t, y_{t-1}|D=d, x) = \frac{d}{d\theta}\log f_{d,\theta}(y_t, y_{t-1}|D=d, x), \qquad \dot p_\theta(x) = \frac{d}{d\theta}p_\theta(x), \qquad t_\theta(x) = \frac{d}{d\theta}\log f_\theta(x).$$
Then the tangent space is
$$\mathcal{F} = \big\{d\,s_1(y_t, y_{t-1}|D=1, x) + (1-d)s_0(y_t, y_{t-1}|D=0, x) + a(x)(d - p(x)) + t(x)\big\},$$
where $\int s_d(y_t, y_{t-1}|D=d, x)f_d(y_t, y_{t-1}|D=d, x)\,dy_t\,dy_{t-1} = 0$ for all $x$ and $d = 0, 1$, $\int t(x)f(x)\,dx = 0$, and $a(x)$ is any square-integrable function of $x$. Under the assumption that $\Delta Y_{0t} \perp\!\!\!\perp D|X$, note that for $\tau = F_{\Delta Y_{0t}|D=1}(\delta)$,
$$\tau = E[E[1\{\Delta Y_{0t} \le \delta\}|D=1, X]|D=1] = E[E[1\{\Delta Y_{0t} \le \delta\}|D=0, X]|D=1].$$
For the parametric submodel under consideration, I note that
$$\tau(\theta) = \frac{\iiint 1\{y_t \le \delta + y_{t-1}\}\,p_\theta(x)f_{0,\theta}(y_t, y_{t-1}|D=0, x)f_\theta(x)\,dy_t\,dy_{t-1}\,dx}{\int p_\theta(x)f_\theta(x)\,dx}.$$
Then,
$$\frac{\partial\tau(\theta_0)}{\partial\theta} = \frac{\iiint 1\{y_t \le \delta + y_{t-1}\}\,p(x)s_0(y_t, y_{t-1}|D=0, x)f_0(y_t, y_{t-1}|D=0, x)f(x)\,dy_t\,dy_{t-1}\,dx}{p}$$
$$+ \frac{\int P(\Delta Y_{0t} \le \delta|x, D=0)p(x)t(x)f(x)\,dx}{p} + \frac{\int P(\Delta Y_{0t} \le \delta|x, D=0)\dot p(x)f(x)\,dx}{p} - \frac{\tau\int[\dot p(x) + p(x)t(x)]f(x)\,dx}{p}.$$
Let the initial choice of an influence function be
$$F_\tau(Y_t, Y_{t-1}, D, X) = \frac{(1-D)p(X)}{p(1-p(X))}1\{\Delta Y_t \le \delta\} - \frac{(1-D)p(X)}{p(1-p(X))}P(\Delta Y_t \le \delta|X, D=0) + \frac{D}{p}P(\Delta Y_t \le \delta|X, D=0) - \frac{D}{p}\tau$$
$$= \frac{(1-D)p(X)}{p(1-p(X))}1\{\Delta Y_t \le \delta\} - \frac{(1-D)p(X)}{p(1-p(X))}P(\Delta Y_{0t} \le \delta|X) + \frac{D}{p}P(\Delta Y_{0t} \le \delta|X) - \frac{D}{p}\tau.$$
Note that for the parametric submodel with score $s_\theta(y_t, y_{t-1}, d, x)$, I can conclude that $\tau$ is a differentiable parameter since
$$\frac{\partial\tau(\theta_0)}{\partial\theta} = E[F_\tau(Y_t, Y_{t-1}, D, X)\,s_\theta(Y_t, Y_{t-1}, D, X)].$$
Since $F_\tau \in \mathcal{F}$, then by Theorem 3.1 of Newey (1990), $F_\tau(Y_t, Y_{t-1}, D, X)$ is the efficient influence function for $F_{\Delta Y_{0t}|D=1}(\delta)$. $\square$

Proof of Theorem 5:

Proof. Consistency of the estimator, nonparametric case:
$$\hat F_{\Delta Y_{0t}|D=1}(y) = n^{-1}\sum_{i=1}^n\Big[\Big(\frac{1-D_i}{n^{-1}\sum_kD_k}\frac{\hat\pi(x_i)}{1-\hat\pi(x_i)}\Big)1\{\Delta Y_{ti} \le y\} - \Big(\frac{1-D_i}{n^{-1}\sum_kD_k}\frac{\hat\pi(x_i)}{1-\hat\pi(x_i)} - \frac{D_i}{n^{-1}\sum_kD_k}\Big)\hat P(\Delta Y_{0t} \le y|x_i)\Big].$$
Suppose that $\hat\pi(x) \overset{p}{\to} \pi(x)$; furthermore, $n^{-1}\sum_{k=1}^nD_k \overset{p}{\to} p$. Then by the WLLN and the Continuous Mapping Theorem,
$$n^{-1}\sum_{i=1}^n\Big(\frac{1-D_i}{n^{-1}\sum_kD_k}\frac{\hat\pi(x_i)}{1-\hat\pi(x_i)}\Big)1\{\Delta Y_{ti} \le y\} \overset{p}{\to} E\Big[\frac{1-D}{p}\frac{\pi(x)}{1-\pi(x)}1\{\Delta Y_t \le y\}\Big],$$
and
$$n^{-1}\sum_{i=1}^n\Big(\frac{1-D_i}{n^{-1}\sum_kD_k}\frac{\hat\pi(x_i)}{1-\hat\pi(x_i)} - \frac{D_i}{n^{-1}\sum_kD_k}\Big)\hat P(\Delta Y_{0t} \le y|x_i)$$
converges in probability to
$$E\Big[\Big(\frac{1-D_i}{p}\frac{\pi(x_i)}{1-\pi(x_i)} - \frac{D_i}{p}\Big)\tilde P(\Delta Y_{0t} \le y|x)\Big].$$
This implies that $\hat F_{\Delta Y_{0t}|D=1}(y)$ converges in probability to
$$E\Big[\frac{1-D}{p}\frac{\pi(x)}{1-\pi(x)}1\{\Delta Y_t \le y\}\Big] - E\Big[\Big(\frac{1-D}{p}\frac{\pi(x)}{1-\pi(x)} - \frac{D}{p}\Big)\tilde P(\Delta Y_{0t} \le y|x)\Big].$$
If $\pi(X) = p(X)$ a.c. or $\tilde P(\Delta Y_{0t} \le y|X) = P(\Delta Y_{0t} \le y|X)$ a.c., then by the previous theorem this limit equals $F_{\Delta Y_{0t}|D=1}(y)$.

The next proof follows partly from the proof of Theorem 2(b) in Rothe and Firpo (2019). In particular, the object is to expand the doubly robust moment condition and demonstrate that each term converges in probability to 0 at the desired rate. In the parametric case, the proof is very similar to Sant'Anna and Zhao (2020); the proof in the nonparametric case is also similar to Fan et al. (2016) when the nuisance function is estimated using a sieve approach. Now, I will expand
$$\hat F_{\Delta Y_{0t}|D=1}(y) - F_{\Delta Y_{0t}|D=1}(y) = n^{-1}\sum_{i=1}^n\Big[\Big(\frac{1-D_i}{n^{-1}\sum_kD_k}\frac{\hat\pi(x_i)}{1-\hat\pi(x_i)}\Big)1\{\Delta Y_{ti} \le y\} - \Big(\frac{1-D_i}{n^{-1}\sum_kD_k}\frac{\hat\pi(x_i)}{1-\hat\pi(x_i)} - \frac{D_i}{n^{-1}\sum_kD_k}\Big)\hat P(\Delta Y_{0t} \le y|x_i)\Big] - F_{\Delta Y_{0t}|D=1}(y).$$
Let
$$\psi_i^1 = \frac{1-D_i}{p(1-\hat\pi(x_i))^2}\big[1\{\Delta Y_t \le y\} - P(\Delta Y_{0t} \le y|x_i)\big], \qquad \psi_i^2 = -\Big(\frac{1-D_i}{p}\frac{\pi(x_i)}{1-\pi(x_i)} - \frac{D_i}{p}\Big), \qquad \psi_i^{22} = 0,$$
$$\psi_i^{11} = \frac{2(1-D_i)}{p(1-\hat\pi(x_i))^3}\big[1\{\Delta Y_t \le y\} - P(\Delta Y_{0t} \le y|x_i)\big], \qquad \psi_i^{12} = -\frac{1-D_i}{p(1-\hat\pi(x_i))^2},$$
$$\psi_i^{13} = -\frac{1-D_i}{\hat p^2(1-\pi(x_i))^2}\big[1\{\Delta Y_t \le y\} - P(\Delta Y_{0t} \le y|x_i)\big], \qquad \psi_i^{23} = \frac{1-D_i}{\hat p^2}\frac{\pi(x_i)}{1-\pi(x_i)} - \frac{D_i}{\hat p^2},$$
$$\psi_i^3 = -\frac{1-D_i}{\hat p^2}\frac{\pi(x_i)}{1-\pi(x_i)}1\{\Delta Y_t \le y\} + \Big(\frac{1-D_i}{\hat p^2}\frac{\pi(x_i)}{1-\pi(x_i)} - \frac{D_i}{\hat p^2}\Big)P(\Delta Y_{0t} \le y|x_i) + \frac{D_i}{\hat p^2}F_{\Delta Y_{0t}|D=1}(y),$$
$$\psi_i^{33} = \frac{2(1-D_i)}{\hat p^3}\frac{\pi(x_i)}{1-\pi(x_i)}1\{\Delta Y_t \le y\} - \Big(\frac{2(1-D_i)}{\hat p^3}\frac{\pi(x_i)}{1-\pi(x_i)} - \frac{2D_i}{\hat p^3}\Big)P(\Delta Y_{0t} \le y|x_i) - \frac{2D_i}{\hat p^3}F_{\Delta Y_{0t}|D=1}(y),$$
and let $\phi_n(\hat p, \hat\pi, \hat P) = n^{-1}\sum_{i=1}^n\phi(D_i, \hat\pi(x_i), \hat P(\Delta Y_{0t} \le y|x_i), \hat p)$ denote the sample moment above. Then,
$$\phi_n(\hat p, \hat\pi, \hat P) - \phi_n(p, \pi, P) = \frac{1}{n}\sum_{i=1}^n\psi_i^1(\hat\pi(x_i) - \pi(x_i)) + \frac{1}{n}\sum_{i=1}^n\psi_i^2(\hat P(\Delta Y_{0t} \le y|x_i) - P(\Delta Y_{0t} \le y|x_i)) + \frac{1}{n}\sum_{i=1}^n\psi_i^3(\hat p - p)$$
$$+ \frac{1}{n}\sum_{i=1}^n\psi_i^{11}(\hat\pi(x_i) - \pi(x_i))^2 + \frac{1}{n}\sum_{i=1}^n\psi_i^{12}(\hat P(\Delta Y_{0t} \le y|x_i) - P(\Delta Y_{0t} \le y|x_i))(\hat\pi(x_i) - \pi(x_i)) + \frac{1}{n}\sum_{i=1}^n\psi_i^{13}(\hat\pi(x_i) - \pi(x_i))(\hat p - p)$$
$$+ \frac{1}{n}\sum_{i=1}^n\psi_i^{23}(\hat P(\Delta Y_{0t} \le y|x_i) - P(\Delta Y_{0t} \le y|x_i))(\hat p - p) + \frac{1}{n}\sum_{i=1}^n\psi_i^{22}(\hat P(\Delta Y_{0t} \le y|x_i) - P(\Delta Y_{0t} \le y|x_i))^2 + \frac{1}{n}\sum_{i=1}^n\psi_i^{33}(\hat p - p)^2$$
$$+ O_p(\|\hat\pi - \pi\|_\infty^3) + O_p(\|\hat P - P\|_\infty^3) + O_p(\|\hat p - p\|^3).$$
It remains to show that each term is $o_p(n^{-1/2})$. Each term other than the first converges either due to Rothe and Firpo (2019) or due to the fact that $|\hat\pi(x_i) - \pi(x_i)| = o_p(n^{-1/4})$. For the first term, let $\mathbb{G}_n(f_0) = n^{1/2}(P_n - P)f_0(D, \Delta Y_t, x)$, where $P_n$ is the empirical measure, $P$ is the expectation, and
$$f_0(D, \Delta Y_t, x) = \frac{(1-D)\big(1\{\Delta Y_t \le y\} - P(\Delta Y_{0t} \le y|x)\big)}{p(1-\pi(x))^2}(\hat\pi(x) - \pi(x)).$$
Since $\sup_{x \in \mathcal{X}}|\hat\pi(x) - \pi(x)| \lesssim O(k_n^{-a/d}) = o_p(1)$ by Theorem 3.2 in Chen (2007) and the proof of Proposition 4, define $\mathcal{F} = \{f_0 : \|\hat\pi(x) - \pi(x)\|_\infty \le \delta_n\}$, where $\delta_n = Ck_n^{-a/d}$ for some $C > 0$. By Lemma 3 in Rothe and Firpo (2013), $Pf_0(D, \Delta Y_t, x) = 0$. By the Markov inequality and Corollary 19.35 of van der Vaart (2000),
$$n^{-1/2}\sum_{i=1}^n\psi_i^1(\hat\pi(x_i) - \pi(x_i)) \le \sup_{f_0 \in \mathcal{F}}|\mathbb{G}_n(f_0)| \lesssim J_{[]}(\|F_0\|_{p,2}, \mathcal{F}, L_2(p)),$$
where $J_{[]}(\|F_0\|_{p,2}, \mathcal{F}, L_2(p))$ is the bracketing integral and $F_0$ is the envelope function. Since $p$ and $\pi(x)$ are bounded away from 0, then
$$|f_0(D, \Delta Y_t, x)| \lesssim \delta_n\big|1\{\Delta Y_t \le y\} - P(\Delta Y_{0t} \le y|x)\big| \equiv F_0.$$
Then, since $1\{\Delta Y_t \le y\} - P(\Delta Y_{0t} \le y|x)$ is bounded by 1, $\|F_0\|_{p,2} \lesssim \delta_n$. Then,
$$\log N_{[]}(\epsilon, \mathcal{F}, L_2(p)) \lesssim \log N_{[]}(\epsilon/\delta_n, \mathcal{F}_0, L_2(p)) \lesssim \log N_{[]}(\epsilon/\delta_n, \Lambda_c^a(\mathcal{X}), L_2(p)) \lesssim (\delta_n/\epsilon)^{d/a},$$
where the last inequality follows by Corollary 2.7.2 in van der Vaart and Wellner (1996). Then,
$$J_{[]}(\|F_0\|_{p,2}, \mathcal{F}, L_2(p)) \lesssim \int_0^{\delta_n}\sqrt{\log N_{[]}(\epsilon, \mathcal{F}, L_2(p))}\,d\epsilon \lesssim \int_0^{\delta_n}(\delta_n/\epsilon)^{d/2a}\,d\epsilon \overset{n \to \infty}{\longrightarrow} 0,$$
where the integral converges to zero since $d/a < 2$ by Assumption NP.7. Then $n^{-1/2}\sum_{i=1}^n\psi_i^1(\hat\pi(x_i) - \pi(x_i)) = o_p(1)$.

Consistency of the estimator, parametric case:
$$\hat F_{\Delta Y_{0t}|D=1}(y) = n^{-1}\sum_{i=1}^n\Big(\frac{1-D_i}{n^{-1}\sum_kD_k}\frac{\pi(x_i; \hat\gamma)}{1-\pi(x_i; \hat\gamma)}\Big)1\{\Delta Y_{ti} \le y\} - n^{-1}\sum_{i=1}^n\Big(\frac{1-D_i}{n^{-1}\sum_kD_k}\frac{\pi(x_i; \hat\gamma)}{1-\pi(x_i; \hat\gamma)} - \frac{D_i}{n^{-1}\sum_kD_k}\Big)\hat P(\Delta Y_{0t} \le y|X_i; \hat\beta).$$
Suppose that $\hat\gamma \overset{p}{\to} \gamma^*$ and $\hat\beta \overset{p}{\to} \beta^*$; furthermore, $n^{-1}\sum_{k=1}^nD_k \overset{p}{\to} p$. Then by the WLLN and the Continuous Mapping Theorem,
$$n^{-1}\sum_{i=1}^n\Big(\frac{1-D_i}{n^{-1}\sum_kD_k}\frac{\pi(x_i; \hat\gamma)}{1-\pi(x_i; \hat\gamma)}\Big)1\{\Delta Y_{ti} \le y\} \overset{p}{\to} E\Big[\frac{1-D}{p}\frac{\pi(x; \gamma^*)}{1-\pi(x; \gamma^*)}1\{\Delta Y_t \le y\}\Big],$$
and
$$n^{-1}\sum_{i=1}^n\Big(\frac{1-D_i}{n^{-1}\sum_kD_k}\frac{\pi(x_i; \hat\gamma)}{1-\pi(x_i; \hat\gamma)} - \frac{D_i}{n^{-1}\sum_kD_k}\Big)\hat P(\Delta Y_{0t} \le y|X_i; \hat\beta)$$
converges in probability to
$$E\Big[\Big(\frac{1-D_i}{p}\frac{\pi(x_i; \gamma^*)}{1-\pi(x_i; \gamma^*)} - \frac{D_i}{p}\Big)\tilde P(\Delta Y_{0t} \le y|x_i; \beta^*)\Big].$$
This implies that $\hat F_{\Delta Y_{0t}|D=1}(y)$ converges in probability to
$$E\Big[\frac{1-D}{p}\frac{\pi(x; \gamma^*)}{1-\pi(x; \gamma^*)}1\{\Delta Y_t \le y\}\Big] - E\Big[\Big(\frac{1-D}{p}\frac{\pi(x; \gamma^*)}{1-\pi(x; \gamma^*)} - \frac{D}{p}\Big)\tilde P(\Delta Y_{0t} \le y|X; \beta^*)\Big].$$
If $\pi(X; \gamma^*) = p(X)$ a.c. or $\tilde P(\Delta Y_{0t} \le y|X; \beta^*) = P(\Delta Y_{0t} \le y|X)$ a.c., then by the previous theorem this limit equals $F_{\Delta Y_{0t}|D=1}(y)$.

Note that
$$\hat F_{\Delta Y_{0t}|D=1}(y) - F_{\Delta Y_{0t}|D=1}(y) = (\widehat{CDF}_1 - CDF_1) - (\widehat{CDF}_2 - CDF_2),$$
where the first difference collects the terms in $1\{\Delta Y_{ti} \le y\}$ and the second collects the terms in the estimated conditional cdf. Let
$$w_0(D, x; \hat\gamma) = \frac{1-D}{n^{-1}\sum_kD_k}\frac{\pi(x; \hat\gamma)}{1-\pi(x; \hat\gamma)}.$$
Then,
$$\sqrt n(\widehat{CDF}_1 - CDF_1) = n^{-1/2}\sum_{i=1}^n\big(w_0(D_i, x_i; \hat\gamma)1\{\Delta Y_{ti} \le y\} - E[w_0(D, x; \gamma^*)1\{\Delta Y_t \le y\}]\big)$$
$$= n^{-1/2}\sum_{i=1}^n\big(\tilde w_0(D_i, x_i; \hat\gamma)1\{\Delta Y_{ti} \le y\} - E[w_0(D, x; \gamma^*)1\{\Delta Y_t \le y\}]\big) - n^{-1/2}\sum_{i=1}^n\big(\tilde w_0(D_i, x_i; \hat\gamma) - 1\big)E[w_0(D, x; \gamma^*)1\{\Delta Y_t \le y\}] + o_p(1)$$
$$= n^{-1/2}\sum_{i=1}^n\tilde w_0(D_i, x_i; \hat\gamma)\big(1\{\Delta Y_{ti} \le y\} - E[w_0(D, x; \gamma^*)1\{\Delta Y_t \le y\}]\big) + o_p(1),$$
where
$$\tilde w_0(D, x; \hat\gamma) = \frac{\pi(x; \hat\gamma)(1-D)}{1-\pi(x; \hat\gamma)}\Big/E\Big[\frac{\pi(X; \gamma^*)(1-D)}{1-\pi(X; \gamma^*)}\Big].$$
Then, I do a second-order Taylor expansion around $\gamma^*$, so that
$$\sqrt n(\widehat{CDF}_1 - CDF_1) = n^{-1/2}\sum_{i=1}^nw_0(D_i, x_i; \gamma^*)\big(1\{\Delta Y_{ti} \le y\} - E[w_0(D, x; \gamma^*)1\{\Delta Y_t \le y\}]\big)$$
$$+ \sqrt n(\hat\gamma - \gamma^*)'\cdot n^{-1}\sum_{i=1}^n\dot w_0(D_i, x_i; \gamma^*)\big(1\{\Delta Y_{ti} \le y\} - E[w_0(D, x; \gamma^*)1\{\Delta Y_t \le y\}]\big) + o_p(1)$$
$$= n^{-1/2}\sum_{i=1}^n\Big(w_0(D_i, x_i; \gamma^*)\big(1\{\Delta Y_{ti} \le y\} - E[w_0(D, x; \gamma^*)1\{\Delta Y_t \le y\}]\big) + l^*(W_i)'\cdot E\big[\alpha(D, x; \gamma^*)\big(1\{\Delta Y_t \le y\} - E[w_0(D, x; \gamma^*)1\{\Delta Y_t \le y\}]\big)\dot\pi(x; \gamma^*)\big]\Big) + o_p(1),$$
where $\sqrt n(\hat\gamma - \gamma^*) = n^{-1/2}\sum_{i=1}^nl^*(W_i) + o_p(1)$, $\dot w_0(D, x; \gamma) = \alpha(D, x; \gamma)\dot\pi(x; \gamma)$, and
$$\alpha(D, x; \gamma) = \frac{1-D}{(1-\pi(x; \gamma))^2}\Big/E\Big[\frac{\pi(x; \gamma^*)(1-D)}{1-\pi(x; \gamma^*)}\Big].$$
Observe that
$$\widehat{CDF}_2 - CDF_2 = (\widehat{CDF}_{21} - CDF_{21}) - (\widehat{CDF}_{22} - CDF_{22}),$$
where $\widehat{CDF}_{21}$ collects the conditional-cdf terms weighted by $w_0$ and $\widehat{CDF}_{22}$ collects those weighted by $w_1(D) = D/(n^{-1}\sum_kD_k)$. Similarly, note that
$$\sqrt n(\widehat{CDF}_{22} - CDF_{22}) = n^{-1/2}\sum_{i=1}^n\Big(w_1(D_i)\big(\tilde P(\Delta Y_{0ti} \le y|x_i; \beta^*) - E[w_1(D)\tilde P(\Delta Y_{0t} \le y|x; \beta^*)]\big) + l^*(W_i)'E\big[w_1(D_i)\dot{\tilde P}(\Delta Y_{0t} \le y|x; \beta^*)\big]\Big) + o_p(1).$$
Furthermore, note that
$$\sqrt n(\widehat{CDF}_{21} - CDF_{21}) = n^{-1/2}\sum_{i=1}^n\Big(w_0(D_i, x_i; \gamma^*)\big(\tilde P(\Delta Y_{0ti} \le y|x_i; \beta^*) - E[w_0(D, x; \gamma^*)\tilde P(\Delta Y_{0ti} \le y|x; \beta^*)]\big)$$
$$+ l^*(W_i)'\cdot E\big[\alpha(D, x; \gamma^*)\big(\tilde P(\Delta Y_{0t} \le y|x; \beta^*) - E[w_0(D, x; \gamma^*)\tilde P(\Delta Y_{0ti} \le y|x; \beta^*)]\big)\dot\pi(x; \gamma^*)\big] + l^*(W_i)'\cdot E\big[w_0(D, x; \gamma^*)\dot{\tilde P}(\Delta Y_{0ti} \le y|x; \beta^*)\big]\Big) + o_p(1).$$
Then, by combining all the asymptotic expansions, I obtain
$$\sqrt n\big(\hat F_{\Delta Y_{0t}|D=1}(y) - F_{\Delta Y_{0t}|D=1}(y)\big) = n^{-1/2}\sum_{i=1}^n\Big(w_0(D_i, x_i; \gamma^*)\big(1\{\Delta Y_{ti} \le y\} - E[w_01\{\Delta Y_t \le y\}]\big) + l^*(W_i)'\cdot E\big[\alpha(D_i, x_i; \gamma^*)\big(1\{\Delta Y_{ti} \le y\} - E[w_01\{\Delta Y_t \le y\}]\big)\dot\pi(x; \gamma^*)\big]$$
$$+ w_1(D_i)\big(\tilde P(\Delta Y_{0ti} \le y|x_i; \beta^*) - E[w_1\tilde P(\Delta Y_{0t} \le y|x; \beta^*)]\big) + l^*(W_i)'E\big[w_1\dot{\tilde P}(\Delta Y_{0t} \le y|x; \beta^*)\big]$$
$$- w_0(D_i, x_i; \gamma^*)\big(\tilde P(\Delta Y_{0ti} \le y|x_i; \beta^*) - E[w_0\tilde P(\Delta Y_{0ti} \le y|x; \beta^*)]\big) - l^*(W_i)'\cdot E\big[\alpha(\gamma^*)\big(\tilde P(\Delta Y_{0t} \le y|x; \beta^*) - E[w_0\tilde P(\Delta Y_{0ti} \le y|x; \beta^*)]\big)\dot\pi(x; \gamma^*)\big] - l^*(W_i)'\cdot E\big[w_0\dot{\tilde P}(\Delta Y_{0ti} \le y|x; \beta^*)\big]\Big) + o_p(1).$$
After simplification, I obtain
$$\sqrt n\big(\hat F_{\Delta Y_{0t}|D=1}(y) - F_{\Delta Y_{0t}|D=1}(y)\big) = n^{-1/2}\sum_{i=1}^n\Big(w_0(D_i, x_i; \gamma^*)\big(1\{\Delta Y_{ti} \le y\} - \tilde P(\Delta Y_{0ti} \le y|x_i; \beta^*) + E[w_0\tilde P(\Delta Y_{0ti} \le y|x; \beta^*)] - E[w_01\{\Delta Y_t \le y\}]\big)$$
$$+ w_1(D_i)\big(\tilde P(\Delta Y_{0ti} \le y|x_i; \beta^*) - E[w_1\tilde P(\Delta Y_{0t} \le y|x; \beta^*)]\big) + l^*(W_i)'E\big[\dot{\tilde P}(\Delta Y_{0ti} \le y|x; \beta^*)(w_1 - w_0)\big]$$
$$+ l^*(W_i)'E\big[\alpha(\gamma^*)\big(1\{\Delta Y_{ti} \le y\} - \tilde P(\Delta Y_{0ti} \le y|x; \beta^*) - E[w_0(1\{\Delta Y_{ti} \le y\} - \tilde P(\Delta Y_{0ti} \le y|x; \beta^*))]\big)\dot\pi(\gamma^*)\big]\Big) + o_p(1).$$
Now, suppose that the propensity score and the CDF of $\Delta Y_{0ti}|X$ are correctly specified. Note that $l^*(W_i)'E[\dot P(\Delta Y_{0ti} \le y; \beta^*)(w_1 - w_0)] = 0$, $E[w_0\,P(\Delta Y_{0ti} \le y|x; \beta^*)] - E[w_01\{\Delta Y_t \le y\}] = 0$, and
$$l^*(W_i)'E\big[\alpha(\gamma^*)\big(1\{\Delta Y_{ti} \le y\} - P(\Delta Y_{0ti} \le y|x; \beta^*) - E[w_0(1\{\Delta Y_{ti} \le y\} - P(\Delta Y_{0ti} \le y|x; \beta^*))]\big)\dot\pi(\gamma^*)\big] = 0.$$
Then,
$$\sqrt n\big(\hat F_{\Delta Y_{0t}|D=1}(y) - F_{\Delta Y_{0t}|D=1}(y)\big) = n^{-1/2}\sum_{i=1}^n\Big[w_0(D_i, x_i; \gamma^*)1\{\Delta Y_{ti} \le y\} - \big(w_0(D_i, x_i; \gamma^*) - w_1(D_i)\big)P(\Delta Y_{0ti} \le y|x_i; \beta^*) - w_1(D_i)F_{\Delta Y_{0t}|D=1}(y)\Big] + o_p(1)$$
$$= n^{-1/2}\sum_{i=1}^n\eta(D_i, x_i, \Delta Y_{0i}, \Delta Y_{1i}) + o_p(1). \quad\square$$

Proof of Proposition 1:

Proof. The result follows from Theorem 3 and the functional central limit theorem for empirical distribution functions. $\square$

Proof of Proposition 2:

Proof. The result follows by Proposition 1, Lemma B.4 in Callaway and Li (2019), and similar arguments used to establish Proposition 4 in Callaway and Li (2019). $\square$

Proof of Theorem 6:

Proof. The result follows from Proposition 2 and Lemma 3.9.23(ii) in van der Vaart and Wellner (1996). $\square$

Outline of proof of Proposition 3:

Consider the asymptotic expansion in the nonparametric case of Theorem 5, and consider each term, assuming that the bootstrap estimate of each nuisance function converges to the estimate of that function based upon the unweighted data. It remains to show that each term is $o_p(n^{-1/2})$. Each term other than the first converges either due to Rothe and Firpo (2019) or due to the fact that $\sup_{x \in \mathcal{X}}|\hat\pi^*(x_i) - \hat\pi(x_i)| = o_p(n^{-1/4})$.
For the first term, let $\mathbb{G}_n(f_1) = n^{1/2}(P_n - P)f_1^*(D, \Delta Y_t, x^*)$, where $P_n$ is the empirical measure, $P$ is the expectation, and
$$f_1^*(D, \Delta Y_t, x) = \frac{(1-D)\big(1\{\Delta Y_t \le y\} - \hat P(\Delta Y_{0t} \le y|x^*)\big)}{p^*(1-\hat\pi(x^*))^2}\big(\hat\pi^*(x_i^*) - \hat\pi(x_i^*)\big).$$
Since $\sup_{x \in \mathcal{X}}|\hat\pi^*(x) - \hat\pi(x)| \lesssim o_p(k_n^{-a/d}) = o_p(1)$ by Theorem 3.2 in Chen (2007) and the previous proof, define $\mathcal{F} = \{f_1 : \|\hat\pi^*(x) - \hat\pi(x)\|_\infty \le \delta_{1n},\ \|\hat P^*(x) - \hat P(x)\|_\infty \le \delta_{2n}\}$, where $\delta_{1n} = Ck_n^{-a/d}$ for some $C > 0$, and $\delta_{2n} = Kn^{-r}$, where $r > 1/2$. Let $\mathcal{G}_1 = \{\pi \in \Lambda_c^a(\mathcal{X}) : \|\pi - \hat\pi\|_\infty \le \delta_{1n}\}$ and $\mathcal{G}_2 = \{P \in \mathcal{M} : \|P - \hat P\|_\infty \le \delta_{2n}\}$, where $\mathcal{M}$ denotes a Hölder space containing an estimate of $P(\Delta Y_{0t} \le y|x)$, such as the kernel estimator mentioned in this text, with smoothness parameter $a_1$ such that $d/a_1 < 2$. Note the following:
$$n^{-1/2}\sum_{i=1}^n\psi_i^1\big(\hat\pi^*(x^*) - \hat\pi(x^*)\big) \le \sup_{f_1 \in \mathcal{F}}|\mathbb{G}_n(f_1)| + n^{1/2}\sup_{f_1 \in \mathcal{F}}|Pf_1|.$$
I will consider the second term first. Since $p$ and $\pi(x)$ are bounded away from 0,
$$n^{1/2}\sup_{f_1 \in \mathcal{F}}|Pf_1| = n^{1/2}\sup_{\pi \in \mathcal{G}_1,\,P \in \mathcal{G}_2}\Big|E\Big[\frac{1-D}{p^*(1-\hat\pi(x^*))^2}\big[1\{\Delta Y_t \le y\} - P(\Delta Y_{0t} \le y|x_i^*)\big]\big(\hat\pi^*(x^*) - \hat\pi(x^*)\big)\Big]\Big|$$
$$\lesssim n^{1/2}\sup_{x \in \mathcal{X}}|\hat\pi^*(x^*) - \hat\pi(x^*)|\sup_{x \in \mathcal{X}}|\hat P^*(x^*) - \hat P(x^*)| \lesssim o_p(1),$$
where the last line follows from Assumption B.1. Now, I will consider the term $\sup_{f_1 \in \mathcal{F}}|\mathbb{G}_n(f_1)|$. Let $F_1 \equiv \delta_n = Ck_n^{-a/d}$, so $\|F_1\|_{P,2} \lesssim \delta_n$. Let $\mathcal{F}_1 = \{f_1 : \|\hat\pi^* - \pi\|_\infty \le C,\ \|\hat P^* - P\|_\infty \le 1\}$. Define $\mathcal{F}_1' = \{\pi \in \Lambda_c^a(\mathcal{X}) + \hat\pi^* : \|\pi\|_{p,2} \le C\}$ and $\mathcal{F}_2' = \{P \in \mathcal{M} + \hat P^* : \|P\|_{p,2} \le 1\}$. Then,
$$\log N_{[]}(\epsilon, \mathcal{F}, L_2(p)) \lesssim \log N_{[]}(\epsilon/\delta_{2n}, \mathcal{F}_1', L_2(p)) + \log N_{[]}(\epsilon/\delta_{2n}, \mathcal{F}_2', L_2(p)) \lesssim \log N_{[]}(\epsilon/\delta_{2n}, \Lambda_c^a(\mathcal{X}), L_2(p)) + \log N_{[]}(\epsilon/\delta_{2n}, \mathcal{M}, L_2(p)) \lesssim (\delta_n/\epsilon)^{d/a} + (\delta_n/\epsilon)^{d/a_1}.$$
This is sufficient to demonstrate that the bracketing integral $J_{[]}(\|F_1\|_{p,2}, \mathcal{F}, L_2(p))$ converges. Then $n^{-1/2}\sum_{i=1}^n\psi_i^1(\hat\pi^*(x^*) - \hat\pi(x^*)) = o_p(1)$. Now, note that
$$E\Big[\Big(\frac{1-D_i}{p^*}\frac{\hat\pi^*(x_i^*)}{1-\hat\pi^*(x_i^*)} - \frac{D_i}{p^*}\Big)\big(\hat P^*(\Delta Y_{0t} \le y|x^*) - \hat P(\Delta Y_{0t} \le y|x^*)\big)\Big]$$
$$\lesssim E\big[(\hat\pi^*(x) - D_i)\big(\hat P^*(\Delta Y_{0t} \le y|x^*) - \hat P(\Delta Y_{0t} \le y|x^*)\big)\big] \lesssim E\big[\big(|\hat\pi^*(x^*) - \hat\pi(x^*)| + |\hat\pi(x^*) - \pi(x)|\big)\big|\hat P^*(\Delta Y_{0t} \le y|x^*) - \hat P(\Delta Y_{0t} \le y|x^*)\big|\big].$$
The result then follows by the same steps used to show that $n^{-1/2}\sum_{i=1}^n\psi_i^1(\hat\pi^*(x^*) - \hat\pi(x^*))$ is $o_p(1)$. Then the main result follows by Theorem 3.6.1 in van der Vaart and Wellner (1996).

Proof of Theorem 7:

Proof. The result follows by Proposition 3, Lemma 3.9.23(ii), and Theorem 3.9.11 in van der Vaart and Wellner (1996). $\square$

APPENDIX F

CHAPTER 2 TABLES AND FIGURES

Figure F.1: Average Absolute Bias

Figure F.1: $QTT_{dr}$ represents the doubly robust estimator. $QTT_{cl}$ represents the Callaway and Li estimator. The graph in the first row and first column represents the scenario when both nuisance functions are correctly specified. The graph in the first row and second column represents when neither nuisance function is correctly specified. The graph in the second row and first column represents when the propensity score is correctly specified but the conditional cdf nuisance function is incorrectly specified. The graph in the second row and second column represents when the propensity score is incorrectly specified but the conditional cdf nuisance function is correctly specified.

Figure F.2: RMSE

Figure F.2: $QTT_{dr}$ represents the doubly robust estimator. $QTT_{cl}$ represents the Callaway and Li estimator. The graph in the first row and first column represents the scenario when both nuisance functions are correctly specified. The graph in the first row and second column represents when neither nuisance function is correctly specified. The graph in the second row and first column represents when the propensity score is correctly specified but the conditional cdf nuisance function is incorrectly specified. The graph in the second row and second column represents when the propensity score is incorrectly specified but the conditional cdf nuisance function is correctly specified.
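The doubly robust estimator $QTT_{dr}$ compared in Figures F.1 and F.2 is built from the moment condition of Theorem 3: an inverse-propensity-weighted average of $1\{\Delta Y_t \le y\}$ over the untreated, recentred by a term in the estimated conditional cdf. The sketch below is a simplified illustration of the counterfactual-cdf step only, under assumptions of my own (a scalar covariate, a logit propensity score, an Epanechnikov-kernel conditional cdf, and a simulated design); all function names are hypothetical.

```python
import numpy as np

def fit_logit(x, d, n_iter=50):
    # Logit propensity score pi(x) = L(a + b*x), fit by Newton-Raphson.
    R = np.column_stack([np.ones_like(x), x])
    g = np.zeros(2)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(R @ g)))
        hess = R.T @ (R * (p * (1.0 - p))[:, None]) + 1e-8 * np.eye(2)
        g = g + np.linalg.solve(hess, R.T @ (d - p))
    return lambda xx: 1.0 / (1.0 + np.exp(-(np.column_stack([np.ones_like(xx), xx]) @ g)))

def cond_cdf_hat(y, x0, dy0, x_untr, h=0.3, h0=0.3):
    # Kernel-smoothed P(dY0 <= y | X = x0) among the untreated, Epanechnikov weights.
    u = (x_untr - x0) / h
    w = np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)
    v = np.clip((y - dy0) / h0, -1.0, 1.0)
    G = 0.75 * (v - v**3 / 3.0) + 0.5  # integrated Epanechnikov kernel
    return float(np.sum(G * w) / max(np.sum(w), 1e-12))

def dr_cdf(y, dy, d, x, pi_hat):
    # Doubly robust estimate of the counterfactual cdf F_{dY(0)|D=1}(y).
    p = d.mean()
    pi = np.clip(pi_hat(x), 0.01, 0.99)
    w0 = (1.0 - d) * pi / (p * (1.0 - pi))  # reweights untreated toward the treated
    x_untr, dy_untr = x[d == 0], dy[d == 0]
    P = np.array([cond_cdf_hat(y, xi, dy_untr, x_untr) for xi in x])
    return float(np.mean(w0 * (dy <= y) - (w0 - d / p) * P))

# Simulated design: selection on x, untreated outcome dY(0) = x + noise.
rng = np.random.default_rng(1)
n = 4000
x = rng.uniform(-1.0, 1.0, n)
d = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-x))).astype(float)
dy = x + rng.normal(size=n)
F_hat = dr_cdf(0.0, dy, d, x, fit_logit(x, d))
```

The correction term has mean approximately zero when the propensity score is correct, and it is what restores consistency when the propensity score is misspecified but the conditional cdf is correct, which is the mechanism behind the pattern in the lower-right panels of Figures F.1 and F.2.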
Figure F.3: QTT Unemployment Estimates

Figure F.3: The top panel represents estimates of the $QTT(\tau)$ and their confidence intervals using my doubly-robust estimator. The bottom panel represents the estimates of the $QTT(\tau)$ using the Callaway and Li estimator. The blue line represents the curve of point estimates. The red lines represent the 95% confidence bands.

APPENDIX G

CHAPTER 3 FIGURES

Figure G.1: Average Absolute Bias

Figure G.1: $QTT_{gmmpro}$ represents the estimates when both nuisance functions are correctly specified. $QTT_{mispro}$ represents the estimates when the propensity score is correctly specified, but the conditional cdf is incorrectly specified. $QTT_{omispro}$ represents when only the propensity score is misspecified, but the conditional cdf is correctly specified. $QTT_{nomispro}$ represents when neither of the nuisance functions is correctly specified.

Figure G.2: RMSE

Figure G.2: $QTT_{gmmpro}$ represents the estimates when both nuisance functions are correctly specified. $QTT_{mispro}$ represents the estimates when the propensity score is correctly specified, but the conditional cdf is incorrectly specified. $QTT_{omispro}$ represents when only the propensity score is misspecified, but the conditional cdf is correctly specified. $QTT_{nomispro}$ represents when neither of the nuisance functions is correctly specified.
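The confidence bands in Figure F.3 rest on the empirical bootstrap whose validity is established in Proposition 3 and Theorem 7: resample the observed tuples $(\Delta Y_i, D_i, X_i)$ with replacement, recompute the estimator on each draw, and take percentiles. A schematic sketch follows; the stand-in `qtt_hat` is a hypothetical placeholder for the full doubly robust procedure, and the simulated design is my own.

```python
import numpy as np

def qtt_hat(dy, d, x, tau):
    # Stand-in for the doubly robust QTT(tau) estimator: here, simply the
    # difference between treated and untreated tau-quantiles of dY.
    return np.quantile(dy[d == 1], tau) - np.quantile(dy[d == 0], tau)

def bootstrap_band(dy, d, x, tau, B=300, level=0.95, seed=0):
    rng = np.random.default_rng(seed)
    n = dy.size
    est = qtt_hat(dy, d, x, tau)
    draws = np.empty(B)
    for b in range(B):
        idx = rng.integers(0, n, n)  # resample tuples with replacement
        draws[b] = qtt_hat(dy[idx], d[idx], x[idx], tau)
    alpha = 1.0 - level
    lo, hi = np.quantile(draws, [alpha / 2.0, 1.0 - alpha / 2.0])
    return float(est), float(lo), float(hi)

# Simulated design: treatment shifts dY by 0.5 at every quantile.
rng = np.random.default_rng(2)
n = 2000
x = rng.uniform(-1.0, 1.0, n)
d = (rng.uniform(size=n) < 0.5).astype(float)
dy = x + 0.5 * d + rng.normal(size=n)
est, lo, hi = bootstrap_band(dy, d, x, tau=0.5)
```

In the application, this is repeated over a grid of $\tau$ values to trace out the point-estimate curve and the bands shown in Figure F.3.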