STATISTICAL METHODS FOR SINGLE CELL GENE EXPRESSION: DIFFERENTIAL
      EXPRESSION, CURVE ESTIMATION AND GRAPHICAL MODELLING
                                        By
                                  Satabdi Saha
                              A DISSERTATION
                                  Submitted to
                          Michigan State University
                   in partial fulfillment of the requirements
                                for the degree of
                       Statistics – Doctor of Philosophy
                                       2022


                                         ABSTRACT
This dissertation presents a set of statistical methods developed for the analysis of single cell
gene expression datasets. Gene expression profiling of single cells has led to unprecedented
progress in understanding normal physiology, disease progression and developmental processes.
However, despite many improvements in high-throughput sequencing, various technical factors,
including cell-cycle heterogeneity, library size differences, amplification bias, and low RNA
capture per cell, lead to high noise in scRNA-seq experiments. A primary characteristic of
these datasets is the presence of a large number of zeroes, which represent undetectable levels
of expression for a transcript. Statistical methods capable of modelling novel single cell
experiments are developed, and new estimation strategies are proposed and validated using
simulated and real data experiments.
    • In Chapter 1, the motivation and underlying philosophies of single cell gene expression
      are reviewed. Methods for the analysis of dose response experiments and gene co-expression
      networks are reviewed, and novel statistical hypotheses to be investigated using single
      cell experiments are discussed.
    • In Chapter 2, I analyze a unique in vivo dose response hepatic scRNAseq dataset
      consisting of 9 dose groups with 3 biological replicates across 11 distinct liver cell
      types, totalling more than 100K cells. A Hurdle model for multiple group data is proposed,
      which models the bimodality of single cell gene expression within multiple groups. Based
      on the model assumptions, I derive a fit-for-purpose Bayesian test for simultaneously
      testing the differences in mean gene expression and zero proportions across multiple dose
      groups. For comparison, the counterpart likelihood-ratio test for differential expression
      that incorporates testing for both components is also derived. This chapter was
      originally published in [1].
    • In Chapter 3, dose response curve estimation for single cell experiments is studied.
      Current protocols for genomic dose response modelling are only capable of modelling bulk
      and microarray datasets. A semiparametric regression model is proposed for joint dose
      response curve estimation across multiple cell types while accounting for confounding
      covariates. A novel, scalable and efficient optimization algorithm using the MM
      philosophy is proposed for the estimation of both monotone and non-monotone curves.
      Two relevant hypothesis tests are discussed, and the proposed methods are validated
      using several simulated datasets.
    • In Chapter 4, co-expression network estimation is studied using graph signal processing.
      A kernelized signed graph learning approach is developed for learning single cell gene
      co-expression networks, based on the assumption of smoothness of gene expressions
      over activating edges. Performance is assessed using real human and mouse embryonic
      datasets. This chapter was originally published in [2].


This work is dedicated to my husband and son. You have made me better, stronger, and
                     more fulfilled than I could have ever imagined.


                               ACKNOWLEDGEMENTS
I would like to express my deepest gratitude to my advisors, Dr Tapabrata Maiti and Dr
Samiran Sinha, both of whom have been extremely supportive of my study, research, and
professional development. It has been inspiring to observe their scientific rigor and enthusiasm
for interdisciplinary research. I am grateful for their constant encouragement, patience,
guidance, and the tremendous support throughout my doctoral studies at MSU. I would like
to thank Dr Sudin Bhattacharya, Dr Rance Nault and Dr Tim Zacharewski for introducing
me to the field of toxicogenomics; it has always been a wonderful experience collaborating
with them on highly interdisciplinary problems.
    I would like to thank my committee members, Dr Shlomo Levental and Dr Lyudmila
Sakhanenko, for serving on my committee and being patient readers of my work. I am highly
indebted to Dr Sakhanenko and Dr Levental for being wonderful teachers; their contribution
towards my PhD training is immense. In addition, I am very grateful to Dr Selin Aviyente
and her doctoral advisee Abdullah Karaaslanli for collaborating with me on developing
signal processing based graph learning ideas, which contribute to the fourth chapter of this
dissertation.
    I thank Prof. Elijah Dikong and Prof. Camille Fairbourne for their advice on my teaching
assistantships and for being wonderful mentors. Special thanks to the wonderful staff of our
department, Sue, Teresa, Megan, Andy and Tami for their extensive support throughout the
entire duration of my doctoral studies.
    I sincerely thank my friends at MSU, Nilanjan, Sanket, Alex, Abhijnan, Sumegha and
Phuong for their generous help and friendship. I also thank my brother and friend Subhrajit
for being a constant source of positivity and encouragement. Finally, and most importantly,
I wish to thank my mom and my husband. This dissertation would not have been possible
without their endless support and encouragement.


                               TABLE OF CONTENTS
LIST OF TABLES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .     ix
LIST OF FIGURES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .       x
CHAPTER 1 INTRODUCTION . . . . . . . . . . . . . . . . . . . . .               . . . . . . .  1
   1.1 Motivation for exploring single cell gene expression . . . . . . .      . . . . . . .  1
   1.2 Structure of single cell gene expression datasets . . . . . . . . .     . . . . . . .  2
   1.3 Dose Response Experiments . . . . . . . . . . . . . . . . . . . .       . . . . . . .  3
       1.3.1 National Toxicology Program’s approach to genomic dose
              response modeling . . . . . . . . . . . . . . . . . . . . .      . . . . . . .  4
              1.3.1.1 Determining Adequate Signal in the Data . . .            . . . . . . .  5
              1.3.1.2 Filtering of Measured Features . . . . . . . . .         . . . . . . .  6
              1.3.1.3 Dose Response Curve Estimation . . . . . . . .           . . . . . . .  7
   1.4 Covariation Analysis . . . . . . . . . . . . . . . . . . . . . . . .    . . . . . . . 10
CHAPTER 2 BAYESIAN SINGLE CELL RNASEQ DIFFERENTIAL GENE
              EXPRESSION TEST FOR DOSE RESPONSE STUDY
              DESIGNS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .      . . . 12
   2.1 Single Cell Dose Response Experiments . . . . . . . . . . . . . . . . . .       . . . 12
   2.2 Materials and Methods . . . . . . . . . . . . . . . . . . . . . . . . . . .     . . . 14
       2.2.1 Animal handling and treatment . . . . . . . . . . . . . . . . . .         . . . 14
       2.2.2 Real scRNAseq and snRNAseq datasets . . . . . . . . . . . . .             . . . 14
       2.2.3 Dose-response data simulation . . . . . . . . . . . . . . . . . . .       . . . 15
       2.2.4 Single cell RNA-seq Hurdle Model . . . . . . . . . . . . . . . . .        . . . 16
              2.2.4.1 Hypothesis Formulation . . . . . . . . . . . . . . . . .         . . . 17
              2.2.4.2 Single cell Bayesian Hurdle model Analysis (scBT) . .            . . . 17
              2.2.4.3 Multiple group Likelihood Ratio Test (LRT) . . . . . .           . . . 21
              2.2.4.4 Linear model-based Likelihood Ratio Test (LRT linear)            . . . 22
       2.2.5 Benchmarking method selection . . . . . . . . . . . . . . . . . .         . . . 22
              2.2.5.1 Seurat Bimod . . . . . . . . . . . . . . . . . . . . . . .       . . . 23
              2.2.5.2 MAST . . . . . . . . . . . . . . . . . . . . . . . . . . .       . . . 23
              2.2.5.3 Limma-trend . . . . . . . . . . . . . . . . . . . . . . .        . . . 25
              2.2.5.4 Wilcoxon Rank Sum (WRS) Test . . . . . . . . . . . .             . . . 25
              2.2.5.5 ANOVA . . . . . . . . . . . . . . . . . . . . . . . . . .        . . . 26
              2.2.5.6 Kruskal-Wallis (KW) Test . . . . . . . . . . . . . . . .         . . . 26
       2.2.6 Benchmarking and sensitivity analyses . . . . . . . . . . . . . .         . . . 26
   2.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
       2.3.1 Performance accuracy of DE test methods . . . . . . . . . . . .           . . . 29
       2.3.2 Type I error control and power . . . . . . . . . . . . . . . . . .        . . . 32
       2.3.3 Parameter Sensitivity Analysis . . . . . . . . . . . . . . . . . . .      . . . 33
       2.3.4 Test method agreement . . . . . . . . . . . . . . . . . . . . . . .       . . . 35
  2.4 Real dose–response dataset DE analysis . . . . . . . . . . . . . . . . . . . . . . . 36
  2.5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
  2.6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
  2.7 Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
CHAPTER 3 SEMIPARAMETRIC DOSE RESPONSE CURVE
             ESTIMATION FOR SINGLE CELL DOSE RESPONSE
             EXPERIMENTS . . . . . . . . . . . . . . . . . . . . . .          . . . . . . . 42
  3.1 Single Cell Dose Response Experiments . . . . . . . . . . . . . .       . . . . . . . 42
      3.1.1 Motivating experimental study and hypothesis of interest            . . . . . . 44
      3.1.2 Literature on dose response curve estimation . . . . . . .        . . . . . . . 45
      3.1.3 Literature on MM algorithms . . . . . . . . . . . . . . .         . . . . . . . 47
  3.2 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .   . . . . . . . 49
      3.2.1 Model, notations and assumptions . . . . . . . . . . . . .        . . . . . . . 49
      3.2.2 Penalized Estimation . . . . . . . . . . . . . . . . . . . .      . . . . . . . 55
      3.2.3 Monotonicity constraints . . . . . . . . . . . . . . . . . .      . . . . . . . 56
      3.2.4 Confidence interval estimation . . . . . . . . . . . . . . .      . . . . . . . 57
      3.2.5 Model Selection . . . . . . . . . . . . . . . . . . . . . . .     . . . . . . . 58
  3.3 Hypothesis Testing . . . . . . . . . . . . . . . . . . . . . . . . .    . . . . . . . 58
  3.4 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
      3.4.1 Simulation Design . . . . . . . . . . . . . . . . . . . . . .     . . . . . . . 59
      3.4.2 Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . .    . . . . . . . 60
  3.5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  . . . . . . . 72
CHAPTER 4 KERNELIZED SIGNED GRAPH LEARNING FOR
             SINGLE CELL GENE REGULATORY NETWORK
             INFERENCE . . . . . . . . . . . . . . . . . . . . . . . . . .        . . . . . 74
  4.1 Single Cell Gene Regulatory Networks . . . . . . . . . . . . . . . .        . . . . . 74
  4.2 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .    . . . . . 78
      4.2.1 Notations . . . . . . . . . . . . . . . . . . . . . . . . . . . .     . . . . . 78
      4.2.2 Low- and High-frequency Signals on Unsigned Graphs . . . .            . . . . . 79
      4.2.3 Unsigned Graph Learning . . . . . . . . . . . . . . . . . . .         . . . . . 79
      4.2.4 Kernels . . . . . . . . . . . . . . . . . . . . . . . . . . . . .     . . . . . 80
  4.3 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .   . . . . . 81
      4.3.1 Signed Graph Learning . . . . . . . . . . . . . . . . . . . . .       . . . . . 81
      4.3.2 Kernelized Signed Graph Learning . . . . . . . . . . . . . .          . . . . . 82
      4.3.3 Hyperparameter Selection . . . . . . . . . . . . . . . . . . .        . . . . . 84
      4.3.4 Generation of simulated datasets from zero-inflated negative
             binomial distribution . . . . . . . . . . . . . . . . . . . . . .    . . . . . 85
      4.3.5 Performance Metrics . . . . . . . . . . . . . . . . . . . . . .       . . . . . 86
             4.3.5.1 AUPRC and AUROC: . . . . . . . . . . . . . . . .             . . . . . 86
             4.3.5.2 AUPRC Activating/Inhibitory: . . . . . . . . . . .           . . . . . 86
             4.3.5.3 EPR . . . . . . . . . . . . . . . . . . . . . . . . . .      . . . . . 87
  4.4 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
      4.4.1 Synthetic Datasets . . . . . . . . . . . . . . . . . . . . . . .      . . . . . 88
       4.4.2 Real Datasets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .     93
   4.5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  96
   4.6 Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .      99
APPENDICES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .    . 100
   APPENDIX A BAYESIAN SINGLE CELL RNASEQ DIFFERENTIAL GENE
                   EXPRESSION TEST FOR DOSE RESPONSE STUDY
                   DESIGNS . . . . . . . . . . . . . . . . . . . . . . . . . . . . .      . 101
   APPENDIX B SEMIPARAMETRIC DOSE RESPONSE CURVE
                   ESTIMATION FOR SINGLE CELL DOSE RESPONSE
                   EXPERIMENTS . . . . . . . . . . . . . . . . . . . . . . . . .          . 104
   APPENDIX C KERNELIZED SIGNED GRAPH LEARNING FOR
                   SINGLE CELL GENE REGULATORY NETWORK
                   INFERENCE . . . . . . . . . . . . . . . . . . . . . . . . . . .        . 138
BIBLIOGRAPHY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147


                                 LIST OF TABLES
Table 1.1 Parametric models for genomic dose response analysis . . . . . . . . . . .   8
Table 2.1 Dose-response models for simulation of scRNAseq data . . . . . . . . . . .  16
Table 3.1 Parameter estimate, asymptotic standard error, bias and mean squared
          error (MSE) of parameter β under the full model and the intercept model.
          The results are averaged over 500 replicates and reported for n_i = 100, 300 . .  63
Table 3.2 Parameter estimate, asymptotic standard error, bias and mean squared
          error (MSE) of parameter ϕ under the full model and the intercept model.
          The results are averaged over 500 replicates and reported for n_i = 100, 300 . .  63
Table 3.3 Parameter estimate, asymptotic standard error, bias and mean squared
          error (MSE) of parameter ψ_0 under the full model and the intercept model.
          The results are averaged over 500 replicates and reported for n_i = 100, 300 . .  63
Table 3.4 Parameter estimate, asymptotic standard error, bias and mean squared
          error (MSE) of parameter ψ_1 under the full model and the intercept model.
          The results are averaged over 500 replicates and reported for n_i = 100, 300 . .  72


                                  LIST OF FIGURES
Figure 2.1 Flow diagram of the simulation, benchmarking, and experimental data
           evaluation strategy presented in the manuscript. Briefly, SplattDR was
           developed to simulate dose-response scRNAseq data and validated against
           experimental dose-response data. Simulated datasets were generated
           10 times while varying diverse parameters and then used to assess the
           performance of each test method. Each test method was also assessed using
           experimental data from the hepatic snRNAseq dose response dataset
           obtained from male mice gavaged every 4 days for 28 days with 0.01,
           0.03, 0.1, 0.3, 1, 3, 10, or 30 µg/kg TCDD. Related figures for each
           analysis from the main body are noted. . . . . . . . . . . . . . . . . . . 24
Figure 2.2 Comparison of simulated and real dose-response data. (A) Relationship
           between gene-wise mean expression and percent zeroes for simulated
           and real dose-response data. Simulation data consisted of 10,000 genes
           and 9 dose groups based on parameters derived from experimental
           dose-response snRNAseq data. The black line represents a model fitted to the
           experimental data, from which the normalized root mean square deviation
           (NRMSD) of simulated data was determined. (B) Relationship between
           gene-wise mean expression and variance for simulated and experimental
           data. NRMSD was calculated for simulated data from the fitted
           model represented as a black line. (C) Distribution of log(fold-changes)
           in experimental and simulated data showing the median and minimum
           and maximum values. (D) Principal components analysis of simulated
           data colored according to simulated dose groups. (E) NRMSD estimated
           relative to the fitted model in A,B for simulated data generated from
           initial parameters derived from published hepatic scRNAseq (two dose;
           GSE148339), hepatic whole cell (whole cell; GSE129516), and peripheral
           blood mononuclear cell (PBMC; GSE108313) datasets. (F) NRMSD estimated
           relative to the model fitted to cell-type specific experimental
           dose-response data when simulated from initial parameters estimated from
           that same cell type. Box and whisker plots show median NRMSD, 25th
           and 75th percentiles, and minimum and maximum values. . . . . . . . .      29
Figure 2.3 Classification performance of DE analysis tests. (A) ROCs estimated
           from simulated dose-response scRNAseq data for 9 DE test methods
           including all genes expressed in at least 1 cell (unfiltered). (B) ROCs
           for 9 DE test methods after filtering simulated dose-response scRNAseq
           data to retain only genes expressed in ≥ 5% of cells (low levels) in at
           least one dose group. (C) Precision-recall curves (PRCs) for 9 DE test
           methods on unfiltered simulated dose-response scRNAseq data. (D) PRCs for
           9 DE test methods on filtered simulated dose-response scRNAseq data.
           Lines represent the mean values and shaded regions reflect the standard
           deviation for 10 independent simulations. (E) Precision of DE test
           methods. (F) FPR of DE test methods. (G) MCC for test methods.
           (E,F,G) Box and whisker plots show median values, 25th and 75th
           percentiles, and minimum and maximum values for 10 independent simulations.
           Points reflect values for each independent simulation. Panels display
           comparisons of unfiltered and filtered datasets. . . . . . . . . . . . . . . . 30
Figure 2.4 Evaluation of Type I and II error control. (A) False positive rate (FPR)
           of 9 differential expression test methods estimated from negative
           control (0% DE genes) simulated dose-response scRNAseq data including
           all genes expressed in at least 1 cell (unfiltered) and only genes
           expressed in ≥ 5% of cells in at least one dose group (filtered). (B,C)
           Logistic regression models were fitted to negative control data to predict
           the probability of false positive identification using percent zeroes and
           mean expression as covariates. Lines represent the predicted probability
           of false positive classification with the shaded region representing
           the 95% confidence interval. (D) False negative rate (FNR) of 9
           differential expression test methods estimated from positive control (100%
           DE genes) simulated dose-response scRNAseq data including unfiltered
           and filtered datasets. (E,F) Logistic regression models were fitted to
           positive control data. Lines represent the predicted probability of false
           negative classification with the shaded region representing the 95%
           confidence interval. . . . . . . . . . . . . . . . . . . . . . . . . . . .    32
Figure 2.5 Matthews correlation coefficient (MCC) from sensitivity analyses of
           differential expression test methods. (A) MCC for 9 DGEA test methods
           determined from simulated dose response data with varying numbers of
           cells per dose group. Simulations consisted of 5,000 genes with a
           probability of differential expression of 10% and 9 dose groups. (B) MCC for
           simulated data varying the cell numbers by dose group. The number of
           cells in each of the 9 dose groups is shown on the right. (C) MCC for
           varying proportions of differentially expressed genes. (D) MCC when
           varying the mean fold-change (location) of repressed differentially
           expressed genes. (E) MCC for varying distribution of fold-change (scale)
           of differentially expressed genes. (F) MCC for varying dropout rates
           calculated as in Table S3. Points represent median and error bars
           represent minimum to maximum values. Boxplots represent median, 25th to
           75th percentile, and minimum to maximum values. Each analysis
           consisted of 10 replicate datasets including all genes expressed in at least
           1 cell (unfiltered) and genes expressed in ≥ 5% of cells in at least one
           dose group (filtered). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
Figure 2.6 Agreement of differential expression test methods on experimental
           dose-response data. (A) Upset plot showing the intersection size of genes
           identified as differentially expressed by 9 different test methods in
           hepatocytes from the portal region of the liver lobule. (B) Intersect of
           differentially expressed genes in portal fibroblasts. (C) Intersect size in
           hepatic stellate cells. Vertical bars represent the intersect size for test
           methods denoted by a black dot. Horizontal bars show the total number
           of differentially expressed genes identified within each test (set sizes).
           Only intersects for which genes were identified are shown. Genes were
           considered differentially expressed when (i) expressed in > 5% of cells
           within any given dose group and (ii) exhibiting a |fold-change| ≥ 1.5. A
           heatmap in the upper left corner of each panel shows the pairwise AUCC
           comparisons for the 500 lowest p-values. (D) Relative proportion of cell
           types identified in each dose group of the real dataset for the cell types in
           A,B,C. Experimental snRNAseq data was obtained from male mice gavaged
           with sesame oil vehicle (vehicle control) or 0.01 – 30 µg/kg TCDD
           every 4 days for 28 days. (E) Graph metrics for gene set enrichment
           analysis of portal fibroblasts grouped by similarity in gene membership.
           Violin plots show the distribution of node-wise values for each test method.
           (F) Network visualization of significantly enriched (adjusted p-value ≤
           0.05) gene sets using the Bayes factor ranked genes of portal fibroblasts.
           Groups of ≥ 2 nodes were manually annotated following commonality
           in the gene set names. Each node represents a gene set with the size
           of the node representing the number of genes in a gene set, and edges
           connect nodes with ≥ 50% overlap. . . . . . . . . . . . . . . . . . . . .        37
Figure 3.1 Results of the simulation study, illustrating the performance of Model
           1 (see Simulation Design) in 500 replicates for sample size 300. The
           columns correspond to the three different cell-types. Continuous red
           and blue lines and the shaded grey region represent the log-mean curve
           averaged across 500 random replicates, the true simulated curves, and
           the 95% pointwise confidence intervals, respectively. . . . . . . . . . . . .   64
Figure 3.2 Results of the simulation study, illustrating the performance of Model
           1 (see Simulation Design) in 500 replicates for sample size 100. The
           columns correspond to the three different cell-types. Continuous red
           and blue lines and the shaded grey region represent the log-mean curve
           averaged across 500 random replicates, the true simulated curves, and
           the 95% pointwise confidence intervals, respectively. . . . . . . . . . . . .   64
Figure 3.3 Results of the simulation study, illustrating the performance of Model
           1 (see Simulation Design) in 500 replicates for sample size 300. The
           columns correspond to the three different cell-types. Continuous cyan,
           peach, darkblue, darkgreen and red lines represent the true simulated
           curves and the estimated log-mean curves averaged across 500 random
           replicates for the ZINB-SPL, ZINB-GAM-Dose, ZINB-GAM-Int and
           NB-GAM models, respectively. . . . . . . . . . . . . . . . . . . . . . . .     65
Figure 3.4 Results of the simulation study, illustrating the estimated RMSE of
           Model 1 (see Simulation Design) in 500 replicates for sample size 300.
           The columns correspond to RMSE boxplots of the NB-GAM, ZINB-
           GAM-Dose, ZINB-GAM-Int and ZINB-SPL models respectively for the
           three different cell-types plotted over 500 replicates. . . . . . . . . . . . . 65
Figure 3.5 Results of the simulation study, illustrating the performance of Model
           2 (see Simulation Design) in 500 replicates for sample size 300. The
           columns correspond to the three different cell-types. Continuous red
           and blue lines and the shaded grey region represent the log-mean curve
           averaged across 500 random replicates, the true simulated curves, and
           the 95% pointwise confidence intervals, respectively. . . . . . . . . . . . .   66
Figure 3.6 Results of the simulation study, illustrating the performance of Model
           2 (see Simulation Design) in 500 replicates for sample size 100. The
           columns correspond to the three different cell-types. Continuous red
           and blue lines and the shaded grey region represent the log-mean curve
           averaged across 500 random replicates, the true simulated curves, and
           the 95% pointwise confidence intervals, respectively. . . . . . . . . . . . .   66
Figure 3.7  Results of the simulation study, illustrating the performance of Model
            2 (see Simulation Design) in 500 replicates for sample size 300. The
            columns correspond to the three different cell-types. Continuous cyan,
            peach, darkblue, darkgreen and red lines represent the true simulated
            curves and the estimated log-mean curves averaged across 500 random
            replicates for the ZINB-SPL, ZINB-GAM-Dose, ZINB-GAM-Int and
            NB-GAM models, respectively. . . . . . . . . . . . . . . . . . . . . . .      67
Figure 3.8  Results of the simulation study, illustrating the estimated RMSE of
            Model 2 (see Simulation Design) in 500 replicates for sample size 300.
            The columns correspond to RMSE boxplots of the NB-GAM, ZINB-
            GAM-Dose, ZINB-GAM-Int and ZINB-SPL models respectively for the
            three different cell-types plotted over 500 replicates. . . . . . . . . . . . . 67
Figure 3.9  Results of the simulation study, illustrating the performance of Model
            3 (see Simulation Design) in 500 replicates for sample size 300. The
            columns correspond to the three different cell-types. Continuous red
            and blue lines and the shaded grey region represent the log-mean curve
            averaged across 500 random replicates, the true simulated curves, and
            the 95% pointwise confidence intervals, respectively. . . . . . . . . . . . .   68
Figure 3.10 Results of the simulation study, illustrating the performance of Model
            3 (see Simulation Design) in 500 replicates for sample size 100. The
            columns correspond to the three different cell-types. Continuous red
            and blue lines and the shaded grey region represent the log-mean curve
            averaged across 500 random replicates, the true simulated curves, and
            the 95% pointwise confidence intervals, respectively. . . . . . . . . . . . .   68
Figure 3.11 Results of the simulation study, illustrating the performance of Model
            3 (see Simulation Design) in 500 replicates for sample size 300. The
            columns correspond to the three different cell-types. Continuous cyan,
            peach, darkblue, darkgreen and red lines represent the true simulated
            curves and the estimated log-mean curves averaged across 500 random
            replicates for the ZINB-SPL, ZINB-GAM-Dose, ZINB-GAM-Int and
            NB-GAM models, respectively. . . . . . . . . . . . . . . . . . . . . . .      69
Figure 3.12 Results of the simulation study, illustrating the estimated RMSE of
            Model 3 (see Simulation Design) in 500 replicates for sample size 300.
            The columns correspond to RMSE boxplots of the NB-GAM, ZINB-
            GAM-Dose, ZINB-GAM-Int and ZINB-SPL models respectively for the
            three different cell-types plotted over 500 replicates. . . . . . . . . . . . . 69
Figure 3.13 Results of the simulation study, illustrating the performance of Model
            4 (see Simulation Design) in 500 replicates for 100 sample size. The
            columns correspond to the three different cell-types. Continuous red
            and blue lines and the shaded grey region represent the log-mean curve
            averaged across 500 random replicates, the true simulated curves, and
            the 95% pointwise confidence intervals. respectively. . . . . . . . . . . . .    70
Figure 3.14 Results of the simulation study, illustrating the performance of Model
            4 (see Simulation Design) in 500 replicates for 300 sample size. The
            columns correspond to the three different cell-types. Continuous red
            and blue lines and the shaded grey region represent the log-mean curve
            averaged across 500 random replicates, the true simulated curves, and
            the 95% pointwise confidence intervals. respectively. . . . . . . . . . . . .    70
Figure 3.15 Results of the simulation study, illustrating the performance of Model
            4 (see Simulation Design) in 500 replicates for sample size 300. The
            columns correspond to the three different cell-types. Continuous cyan,
            peach, darkblue, darkgreen and red lines represent the
            true simulated curves and the estimated log-mean curve averaged across
            500 random replicates for ZINB-SPL, ZINB-GAM-Dose, ZINB-GAM-
            Int and NB-GAM models respectively . . . . . . . . . . . . . . . . . . . .       71
Figure 3.16 Results of the simulation study, illustrating the estimated RMSE of
            Model 4 (see Simulation Design) in 500 replicates for sample size 300.
            The columns correspond to RMSE boxplots of the NB-GAM, ZINB-
            GAM-Dose, ZINB-GAM-Int and ZINB-SPL models respectively for the
            three different cell-types plotted over 500 replicates. . . . . . . . . . . . .  71
Figure 4.1  Euclidean distances (left, normalized to [0, 1]) and correlations (right)
            between expressions of gene pairs in curated datasets studied in Section
            4.4. Values are calculated only for gene pairs that are connected in the
            ground truth GRNs and they are reported separately for activating and
            inhibitory edges. Only inhibitory edges are reported for VSC, since its
            GRN includes only inhibitory edges. . . . . . . . . . . . . . . . . . . . .      77
Figure 4.2  Performance of scSGL and state-of-the-art methods on curated datasets
            as measured by AUPRC for activating and inhibitory edges. x-axis
            indicates dropout ratio in the dataset. . . . . . . . . . . . . . . . . . . .    88
Figure 4.3  Performance of various methods for synthetic datasets with varying
            number of genes (top row), dropout ratio (middle row) and number
            of cells (bottom row). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
Figure 4.4 Scalability analysis of different methods. Run time of benchmarking
           methods are calculated using BEELINE pipeline [3]. Run time of scSGL
           includes kernel construction and optimization procedure. All methods
           are run on the same computer. Results of GENIE3 for 2000 genes are
           not reported due to its high run time. . . . . . . . . . . . . . . . . . . . .    92
Figure 4.5 Performance of methods for two real-world scRNAseq datasets. Inferred
           graphs are compared to three different gene regulatory databases. . . . .         94
Figure 4.6 The subnetworks of 24 lineage specific genes in hESC (A) and 19 well
           known marker genes in mESC (B). We report results of scSGL-r as it has
           the highest AUPRC ratio in Figure 4.5. For clarity, only those edges
           whose absolute edge weight fall into the top 1 percentile are shown.
           Node sizes are proportional to their degrees. . . . . . . . . . . . . . . . .     95
Figure 4.7 UpSet plot that shows intersection between the top 1000 edges by scSGL
           with 3 kernels and benchmarking methods in hESC and mESC datasets.                97
Figure A.1 Principal components analysis of cell types identified in a real hepatic
           dose-response snRNAseq dataset. Points represent a distinct cell and
           colors reflect the dose group. . . . . . . . . . . . . . . . . . . . . . . . . . 101
Figure A.2 Comparison of fold-change distribution in simulated and real dose-response
           snRNAseq data where the log-normal mean (facLoc) and standard de-
           viation (facScale) were varied as well as the percentage of differentially
           expressed genes and proportion of downregulated DE genes. A total
           of 5000 genes were simulated or sampled from real data and the fold-
           change for the highest dose group was calculated. The Kullback-Leibler
           Divergence (KLD) intrinsic discrepancy (ID) was used to evaluate the
           similarity in distributions. . . . . . . . . . . . . . . . . . . . . . . . . . . 102
Figure A.3 Benchmarking scores for data simulated using the Splatter [4] wrapper
           Splattdr with default initial parameters. A total of 4500 cells and 5000
           genes were simulated across 9 dose groups with a probability of being
           differentially expressed of 10%, 50% of which were downregulated. (A)
           Ground truth was used to estimate the False Positive Rate (FPR),
           True Positive Rate (TPR), False Negative Rate (FNR), True Negative
           Rate (TNR), precision, balanced accuracy, and F1 score. Boxplots and
           whiskers represent values for 10 replicate simulations. (B) The area
           under the concordance curve (AUCC) was calculated as previously
           described (ref) for the 100 most significant genes (K = 100). Heatmap
           represents the pairwise AUCC for each DE analysis grouped by similarity. 103
Figure C.1 AUROC values of various methods for synthetic datasets with three
           different topologies (random, modular and hub) and varying number of
           genes (top row), dropout ratio (middle row) and number of cells (bottom
           row). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
Figure C.2 EPR values of various methods for synthetic datasets with three differ-
           ent topologies (random, modular and hub) and varying number of genes
           (top row), dropout ratio (middle row) and number of cells (bottom row).           142
Figure C.3 AUROC and EPR ratios of methods for two real-world scRNAseq datasets.
           Inferred graphs are compared to three different gene regulatory databases. 143
Figure C.4 Performance of scSGL and state-of-the-art methods on curated datasets
           as measured by AUPRC ratios for activating and inhibitory edges. Each
           column corresponds to a synthetic network. Abbreviations: LI, linear;
           CY, cycle; LL, linear long; BF, bifurcating; BFC, bifurcating converging
           and TF, trifurcating. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144
Figure C.5 Edges detected using scSGL-r between 24 Lineage marker genes of hESC
           at different time points of the differentiation process. Only edges whose
           absolute edge weights fall into top 10 percent are shown. Edge thick-
           nesses are proportional to their weights, and node sizes are proportional
           to their degrees. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
                                        CHAPTER 1
                                     INTRODUCTION
1.1     Motivation for exploring single cell gene expression
Cells are the fundamental unit of life: Millions of cells in our body coordinate to perform
basic physiological functions essential for maintaining life. Yet, several critical biological
processes such as the genesis, development and fate commitment of early-stage embryos,
are determined by the biology of a single cell. The ability to investigate single-cell states
has allowed exploration of critical biological systems with unprecedented resolution. It is
now possible to easily profile individual cells instead of cell populations, which advances our
fundamental understanding of the intrinsic cellular heterogeneity and dynamics [5, 6, 7].
    However, despite many improvements in high throughput sequencing, various technical
factors including cell-cycle heterogeneity, library size differences, amplification bias, and low
RNA capture per cell lead to high noise in single-cell RNAseq (scRNAseq) experiments.
Recent technologies are capable of sequencing millions of cells, but often generate highly
sparse expression datasets due to shallow sequencing. Together, these issues result in
substantial noise that often obscures the true biological signal and renders
traditional statistical models unsuitable for the analysis of single cell datasets [8, 9].
    The availability of gene expression data on a single cell level has led to a more complete
picture of the average state of an organism across time and space. Denoting Y as the vector
of gene expression measurement and Z as the set of covariates, significant efforts have been
made in modelling E(Y |Z) to study differential expression of genes, i.e., to determine if the
observed difference or change in expression levels of genes between experimental conditions
is statistically significant.
    Compared to the bulk genomic protocols, single cell gene expression allows inference on
cellular heterogeneity. Bulk experiments dealt with data averaged over cells, hence measuring
cell-cell variation is unfamiliar ground in gene expression methods. Grouping cells into
predefined cell-types (assuming that cells within a cell-type group are independent replicates),
estimating cell-type specific models and quantifying the statistical differences between them can
lead to important insights into cell-type specific behaviour. One might be interested in
modelling the functional form of E(Y |Z) for different cell-types and quantifying the variability
between the estimated functions.
    Finally, one might desire inference on the co-expression of Y , which explains how the
expression of one gene covaries with the others, as studied in co-expression network
estimation. This might help us understand how genes regulate each other, and yield
deeper biological insights.
1.2     Structure of single cell gene expression datasets
Unlike microarray and bulk RNAseq datasets, scRNAseq data exhibit excess zero values
due to the low per cell RNA input, biases in capture and amplification, transcriptional
bursts, and other technical factors [10, 11, 12]. This absence of expression (zero values), due to a
conflation of both biological and technical factors, results in an excessive number of zeroes
leading to bimodal expression. Consequently, single cell models consider the gene expression
distribution as a mixture of an unexpressed (zero) and a positively (non-zero) expressed
population [13, 11, 14, 1]. The statistical model for Y is formulated as
\[
\begin{aligned}
Y_{i,j} \mid R_{i,j} &\sim f(\theta), \\
Y_{i,j} &= 0 \ \text{with probability 1 when } R_{i,j} = 0,
\end{aligned}
\tag{1.1}
\]
where f (θ) denotes the distribution of gene j in cell i, given that it is detectable. The
indicator variable Ri,j indicates the presence or absence of expression for gene j in cell i.
Most existing methods use the Poisson or the negative binomial distribution to model the
read counts of individual genes directly [10, 13, 11, 15]. Another approach that is frequently
used is to assume that log(Yi,j ) follows a normal distribution, which leads to a more tractable
mathematical theory than count distributions have [14, 16, 17, 1]. Additionally bulk RNA-seq
data have historically been modelled using both count and continuous distributions, and the
logarithmic transformation used to make the data continuous is well accepted in the field of
bulk genomics [18, 19, 20]. There is also an ongoing debate in the field of single cell genomics
as to whether the excess zeros present in the data need to be modelled separately, assuming a
sampling process different from the process used to model the true biological expression.
Recent research has argued that count distributions like the negative binomial, or equivalently
a Poisson distribution with a Gamma distributed mean, which marginally yields a negative
binomial distribution, are enough for modelling the excess zeros in the expression data. What we observed
in our experimental datasets is a very high percentage of zeroes, often close to 60−70%, and
the experimental scientists were convinced that such a severe percentage of zeroes was
more a result of technical errors than of the actual biological process. We conducted several
experiments to test this hypothesis [1] and concluded that a zero-inflated distribution would
best model our experimental data [21, 22]. Therefore, throughout the dissertation we
use zero-inflated distributions. However, all our proposed methods could be readily adapted
to work without the zero-inflation part.
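To make the mixture in (1.1) concrete, the following minimal Python sketch simulates counts for one gene from a zero-inflated negative binomial distribution. It is only an illustration under stated assumptions: the function name `simulate_zinb`, the mean/dispersion parameterization and all numeric values are hypothetical choices, not quantities taken from the experiments in this dissertation.

```python
import numpy as np

def simulate_zinb(n_cells, mean, dispersion, dropout_prob, rng):
    """Draw counts for one gene from a zero-inflated negative binomial mixture.

    Hypothetical helper: R = 0 forces a zero, R = 1 draws from NB(mean, dispersion).
    """
    # Detection indicator R_{i,j}: True = detectable, False = forced zero
    detected = rng.random(n_cells) >= dropout_prob
    # NumPy parameterizes the NB by (n, p); this choice gives E[Y | R = 1] = mean
    # and Var[Y | R = 1] = mean + mean**2 / dispersion.
    p = dispersion / (dispersion + mean)
    counts = rng.negative_binomial(dispersion, p, size=n_cells)
    return np.where(detected, counts, 0), detected

rng = np.random.default_rng(0)
y, r = simulate_zinb(n_cells=5000, mean=2.0, dispersion=1.0,
                     dropout_prob=0.4, rng=rng)

# Zeros arise from both parts of the mixture:
# P(Y = 0) = dropout_prob + (1 - dropout_prob) * P(NB = 0)
nb_zero_prob = (1.0 / (1.0 + 2.0)) ** 1.0
expected_zero_frac = 0.4 + 0.6 * nb_zero_prob
print(f"observed zero fraction: {np.mean(y == 0):.3f}")
print(f"expected zero fraction: {expected_zero_frac:.3f}")
```

Setting `dropout_prob=0` recovers a plain negative binomial model, which mirrors the remark that the methods can work without the zero-inflation part.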
1.3     Dose Response Experiments
The central idea of toxicology lies in understanding the molecular mechanisms of action that
make a treatment toxic. An important tool for studying the mechanisms of action is to
understand the response of a living organism to a range of doses of the toxic substance.
The dose–response relationship, or the dose response curve, describes the magnitude of the
response of an organism, as a function of exposure to a stressor (usually a toxin) after a
certain exposure time [23]. Developing dose response models is crucial for determining safe,
hazardous and beneficial levels of pollutants, drugs, foods and several other substances to
which humans or other organisms are exposed [24]. These conclusions often form the basis
for public policy. The U.S. Environmental Protection Agency (EPA) has developed extensive
guidance and reports on dose–response modeling and assessment, as well as software [25].
The potency value derived from the dose response model is the benchmark dose (BMD).
BMD is the dose or concentration that produces a predetermined change in the response
rate of an adverse effect. This predetermined change in response is called the benchmark
response (BMR), and is generally assumed to be the standard deviation of the response at
zero dose [26]. Dose-dependent gene expression profiling (also known as genomic dose-response
studies, GDRS) of drugs and chemicals has been proposed as an alternative test to rodent bioassays
to assess human health risks. Application of single-cell RNA sequencing (scRNAseq) for
the evaluation of chemicals, drugs, and food contaminants presents the rare opportunity of
accounting for cellular heterogeneity in pharmacological and toxicological responses [27].
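As an illustration of the BMD definition above, the sketch below solves for the dose at which a fitted mean curve departs from its zero-dose value by the BMR. It is a sketch under assumptions, not the BMDS implementation: the Hill curve, its parameter values, the control standard deviation and the helper name `benchmark_dose` are all hypothetical, and the root search assumes the curve is monotone on the dose range.

```python
import numpy as np
from scipy.optimize import brentq

def hill(dose, gamma, v, k, n):
    """Hill mean curve: gamma + v * dose^n / (k^n + dose^n)."""
    dose = np.asarray(dose, dtype=float)
    return gamma + v * dose**n / (k**n + dose**n)

def benchmark_dose(mean_fn, bmr, max_dose):
    """Dose in (0, max_dose] where |mean(d) - mean(0)| equals bmr.

    Hypothetical helper; assumes a monotone curve so the root is unique.
    """
    baseline = mean_fn(0.0)

    def change(d):
        return abs(mean_fn(d) - baseline) - bmr

    if change(max_dose) < 0:
        return float("nan")  # the BMR is never reached on the tested range
    return brentq(change, 0.0, max_dose)

# Assumed values: fitted Hill parameters and a control-group SD of 0.5,
# so that BMR = one standard deviation of the response at zero dose.
params = dict(gamma=1.0, v=4.0, k=10.0, n=2.0)
control_sd = 0.5
bmd = benchmark_dose(lambda d: hill(d, **params), bmr=control_sd, max_dose=100.0)
print(f"BMD = {bmd:.4f}")
```

For this particular curve the equation 4 d^2 / (100 + d^2) = 0.5 can also be solved by hand, giving d = sqrt(50 / 3.5), which the numerical root should match.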
1.3.1    National Toxicology Program’s approach to genomic dose
         response modeling
The National Toxicology Program (NTP) proposes the use of the approach outlined below
for both in vivo and in vitro genomic dose-response studies (GDRS) [26]. The steps in the
process include (1) designing a dose response experiment with a sufficient number of doses
that can effectively capture the shape of a dose response curve and accurately determine
the benchmark dose (BMD) for all dose responsive genes, (2) designing a statistical test
of hypothesis to determine whether the modelled data show any response to the treatment;
this test acts as a preliminary step for filtering out genes that are not dose responsive, (3)
conducting a second trend test for identifying genes that exhibit a biologically plausible
response to the treatment, (4) fitting parametric dose response models derived from the
Environmental Protection Agency (EPA) BMD software to identify a biological potency
estimate, i.e., the benchmark dose (BMD), for each gene exhibiting a dose-related response to
treatment, (5) grouping genes into predefined sets defined by gene ontologies and calculating
the composite BMD of the gene set and (6) finally providing a biological explanation for
the selected set of genes and BMD estimates. As pointed out by NTP, following all the steps
for GDRS modelling leads to consistent modelling and facilitates the use of genomic
dose-response data in risk assessment. Assuming that the experimental process is adequate,
I mainly focus on the statistical considerations of items 2-4.
1.3.1.1     Determining Adequate Signal in the Data
The first step of the GDRS protocol seeks to determine whether the signal in the data is
adequate to model and likely to yield minimally reproducible findings [28]. Simply put,
we are interested in finding whether a gene is dose responsive. Statistically, this can be
formulated as an ANOVA problem. There is a population of interest for which there is a
true quantitative outcome Y for each of the K levels of dose D. For gene j, the population outcomes
for each group have mean parameters $\mu_{1,j}, \ldots, \mu_{K,j}$ with no restrictions on the pattern of
means. The population variances of the outcome for the K groups defined by the
levels of the explanatory variable all have the same value, $\sigma_j^2$. We are interested in testing
the null hypothesis that “nothing interesting is happening”. For one-way ANOVA, dropping the gene index j, we use
$H_0: \mu_1 = \mu_2 = \cdots = \mu_K$, which states that all of the population means are equal, without
restricting what the common value is. The alternative must include everything else, which
can be expressed as “at least one of the K population means differs from at least one of the others”. Use
of a pre-filtering test to determine dose responsiveness helps to avoid modeling of data with no
statistically plausible signal. Modeling data with no statistically significant effect is likely to
yield unreproducible results with highly inaccurate estimates of BMD values. Further, given
the large number of genes (usually on the order of tens of thousands), it is an unnecessary computational
burden to model genes with no statistically plausible signal.
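The ANOVA pre-filter described above can be sketched for a single gene as follows. This is a minimal illustration of the classical normal-theory F-test only, not the test proposed in Chapter 2; the helper name `is_dose_responsive`, the simulated data and the 0.05 threshold are assumptions made for the example.

```python
import numpy as np
from scipy.stats import f_oneway

def is_dose_responsive(groups, alpha=0.05):
    """One-way ANOVA F-test across dose groups for a single gene.

    Returns (reject H0 of equal means?, p-value). Hypothetical helper.
    """
    result = f_oneway(*groups)
    return result.pvalue < alpha, result.pvalue

rng = np.random.default_rng(1)
n_groups, n_per_group = 5, 30

# A gene with the same mean in every dose group (H0 true) ...
flat = [rng.normal(5.0, 1.0, n_per_group) for _ in range(n_groups)]
# ... and a gene whose mean increases by one unit per dose group (H0 false).
shifted = [rng.normal(5.0 + k, 1.0, n_per_group) for k in range(n_groups)]

flag_flat, p_flat = is_dose_responsive(flat)
flag_shifted, p_shifted = is_dose_responsive(shifted)
print(f"flat gene:    p = {p_flat:.3g}, responsive = {flag_flat}")
print(f"shifted gene: p = {p_shifted:.3g}, responsive = {flag_shifted}")
```

When the filter is applied to every gene, the resulting p-values would normally be adjusted for multiple testing before thresholding; that step is omitted here.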
    An associated problem that is studied in gene expression studies is to compare two
biological conditions in order to find differentially expressed (DE) genes. A gene is defined
as DE if it is transcribed into different amounts of mRNA molecules per cell under the two
conditions. However, since we do not observe the true levels of expression, carefully designed
statistical tests help biologists understand to what extent a gene is DE. Currently, there
are dozens of differential gene expression analysis (DGEA) approaches for scRNAseq data;
developed based on differences in assumptions, statistical methodologies, and study designs
[29, 30, 31, 32, 33, 34]. A recent comparison of 36 approaches demonstrated acceptable
performance for common bulk RNAseq tools such as edgeR and limma-trend, and MAST
for single cell experiments, as well as common statistical tests such as the Wilcoxon rank
sum (WRS) and the t-test [32]. However, most methods have been developed primarily for
two group comparisons whereas our study designs consist of multiple groups. The use of
two sample tests for multiple group study designs elevates the type I error rate, warranting
further investigation of these methods for multiple group dose–response study designs [35].
Chapter 2 proposes the first multiple group statistical testing framework designed exclusively
for single cell dose response designs. The test statistic is derived for both the frequentist and
Bayesian set-up and benchmarked against selected bulk and single cell DE analysis methods
capable of accommodating multiple group designs. Further detailed background and analysis
are deferred to Chapter 2.
1.3.1.2     Filtering of Measured Features
The second step of the GDRS protocol applies a statistical trend test and an effect size criterion to filter genes.
Statistically, it assumes that the experimenter has a priori knowledge that the response to
the treatment has a known direction, and in most cases, the response increases in magnitude
as the dose is increased. Therefore the statistical analysis is based on the assumption that
the responses are monotonically ordered. Thus the test of hypothesis translates to
$H_0: \mu_1 = \mu_2 = \cdots = \mu_K$ versus $H_a: \mu_1 \le \mu_2 \le \cdots \le \mu_K$ with $\mu_1 < \mu_K$. Several statistical tests have been proposed for testing
this monotonically ordered alternative hypothesis. The test recommended by the US EPA
and NTP to identify monotonic trends is Williams’ trend test [36].
    Williams’ test is a step-down trend test for testing several treatment levels with a zero
control in a one-way ANOVA design with normally distributed errors of homogeneous variance.
Let there be K groups including the control, and let the zero dose level be indicated
with k = 1 and the treatment levels with k = 2, . . . , K. Then the following K − 1 hypotheses
are tested:
\[
\begin{aligned}
H_{0,K-1} &: \mu_1 = \mu_2 = \cdots = \mu_K, & H_{a,K-1} &: \mu_1 \le \mu_2 \le \cdots \le \mu_K,\ \mu_1 < \mu_K \\
H_{0,K-2} &: \mu_1 = \mu_2 = \cdots = \mu_{K-1}, & H_{a,K-2} &: \mu_1 \le \mu_2 \le \cdots \le \mu_{K-1},\ \mu_1 < \mu_{K-1} \\
&\ \ \vdots & &\ \ \vdots \\
H_{0,1} &: \mu_1 = \mu_2, & H_{a,1} &: \mu_1 < \mu_2
\end{aligned}
\]
The first step is to determine the maximum likelihood (ML) estimates $\hat{\mu}_k$ under the alternative
hypothesis that there is a response. These are computed using the pool adjacent violators
algorithm (PAVA) [37]. If the response is a decrease in mean, then the above formulation
applies after a change of the inequality sign. The kth test statistic $\bar{t}_k$ is calculated as follows:
\[
\bar{t}_k = \frac{\hat{\mu}_k - \bar{x}_1}{\sqrt{s^2 \left( 1/n_k + 1/n_1 \right)}}
\]
where $\bar{x}_1$ is the sample mean of the zero-dose control, $n_k$ is the number of observations in
group k, and $s^2$ is an unbiased estimator of the error variance of Y with ν degrees of freedom. The
critical t-values are given in the tables of [36] for α = 0.05 (one-sided) and are looked up
according to the degrees of freedom (ν) and the order number of the dose level. Regarding
genomic dose-response modeling, NTP acknowledges limitations to using a more traditional
trend test such as Williams’ test, which identifies only monotonic trends.
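The two computational ingredients of Williams’ test, the PAVA estimates and the test statistic, can be sketched as below. This is a hedged illustration rather than a full implementation: it assumes the isotonic estimates are computed once from the treated groups only (the control mean is not pooled), it uses the pooled within-group variance with ν = N − K degrees of freedom, and it leaves out the tabulated critical values and the step-down decision rule of [36]. The function names are hypothetical.

```python
import numpy as np

def pava(means, weights):
    """Weighted pool-adjacent-violators: nondecreasing fit to `means`."""
    blocks = []  # each block holds [weighted mean, total weight, group count]
    for m, w in zip(means, weights):
        blocks.append([float(m), float(w), 1])
        # Merge backwards while the last two blocks violate the ordering
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            blocks.append([(m1 * w1 + m2 * w2) / (w1 + w2), w1 + w2, c1 + c2])
    return np.concatenate([np.full(c, m) for m, _, c in blocks])

def williams_statistics(groups):
    """groups[0] is the zero-dose control; returns (t-bar for k = 2..K, nu)."""
    n = np.array([len(g) for g in groups], dtype=float)
    xbar = np.array([np.mean(g) for g in groups])
    # Pooled within-group variance s^2 with nu = N - K degrees of freedom
    nu = n.sum() - len(groups)
    s2 = sum(((np.asarray(g, dtype=float) - m) ** 2).sum()
             for g, m in zip(groups, xbar)) / nu
    # Isotonic (order-restricted) estimates for the treated groups (assumption)
    mu_hat = pava(xbar[1:], n[1:])
    t_bar = (mu_hat - xbar[0]) / np.sqrt(s2 * (1.0 / n[1:] + 1.0 / n[0]))
    return t_bar, nu

# Toy data: control plus three dose groups, with a dip at the second dose
groups = [[1, 2, 3], [2, 3, 4], [1, 2, 3], [4, 5, 6]]
t_bar, nu = williams_statistics(groups)
print("t-bar statistics:", np.round(t_bar, 4), "with nu =", nu)
```

In the toy data the treated-group means 3, 2, 5 violate the ordering, so PAVA pools the first two into 2.5 before the statistics are formed.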
1.3.1.3     Dose Response Curve Estimation
The third step of the genomic dose-response analysis focuses on fitting dose-response curves
to each gene that exhibits a response to treatment as determined by the filtering approach described
above. To model the data, parametric models such as the polynomial, linear, power, Hill
and exponential 2, 3 and 4 dose-response models are fitted to the measured features. Equations
describing the models are given in Table 1.1. A detailed description of all the models
can be found in the EPA’s Benchmark Dose Software (BMDS) Version 3.2 user guide. The
current GDRS modelling software [28] is only capable of modelling bulk RNA-sequencing
datasets, with no statistical framework or recommendation existing for dose response modelling
of single cell datasets. The bulk gene expression data are usually log transformed and
presumed to have a normal distribution, and each model is run assuming constant variance.
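    Model                   Formula
    Linear                  $\mu(\mathrm{dose}) = \gamma + \beta\,\mathrm{dose}$
    Polynomial (order-q)    $\mu(\mathrm{dose}) = \gamma + \beta_1\,\mathrm{dose} + \beta_2\,\mathrm{dose}^2 + \cdots + \beta_q\,\mathrm{dose}^q$
    Hill                    $\mu(\mathrm{dose}) = \gamma + (v \cdot \mathrm{dose}^n)/(k^n + \mathrm{dose}^n)$
    Exponential 2 or 3      $\mu(\mathrm{dose}) = a \cdot \exp(\mathrm{sign}(b \cdot \mathrm{dose}^d))$
    Exponential 3 or 4      $\mu(\mathrm{dose}) = a \cdot (c - (c - 1) \cdot \exp(\mathrm{dose}^d))$
    Power                   $\mu(\mathrm{dose}) = \gamma + \beta\,\mathrm{dose}^\delta$
               Table 1.1 Parametric models for genomic dose response analysis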
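As a rough illustration of how models from Table 1.1 can be fitted, the sketch below writes the linear, power and Hill mean functions in Python and fits the Hill model by nonlinear least squares with `scipy.optimize.curve_fit`. The dose grid, the noiseless synthetic responses, the starting-value heuristic and the parameter bounds are assumptions made for the example; this is not the BMDS fitting procedure.

```python
import numpy as np
from scipy.optimize import curve_fit

def linear(dose, gamma, beta):
    return gamma + beta * dose

def power(dose, gamma, beta, delta):
    return gamma + beta * dose**delta

def hill(dose, gamma, v, k, n):
    return gamma + v * dose**n / (k**n + dose**n)

# Synthetic, noiseless responses from a Hill curve with assumed parameters
dose = np.array([0.0, 0.3, 1.0, 3.0, 10.0, 30.0, 100.0])
true_params = (1.0, 4.0, 10.0, 2.0)  # gamma, v, k, n
response = hill(dose, *true_params)

# Heuristic starting values: baseline, observed range, dose nearest half-maximum
gamma0 = response[0]
v0 = response[-1] - response[0]
k0 = dose[np.argmin(np.abs(response - (gamma0 + v0 / 2.0)))]

fit, _ = curve_fit(
    hill, dose, response,
    p0=[gamma0, v0, k0, 1.0],
    # Keep k and n positive so dose**n and k**n stay well defined
    bounds=([-np.inf, -np.inf, 1e-6, 0.1], [np.inf, np.inf, np.inf, 20.0]),
)
print("fitted (gamma, v, k, n):", np.round(fit, 3))
```

With real expression data the responses are noisy, several candidate models would be fitted, and a criterion such as AIC could be used to compare them.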
    Let us denote a sample of n independent and identically distributed (IID) pairs of variables,
B = {(Zi , Yi )}, i = 1, . . . , n, as the data. Zi can be further segmented as Zi = {(Di , Xi )}
where Di represents the dose vector and Xi represents the vector of other confounding covariates
such as age and gender. The main interest of dose response studies lies in modelling
E(Yi ) as a function of the dose vector, i.e., E(Yi |Di ) = ζ(Di , θ), where θ is the vector of unknown
parameters. In general, the methods for estimating dose response can be classified as
parametric or nonparametric. Parametric methods assume a model ζ(Di , θ) for the DR curve.
A considerable amount of statistical methodology exists for parametric modelling of dose-response
studies, employing parametric models such as the Logistic, Exponential and Gompertz
models, as well as others [38], and the US EPA [28] gives
guidelines on how to employ these methods. These models are generally fit using standard
techniques such as maximum likelihood (ML) or restricted maximum likelihood (RML), and
their functional forms are monotonic, allowing for tractable estimation of the relevant BMDs. It
is well known that if a parametric model is correctly specified then the ML/RML estimators
are efficient. However, in many cases, it is difficult to correctly specify the parametric form
of the dose–response curve because the biological mechanism of drug action or toxicity may
be complex and the form of the dose–response curve is unknown a priori [39]. When the
parametric model is misspecified, the corresponding curve estimate may be severely biased.
In addition, fitting extremely flexible high-order polynomial models to match difficult
non-monotonic curve shapes can lead to severe overfitting. The standard workaround
is to fit multiple parametric models, each of which may fit the data reasonably well, but which together produce
a range of BMD estimates that accounts for model uncertainty in the estimation process.
Accounting for this uncertainty using model averaging (MA) has received recent attention
in the literature [40]. However, despite simulation studies suggesting that model-averaged
estimates provide BMD estimates that have low bias and nominal coverage properties, these
estimates may fail to adequately describe the true uncertainty for DR curves on the edge
of the MA model space, hence resulting in inaccurate inference. Further, as the MA results
are available for only a limited number of parametric forms, models that make fewer assumptions
on parametric forms may allow for better estimation of the dose response curves [39].
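One common way to carry out model averaging is to weight each candidate model by its Akaike weight; the sketch below shows that arithmetic. It is an assumption that AIC-based weights are used, since the text does not specify the weighting scheme of [40], and the AIC and BMD values are made-up numbers for illustration only, not results from any analysis.

```python
import numpy as np

def akaike_weights(aic):
    """Akaike weights: w_m proportional to exp(-0.5 * (AIC_m - min AIC))."""
    aic = np.asarray(aic, dtype=float)
    delta = aic - aic.min()  # subtracting the minimum avoids underflow
    w = np.exp(-0.5 * delta)
    return w / w.sum()

# Hypothetical per-model AIC values and BMD estimates (illustrative only)
models = ["linear", "hill", "power"]
aic = [212.4, 205.1, 206.3]
bmd = np.array([4.8, 3.1, 3.6])

weights = akaike_weights(aic)
ma_bmd = float(np.sum(weights * bmd))
for name, w in zip(models, weights):
    print(f"{name:>7s}: weight = {w:.3f}")
print(f"model-averaged BMD = {ma_bmd:.3f}")
```

Because the averaged BMD is a convex combination of the per-model estimates, it can never fall outside the range spanned by the candidate models, which is one way to see the limitation noted above for curves at the edge of the model space.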
    To enhance the robustness of the estimation of the dose–response curve, many nonpara-
metric methods have been proposed. [41] proposed MLE estimation procedures under the
assumption that the dose response curve is sigmoid and non-decreasing. [42] proposed a
method for non-parametric estimation of the dose response curve under monotonicity con-
straints using B-splines regression. [43] used kernel estimators to obtain estimates for the
dose–response curve under general shape restrictions. In the Bayesian setup, several ap-
proaches have been proposed for non-parametric estimation of dose response curves under
monotonicity constraints. From a Bayesian perspective, one specifies a suitable prior on the
regression function that induces monotonicity and then inference is based on the posterior
distribution. [44] employed an additive model with a prior imposed on the slope of the
piecewise-linear functions. [45] adopted mixture modelling of shifted and scaled probability
distribution functions. [46] proposed a Gaussian process model with a posterior projection
approach for shape-constrained curves. In contrast to the parametric methods, all the non-
parametric approaches are flexible and learn the shape of the dose response curves based
on the data. To the best of our knowledge, none of the existing literature has considered
building a non-parametric regression model for both unconstrained and monotonicity
constrained estimation of dose response curves for single cell experiments while accounting
for technical confounders. Incorporation of such a flexible modelling structure will allow for
more accurate modeling of a variety of non-monotonic responses. Further, the incorporation
of a flexible nonparametric modeling approach to describe the shape of the diversity of
dose-response behaviors observed in genomic dose-response studies will lead to the estimation of more
accurate BMDs.
1.4     Covariation Analysis
A gene co-expression network (GCN) is an undirected graph, where nodes correspond to
different genes, and edges connecting the nodes denote the co-expression relationships be-
tween genes. GCNs can help people learn the functional relationships between genes and
infer and annotate the functions of unknown genes. GCN reconstruction attempts to infer
this co-expression network from high-throughput data using statistical and computational
approaches. Multiple methods encompassing varying mathematical concepts have been proposed
during the last decade to infer GCNs using gene expression data from bulk population
sequencing technologies, which aggregate expression profiles from all cells in a tissue. These
methods can be broadly classified into two groups: the first group infers a static GCN, con-
sidering steady state of gene expression, while the second group uses temporal measurements
to capture the expression profile of the genes in a dynamic process. A thorough evaluation
of the static and dynamic models used in bulk GCN reconstruction can be found in [47, 48].
In the static group, several computational approaches ranging from correlation networks
[49], Gaussian graphical models [50], Bayesian networks [51], regression analyses [52, 53],
and information theoretical approaches [54, 55] have been used for network inference from
population-level data. Similarly, several models have been proposed for the analysis of pop-
ulation level dynamic gene expression data [56, 57, 58]. In this dissertation we will mainly
focus on estimating a static co-expression network. Mathematically, we are interested in
determining the relationship between Yi and Yj . The first step is to construct a symmetric
adjacency matrix A, where Ai,j is a weighted adjacency score in the range from 0 to 1 be-
tween genes i and j. Ai,j measures the level of association between gene expression vectors
Yi and Yj . Clustering methods can then be applied to search for gene modules based on the
corresponding dissimilarity matrix [49]. The modules can serve to understand functional relationships
between known disease genes and candidate genes. Gene modules can also be used to detect
regulatory genes and study the regulatory mechanisms in various organisms [59]. Further
background is deferred to Chapter 4.
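The construction just described, an adjacency matrix A with entries in [0, 1] followed by clustering into gene modules, can be sketched as follows. The choices here are assumptions for illustration: absolute Pearson correlation as the adjacency score, 1 − A as the dissimilarity, average-linkage hierarchical clustering, and a simulated expression matrix with two latent programs. This is not the method developed in Chapter 4.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

rng = np.random.default_rng(2)
n_cells = 200

# Simulated expression: genes 0-2 follow one latent program, genes 3-5 another
f1, f2 = rng.normal(size=(2, n_cells))
expr = np.column_stack(
    [f1 + 0.3 * rng.normal(size=n_cells) for _ in range(3)]
    + [f2 + 0.3 * rng.normal(size=n_cells) for _ in range(3)]
)  # shape: (cells, genes)

# Symmetric adjacency matrix: A[i, j] = |corr(Y_i, Y_j)|, which lies in [0, 1]
A = np.abs(np.corrcoef(expr, rowvar=False))
np.fill_diagonal(A, 1.0)

# Turn adjacency into a dissimilarity and cluster genes into two modules
D = 1.0 - A
D = (D + D.T) / 2.0          # enforce exact symmetry
np.fill_diagonal(D, 0.0)
Z = linkage(squareform(D, checks=False), method="average")
modules = fcluster(Z, t=2, criterion="maxclust")
print("module label per gene:", modules)
```

Other association scores (for example a soft-thresholded correlation or a kernel-based similarity) could be substituted for the absolute correlation without changing the rest of the sketch.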
                                         CHAPTER 2
          BAYESIAN SINGLE CELL RNASEQ DIFFERENTIAL GENE
               EXPRESSION TEST FOR DOSE RESPONSE STUDY
                                           DESIGNS
2.1     Single Cell Dose Response Experiments
Single-cell transcriptomics enables researchers to investigate homeostasis, development and
disease at unprecedented cellular resolution [60, 6, 61, 62, 63]. As with any innovative
technology, diverse tools soon follow to address specific applications and unique challenges.
Currently, there are dozens of differential gene expression analysis (DGEA) approaches for
single-cell RNAseq (scRNAseq) data; developed based on differences in assumptions, statis-
tical methodologies, and study designs [29, 30, 31, 32, 33, 34]. A recent comparison of 36
approaches demonstrated acceptable performance for common bulk RNAseq tools such as
edgeR and limma-trend, and MAST for snRNAseq, as well as common statistical tests such
as the Wilcoxon rank sum (WRS) and the t-test [32]. However, most methods have been
developed primarily for two group comparisons whereas study designs typical of pharmacology
and toxicology experiments such as dose–responses consist of multiple groups. The use
of two sample tests for multiple group study designs elevates the type I error rate, warranting
further investigation of these methods for multiple group dose–response study designs [35].
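The inflation of the type I error rate can be seen in a small simulation. The sketch below generates data with no dose effect at all and compares two decision rules: declaring a gene responsive if any unadjusted control-versus-dose t-test is significant, and declaring it responsive if a single one-way ANOVA is significant. The number of groups, cells per group, replicates and the normal data model are assumptions chosen only for illustration.

```python
import numpy as np
from scipy.stats import f_oneway, ttest_ind

rng = np.random.default_rng(3)
n_sim, n_groups, n_per_group, alpha = 500, 9, 20, 0.05

pairwise_hits = 0
anova_hits = 0
for _ in range(n_sim):
    # Null data: every dose group has the same mean, so any rejection is false
    groups = rng.normal(0.0, 1.0, size=(n_groups, n_per_group))
    # Rule 1: unadjusted two-sample t-tests of the control against each dose
    p_pair = [ttest_ind(groups[0], groups[k]).pvalue for k in range(1, n_groups)]
    pairwise_hits += int(min(p_pair) < alpha)
    # Rule 2: a single multiple-group test
    anova_hits += int(f_oneway(*groups).pvalue < alpha)

pairwise_rate = pairwise_hits / n_sim
anova_rate = anova_hits / n_sim
print(f"false positive rate, any pairwise t-test: {pairwise_rate:.3f}")
print(f"false positive rate, one-way ANOVA:       {anova_rate:.3f}")
```

The pairwise rule rejects far more often than the nominal 5% level, while the single multiple-group test stays close to it, which is the motivation for a dedicated multiple-group test.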
    Dose–response studies are used to derive efficacy and/or safety margins such as the effective
dose and the point of departure (POD). Significant efforts by the toxicology and
regulatory communities have suggested acute (< 14 days) and sub-acute (14–28 days)
transcriptomic studies as viable alternatives to the current standard 2-year rodent bioassay
that significantly reduce the time and resources needed to assess risk [64, 65, 26]. Gene
expression profiling at single-cell resolution could further support such evaluations by identi-
fying cell-specific dose-dependent responses indicative of an adverse event. The U.S. National
Toxicology Program (NTP) recently reported a robust DGEA approach is essential to de-
                                                12


riving biologically relevant PODs [26]. However, concerns regarding the inclusion of false
positives that produce less conservative POD estimates potentially leads to incorrect clas-
sification of mode-of-action( MoA), thus highlighting the importance of controlling type I
error rates [66, 67].
    Unlike microarray and bulk RNAseq datasets, scRNAseq data exhibit excess zero values due to
the low per-cell RNA input, biases in capture and amplification, transcriptional bursts, and
other technical factors [12]. This absence of expression (zero values), arising from a
conflation of biological and technical factors, results in an excessive number of zeroes in an
otherwise continuous measure [14]. Therefore, traditional tests of differential gene expression,
based on the assumption of a normal distribution, fail to correctly model the bimodality of
single cell gene expression [16]. Consequently, scRNAseq test methods usually consider the gene
expression distribution as a mixture of an unexpressed (zero) and a positively (non-zero)
expressed population [14, 16, 68]. For example, the Seurat Bimod approach tests for differential
gene expression using a likelihood ratio test designed for this mixture population. MAST extends
the Seurat Bimod test to a two-part generalized linear model structure capable of incorporating
covariates [14, 16]. Given the improved performance of MAST [32, 14, 16], we hypothesized that
multiple group tests developed assuming the same distributional framework would be most
favorable for dose–response study designs. Furthermore, a Bayesian approach which considers
prior knowledge is anticipated to minimize type I error rates [69, 70].
     The aim of the presented study is to evaluate the performance of existing and novel DGEA
test methods on dose–response scRNAseq datasets. To reduce the rate of false positives
we propose a novel, multiplicity corrected, Bayesian multiple group test (scBT) designed
exclusively for DGEA of dose–response scRNAseq data. Two other fit-for-purpose frequentist
multiple group tests are also examined: (i) a multiple group extension of the Seurat Bimod
test and (ii) a simple extension of test (i) to a generalized linear model framework. Existing
and proposed methods are benchmarked on simulated and real experimental dose–response
datasets. Using simulated datasets we were able to investigate the influence of various
parameters such as number of cells, and illustrate how using different test methods can aid
in gaining biological insight on the role of individual cell types on the pathophysiological
consequences of exposure.
2.2     Materials and Methods
2.2.1    Animal handling and treatment
Male C57BL/6 mice aged postnatal day (PND) 25 were obtained from Charles River Laboratories
(Kingston, NY) and were housed and treated as previously described [71]. Mice were
housed in Innovive cages (San Diego, CA) with ALPHA-dri bedding (Shepherd Specialty
Papers, IL) at 23 °C, 30–40% relative humidity, and a 12:12 h light:dark cycle. Aquavive
water (Innovive) and Harlan Teklad 22/5 Rodent Diet 8940 (Madison, WI) were provided ad
libitum. On PND 29, randomly assigned mice were gavaged at Zeitgeber time (ZT) 0 with
0.1 mL sesame oil vehicle (Sigma-Aldrich, St. Louis, MO) or 0.01, 0.03, 0.1, 0.3, 1, 3, 10 or 30
µg/kg TCDD every 4 days for 28 days (7 total administered doses). On day 28, mice were
euthanized by CO2 asphyxiation and livers were immediately flash frozen in liquid nitrogen
and stored at −80 °C. All animal procedures were approved by the Michigan State University
(MSU) Institutional Animal Care and Use Committee (IACUC), and reporting of in
vivo experiments follows the Animal Research: Reporting of In Vivo Experiments (ARRIVE)
[72] and Minimum Information about Animal Toxicology Experiments (MIATE) guidelines
(https://fairsharing.org/FAIRsharing.wYScsE).
2.2.2    Real scRNAseq and snRNAseq datasets
Hepatic single-nuclei RNA-sequencing (snRNAseq) was performed as previously described
using the 10x Genomics Chromium Single Cell 3′ v3.1 kit (10x Genomics, Pleasanton, CA)
[73]. Briefly, nuclei were isolated using EZ Lysis Buffer (Sigma-Aldrich), homogenized with a
disposable Dounce homogenizer, washed, and filtered using a 70-µm cell strainer. The nuclei
pellet was resuspended in buffer containing DAPI (10 µg/ml), filtered using a 40-µm strainer,
and immediately sorted using a BD FACSAria IIu (BD Biosciences, San Jose, CA) at the
MSU Pharmacology and Toxicology Flow Cytometry Core (facs.iq.msu.edu/). Sequencing
(150-bp paired end) was performed at a depth of 50,000 reads/cell using a NovaSeq6000 at
Novogene (Beijing, China). CellRanger v3.0.2 (10x Genomics) was used to align reads to
mouse gene models (mm10, release 93) including introns and exons to consider both pre-mRNA
and mature mRNA gene models. Seurat was used to integrate and log-normalize
expression data [74]. The data are available on the Gene Expression Omnibus (GEO) under
accession ID GSE184506, and R package versions are listed in Supplementary Table S1.
Additional real datasets were publicly available. Hepatic whole-cell data generated using the
10x Genomics platform were obtained from GEO (GSE129516) [63]. Hepatic single-nuclei data,
processed in the same way as the dose–response data, for control and high dose TCDD treatment
(0 and 30 µg/kg) were obtained from GEO (GSE148339). Peripheral blood mononuclear cell (PBMC)
data, also generated using the 10x Genomics platform and Seurat, were obtained from the
SeuratData R package [74].
    Gene set enrichment analysis of experimental data was performed using the fgsea v1.14
R package on gene lists sorted by significance values (e.g. P-value). Gene sets from BIO-
CARTA, KEGG, PANTHER and WIKIPATHWAYS were obtained from the Gene Set
Knowledgebase (GSKB; http://ge-lab.org/gskb/) and filtered for gene sets containing 15–250
genes. Gene sets were agglomerated based on overlap of gene membership and only those
showing ≥ 50% overlap were considered similar for subsequent network analyses. Visualization
and calculation of centrality measures were performed using igraph v1.2.7. Gene
sets were considered enriched when the adjusted P-value was < 0.05.
2.2.3   Dose-response data simulation
To simulate dose-response scRNAseq datasets we developed a wrapper for the Splatter R
package [75]. Splatter simulates counts using parameters estimated from real data to set the
mean expression, variance, and outlier probability. Other parameters such as the number
of cells, number of genes, probability of being differentially expressed (DE), mean fold-change
of DE genes (location) and standard deviation of fold-change of DE genes (scale) were manually
assigned to best reflect real data. The wrapper (SplattDR) leverages the group simulation feature
of Splatter by applying a multiplicative factor estimated using the dose-response models in
Table 2.1, which are based on the US EPA Benchmark Dose Software [28]. The SplattDR R package is
available at github.com/zacharewskilab/splattdr.
                 Model                 Formula
                 Hill                  $\mu(dose) = \gamma + (v \cdot dose^{n}) / (k^{n} + dose^{n})$
                 Exponential 2 or 3    $\mu(dose) = a \cdot \exp(\mathrm{sign}(b \cdot dose^{d}))$
                 Exponential 3 or 4    $\mu(dose) = a \cdot (c - (c - 1) \cdot \exp(dose^{d}))$
                 Power                 $\mu(dose) = \gamma + \beta \, dose^{\delta}$
                 Table 2.1 Dose-response models for simulation of scRNAseq data
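
As an illustration of how the models in Table 2.1 translate a dose into an expected response, the following minimal Python sketch evaluates the Hill and Power forms over the doses used in this study. The function names, the parameter values, and the choice to express the factor relative to the control dose are assumptions made for illustration only; they are not taken from SplattDR, whose exact mapping from model output to multiplicative factor is not described here.

```python
def hill(dose, gamma, v, k, n):
    """Hill model from Table 2.1: gamma + v * dose^n / (k^n + dose^n)."""
    return gamma + (v * dose ** n) / (k ** n + dose ** n)


def power(dose, gamma, beta, delta):
    """Power model from Table 2.1: gamma + beta * dose^delta."""
    return gamma + beta * dose ** delta


# Vehicle control (0) followed by the TCDD doses in ug/kg.
DOSES = [0, 0.01, 0.03, 0.1, 0.3, 1, 3, 10, 30]


def fold_changes(model, doses, **params):
    """Model response at each dose divided by the response at the first dose.

    Expressing the factor relative to the control is an assumption of this sketch.
    """
    baseline = model(doses[0], **params)
    return [model(d, **params) / baseline for d in doses]


# Hypothetical parameter values chosen only to produce a visible trend.
hill_fc = fold_changes(hill, DOSES, gamma=1.0, v=2.0, k=1.0, n=1.5)
power_fc = fold_changes(power, DOSES, gamma=1.0, beta=0.5, delta=0.5)
```

With these illustrative parameters the Hill curve is flat at low doses, passes through half of its maximal increase at dose = k, and saturates at high doses, whereas the Power curve keeps increasing.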
2.2.4    Single cell RNA-seq Hurdle Model
We model the log-normalized gene expression matrix using a hurdle distribution wherein the
rate of gene expression is assumed to follow a Bernoulli distribution and, conditional on a
cell expressing the gene, the log-normalized expression level is assumed to follow a Gaussian
distribution [14]. We denote by $Y_{i,j}$ the log-normalized expression value of gene $j$ in cell
$i$, for $i = 1, \ldots, n$ and $j = 1, \ldots, p$. To characterize the bimodal properties of single
cell data, for a given cell, a gene is defined to be either positively expressed or undetected.
Define $R_{i,j} = I[Y_{i,j} > 0]$ to be the indicator variable denoting the presence or absence of
expression for gene $j$ in cell $i$. Following [14], the log-normalized gene expression values are
modeled as follows:
\[
\begin{aligned}
Y_{i,j} \mid R_{i,j} = 1 &\sim \mathrm{Normal}(\mu_j, \sigma_j^2), \\
Y_{i,j} &= 0 \ \text{with probability 1 when } R_{i,j} = 0, \\
R_{i,j} &\sim \mathrm{Bernoulli}(\omega_j),
\end{aligned}
\tag{2.1}
\]
where $\mu_j$ and $\sigma_j^2$ denote the mean and variance of the gene expression level, conditional on
the gene being expressed, and $\omega_j$ denotes the rate of expression of gene $j$ across all cells.
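
To make the hurdle model in Equation (2.1) concrete, the sketch below simulates one gene across cells and recovers the three parameters from the simulated values. It is a toy illustration in Python using only the standard library; the function names and parameter values are hypothetical, and the mean is chosen far enough from zero that Gaussian draws are positive in practice.

```python
import random
from statistics import fmean, pvariance


def simulate_gene(n_cells, omega, mu, sigma, rng):
    """Draw log-normalized expression for one gene under the hurdle model."""
    values = []
    for _ in range(n_cells):
        expressed = rng.random() < omega          # R ~ Bernoulli(omega)
        values.append(rng.gauss(mu, sigma) if expressed else 0.0)
    return values


def estimate_hurdle(values):
    """Plug-in estimates of (omega, mu, sigma^2) using R = I[Y > 0]."""
    positive = [y for y in values if y > 0]
    omega_hat = len(positive) / len(values)       # rate of expression
    mu_hat = fmean(positive)                      # mean of expressed values
    sigma2_hat = pvariance(positive, mu_hat)      # variance of expressed values
    return omega_hat, mu_hat, sigma2_hat


rng = random.Random(1)
y = simulate_gene(5000, omega=0.3, mu=3.0, sigma=0.5, rng=rng)
omega_hat, mu_hat, sigma2_hat = estimate_hurdle(y)
```

The estimates use only the positive values for the continuous part, which mirrors the conditioning on $R_{i,j} = 1$ in the model.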
2.2.4.1    Hypothesis Formulation
We now assume that our data have been collected under $K$ conditions (doses), and denote
the data by $D_{k,o} \equiv \{(Y_{k,i,j}, R_{k,i,j}),\ i = 1, \ldots, n_k\}$. The underlying populations
for the sample data $D_{k,o}$, for the dose groups $k = 1, 2, \ldots, K$, are assumed to be identified
by the parameters $(\mu_{k,j}, \sigma_j^2, \omega_{k,j})$. The aim of this study is to test for differences in
gene expression patterns between the different dose groups. Traditionally one would perform an
ANOVA test to detect changes in mean across groups for samples with continuous measurements.
However, to account for the bimodality in the single cell gene expression distribution, the test
should detect changes in $\mu_j$ and $\omega_j$ simultaneously, as both could drive differential gene
expression. Therefore we define
\[
H_0: \ \mu_{1,j} = \mu_{2,j} = \cdots = \mu_{K,j} = \mu_j \ \text{ and } \ \omega_{1,j} = \omega_{2,j} = \cdots = \omega_{K,j} = \omega_j
\tag{2.2}
\]
versus the alternative
\[
H_a: \ \mu_{k,j} \text{ is different for at least one } k \text{ and } \omega_{k,j} \text{ is different for at least one } k, \quad k = 1, \ldots, K.
\]
2.2.4.2    Single cell Bayesian Hurdle model Analysis (scBT)
Given the single cell RNA-seq hurdle model structure, we assume that a priori, given $\sigma_j^2$,
$\mu_{k,j} \sim \mathrm{Normal}(m_{k,0}, \tau_{k,\mu}\sigma_j^2)$, $\sigma_j^2 \sim IG(a_\sigma, b_\sigma)$, and
$\omega_{k,j} \sim \mathrm{Beta}(a_{k,\omega}, b_{k,\omega})$, where $IG$ is the inverse gamma distribution with
shape $a_\sigma$ and scale $b_\sigma$, and $m_{k,0}, \tau_{k,\mu}, a_\sigma, b_\sigma, a_{k,\omega}, b_{k,\omega}$ are
the hyperparameters. Given the large number of gene-wise model fits arising from a single
cell experiment, there is a pressing need to allow for a parallel structure whereby the same
model is fitted to each gene. The prior distributions on the parameters describe how the
unknown coefficients $\mu_{k,j}$, $\omega_{k,j}$ and $\sigma_j^2$ vary across the genes and the dose groups while
allowing for information borrowing between the genes. Now, based on the model assumptions, we
propose a Bayesian test for simultaneously testing the differences in mean gene expression
and dropout proportions as formulated in Section 2.2.4.1. Under the null hypothesis the marginal
likelihood is written as
                             K Y   nk                                          Rk,i,j
                                                               (Yk,i,j − µj )2
                 Z Z Z Y                                                                                           
                                             1                                                             1−Rk,i,j
        LH0 ,j =                          √         exp −                          ωj            (1 − ωj )
                            k=1    i=1
                                            2πσj                     2σj2
               × π(µj |σj2 )π(σj2 )π(ωj )dµj dσj2 dωj
                              1                                   1
               =        PK Pnk                ×q
                 (2π)( k=1 i=1 Rk,i,j )/2                     PK Pnk
                                                     1 + τµ k=1 i=1 Rk,i,j
                                    Γ(aσ + ( K
                                                P        Pnk
                      1                            k=1      i=1 Rk,i,j )/2)
               ×             ×                             PK Pnk
                 Γ(aσ )baσσ     (1/bσ + Atot /2)aσ +( k=1 i=1 Rk,i,j )/2
                                                                 PK                PK Pnk
                 Beta(aω + ( K
                                 P Pnk
                                            i=1 Rk,i,j ), bω +       k=1 nk − (                 i=1 Rk,i,j ))
               ×                    k=1                                                k=1
                                                                                                              ,       (2.3)
                                                      Beta(aω , bω )
where
                             (K n                               )     (K n                          )−1
                                      k                                          k
                               XX                          m2            XX                      1
                   Atot =                          2
                                          Rk,i,j Yk,i,j + 0       −                Rk,i,j  +
                               k=1 i=1
                                                            τµ           k=1 i=1
                                                                                                 τµ
                             (K n                              )2
                                      k
                              XX                           m0
                         ×               Rk,i,j Yk,i,j  +          .
                               k=1 i=1
                                                           τµ
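Under the alternative hypothesis we compute the marginal likelihood without any restriction
on the $K$ means $\mu_{k,j}$ and the dropout parameters $\omega_{k,j}$, $k = 1, 2, \ldots, K$. In particular,
we assume that $\mu_{k,j} \sim \mathrm{Normal}(m_{k,0}, \tau_{k,\mu}\sigma_j^2)$, $\sigma_j^2 \sim IG(a_\sigma, b_\sigma)$, and
$\omega_{k,j} \sim \mathrm{Beta}(a_{k,\omega}, b_{k,\omega})$, $k = 1, 2, \ldots, K$. The marginal likelihood under the
alternative hypothesis is then given by
\[
\begin{aligned}
L_{H_a,j} = \int\cdots\int &\prod_{k=1}^{K}\prod_{i=1}^{n_k}
\left\{\frac{1}{\sqrt{2\pi}\,\sigma_j}\exp\left(-\frac{(Y_{k,i,j}-\mu_{k,j})^2}{2\sigma_j^2}\right)\right\}^{R_{k,i,j}}
\omega_{k,j}^{R_{k,i,j}}\,(1-\omega_{k,j})^{1-R_{k,i,j}} \\
&\times \left\{\prod_{k=1}^{K}\pi(\mu_{k,j})\,\pi(\omega_{k,j})\right\}\pi(\sigma_j^2)
\left\{\prod_{k=1}^{K}d\mu_{k,j}\,d\omega_{k,j}\right\}d\sigma_j^2
\end{aligned}
\tag{2.4}
\]
\[
\begin{aligned}
L_{H_a,j} &= \frac{1}{(2\pi)^{\left(\sum_{k=1}^{K}\sum_{i=1}^{n_k}R_{k,i,j}\right)/2}}
\times \frac{1}{\prod_{k=1}^{K}\sqrt{1+\tau_{k,\mu}\sum_{i=1}^{n_k}R_{k,i,j}}} \\
&\quad\times \frac{1}{\Gamma(a_\sigma)\,b_\sigma^{a_\sigma}}
\times \frac{\Gamma\left(a_\sigma+\left(\sum_{k=1}^{K}\sum_{i=1}^{n_k}R_{k,i,j}\right)/2\right)}
{\left(1/b_\sigma+\sum_{k=1}^{K}A_k/2\right)^{a_\sigma+\left(\sum_{k=1}^{K}\sum_{i=1}^{n_k}R_{k,i,j}\right)/2}} \\
&\quad\times \prod_{k=1}^{K}\frac{\mathrm{Beta}\left(a_{k,\omega}+\sum_{i=1}^{n_k}R_{k,i,j},\ b_{k,\omega}+n_k-\sum_{i=1}^{n_k}R_{k,i,j}\right)}
{\mathrm{Beta}(a_{k,\omega}, b_{k,\omega})}
\end{aligned}
\tag{2.5}
\]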
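Now, under the assumption that $a_\omega = a_{k,\omega}$, $b_\omega = b_{k,\omega}$, $\tau_\mu = \tau_{k,\mu}$ for
$k = 1, 2, \ldots, K$, and using Stirling's approximation $r! \sim \sqrt{2\pi r}\,(r/e)^r$, we have
\[
\begin{aligned}
\frac{L_{H_0,j}}{L_{H_a,j}} &= \left(\sqrt{2\pi}\,e\right)^{1-K}
\times \frac{\prod_{k=1}^{K}\sqrt{1+\tau_{k,\mu}\sum_{i=1}^{n_k}R_{k,i,j}}}
{\sqrt{1+\tau_\mu\sum_{k=1}^{K}\sum_{i=1}^{n_k}R_{k,i,j}}} \\
&\quad\times \left(\frac{1/b_\sigma+\sum_{k=1}^{K}A_k/2}{1/b_\sigma+A_{tot}/2}\right)^{a_\sigma+\frac{1}{2}\left(\sum_{k=1}^{K}\sum_{i=1}^{n_k}R_{k,i,j}\right)} \\
&\quad\times \sqrt{\frac{\left(a_\omega+\sum_{k=1}^{K}\sum_{i=1}^{n_k}R_{k,i,j}-1\right)\left(b_\omega+\sum_{k=1}^{K}n_k-\sum_{k=1}^{K}\sum_{i=1}^{n_k}R_{k,i,j}-1\right)}
{a_\omega+b_\omega+\sum_{k=1}^{K}n_k-1}} \\
&\quad\times \frac{\left(a_\omega+\sum_{k=1}^{K}\sum_{i=1}^{n_k}R_{k,i,j}-1\right)^{\left(a_\omega+\sum_{k=1}^{K}\sum_{i=1}^{n_k}R_{k,i,j}-1\right)}}
{\left(a_\omega+b_\omega+\sum_{k=1}^{K}n_k-1\right)^{\left(a_\omega+b_\omega+\sum_{k=1}^{K}n_k-1\right)}} \\
&\quad\times \left(b_\omega+\sum_{k=1}^{K}n_k-\sum_{k=1}^{K}\sum_{i=1}^{n_k}R_{k,i,j}-1\right)^{\left(b_\omega+\sum_{k=1}^{K}n_k-\sum_{k=1}^{K}\sum_{i=1}^{n_k}R_{k,i,j}-1\right)} \\
&\quad\times \prod_{k=1}^{K}\sqrt{\frac{a_\omega+b_\omega+n_k-1}
{\left(a_\omega+\sum_{i=1}^{n_k}R_{k,i,j}-1\right)\left(b_\omega+n_k-\sum_{i=1}^{n_k}R_{k,i,j}-1\right)}} \\
&\quad\times \prod_{k=1}^{K}\left[\frac{\left(a_\omega+b_\omega+n_k-1\right)^{\left(a_\omega+b_\omega+n_k-1\right)}}
{\left(a_\omega+\sum_{i=1}^{n_k}R_{k,i,j}-1\right)^{\left(a_\omega+\sum_{i=1}^{n_k}R_{k,i,j}-1\right)}}
\times \frac{1}{\left(b_\omega+n_k-\sum_{i=1}^{n_k}R_{k,i,j}-1\right)^{\left(b_\omega+n_k-\sum_{i=1}^{n_k}R_{k,i,j}-1\right)}}\right] \\
&\quad\times \left(\frac{\Gamma(a_\omega)\,\Gamma(b_\omega)}{\Gamma(a_\omega+b_\omega)}\right)^{(K-1)}
\end{aligned}
\]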
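
The closed forms in Equations (2.3) and (2.5) can be evaluated numerically on the log scale. The Python sketch below does so for a single gene, using exact log-gamma functions rather than the Stirling approximation. Several points are assumptions of this sketch and not statements about the scBT implementation: $A_k$ is taken to be the group-wise analogue of $A_{tot}$ (the same expression restricted to group $k$), the hyperparameters are shared across groups and fixed at arbitrary values instead of being estimated by maximising the marginal likelihood, and the function names `quad_term` and `log_marginals` are hypothetical.

```python
from math import lgamma, log, pi


def log_beta(a, b):
    """Logarithm of the Beta function."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def quad_term(pos, m0, tau):
    """sum(y^2) + m0^2/tau - (sum(y) + m0/tau)^2 / (n_pos + 1/tau)."""
    s1 = sum(pos)
    s2 = sum(y * y for y in pos)
    return s2 + m0 ** 2 / tau - (s1 + m0 / tau) ** 2 / (len(pos) + 1.0 / tau)


def log_marginals(groups, m0=0.0, tau=1.0, a_sig=1.0, b_sig=1.0, a_om=1.0, b_om=1.0):
    """Return (log L_H0, log L_Ha) for one gene; groups is a list of dose groups."""
    pos = [[y for y in g if y > 0] for g in groups]      # expressed values per group
    e = [len(p) for p in pos]                            # expressing cells per group
    n = [len(g) for g in groups]                         # cells per group
    e_tot, n_tot = sum(e), sum(n)
    # Factors shared by Equations (2.3) and (2.5).
    common = (-0.5 * e_tot * log(2 * pi) - lgamma(a_sig) - a_sig * log(b_sig)
              + lgamma(a_sig + e_tot / 2))
    a_tot = quad_term([y for p in pos for y in p], m0, tau)
    log_h0 = (common - 0.5 * log(1 + tau * e_tot)
              - (a_sig + e_tot / 2) * log(1 / b_sig + a_tot / 2)
              + log_beta(a_om + e_tot, b_om + n_tot - e_tot) - log_beta(a_om, b_om))
    a_sum = sum(quad_term(p, m0, tau) for p in pos)      # assumed definition of sum of A_k
    log_ha = (common - sum(0.5 * log(1 + tau * ek) for ek in e)
              - (a_sig + e_tot / 2) * log(1 / b_sig + a_sum / 2)
              + sum(log_beta(a_om + ek, b_om + nk - ek) - log_beta(a_om, b_om)
                    for ek, nk in zip(e, n)))
    return log_h0, log_ha


# Toy data: 20 cells per group, zeros are non-expressing cells.
group_a = [0.0] * 10 + [1.8, 1.9, 2.0, 2.1, 2.2] * 2
group_c = [0.0] * 2 + [3.8, 3.9, 4.0, 4.1, 4.2, 4.0] * 3

h0, ha = log_marginals([group_a, group_a])
log_ratio_same = h0 - ha        # positive: the null is favoured
h0, ha = log_marginals([group_a, group_c])
log_ratio_diff = h0 - ha        # negative: the alternative is favoured
```

Working on the log scale avoids overflow of the gamma and power terms when the number of cells is large.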
    The Bayes factor is then defined as
\[
BF_{01,j} = \frac{L_{H_0,j}}{L_{H_a,j}} \times \frac{\pi(H_a)}{\pi(H_0)}
\tag{2.6}
\]
where $\pi(H_a)$ and $\pi(H_0)$ are the prior probabilities for the alternative and null model,
respectively. The hyperparameters are obtained by maximising the marginal likelihood under
the null and the alternative hypothesis. Detailed derivations of the likelihood function and
the Bayes factor are provided in the Supplementary Material. Using the test of hypothesis
described in Equation (2.2), scBT conducts a test of DE for each gene independently.
To control for multiplicity we adopt the FDR correction approach discussed in [76]. The
rejection threshold is estimated in terms of the posterior probabilities of the null hypothesis,
$p(H_{0,j} \mid D_j)$. For a target FDR $\alpha$, the procedure rejects all hypotheses with
$p(H_{0,j} \mid D_j) < \zeta$, where $p(H_{0,j} \mid D_j) = [1 + 1/BF_{01,j}]^{-1}$ and $\zeta$ is the largest
value such that $C(\zeta)/|J(\zeta)| \le \alpha$, where $J(\zeta) = \{j : p(H_{0,j} \mid D_j) \le \zeta\}$ and
$C(\zeta) = \sum_{j \in J(\zeta)} p(H_{0,j} \mid D_j)$.
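
The thresholding rule can be sketched in a few lines of Python. The sketch sorts genes by their posterior null probability and keeps the largest set $J(\zeta)$ whose average posterior null probability does not exceed $\alpha$. The function names are hypothetical, Bayes factors are assumed to be strictly positive, and ties in the posterior probabilities are not treated specially.

```python
def posterior_null_prob(bf01):
    """p(H0 | D) = [1 + 1/BF01]^(-1), for BF01 > 0."""
    return 1.0 / (1.0 + 1.0 / bf01)


def bayesian_fdr_reject(post_null, alpha=0.05):
    """Indices of rejected hypotheses under the posterior-probability FDR rule."""
    order = sorted(range(len(post_null)), key=lambda j: post_null[j])
    rejected, running_sum = [], 0.0
    for rank, j in enumerate(order, start=1):
        running_sum += post_null[j]          # C(zeta) for the current candidate set
        if running_sum / rank <= alpha:      # C(zeta) / |J(zeta)| <= alpha
            rejected = order[:rank]
    return sorted(rejected)


# Illustrative Bayes factors for six genes.
bf01 = [0.001, 0.01, 0.05, 0.5, 2.0, 10.0]
post = [posterior_null_prob(b) for b in bf01]
rejected = bayesian_fdr_reject(post, alpha=0.05)
```

Because the genes are visited in increasing order of posterior null probability, the running average never decreases, so the last rank that satisfies the bound gives the largest admissible set.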
2.2.4.3    Multiple group Likelihood Ratio Test (LRT)
To carry out a direct performance comparison with scBT, we extend the Seurat Bimod test [14]
to multiple groups. Assuming that all the $K$ groups have the same variance $\sigma_j^2$ and omitting
the index $j$ for clarity, the likelihood ratio test can be defined as
\[
\Lambda(Y, R) = \frac{\sup_{\theta \in H_0} L(\theta \mid Y, R)}{\sup_{\theta \in H_a} L(\theta \mid Y, R)},
\]
where the likelihood can be written as
\[
L(\theta \mid Y, R) = \prod_{k} \omega_k^{e_k}(1-\omega_k)^{n_k-e_k} \prod_{i \in C_k} f(Y_{i,k} \mid \mu_k, \sigma^2),
\]
where $Y$ and $R$ represent the gene observation vector and the gene indicator vector across
$K$ dose groups, $\theta = \{\mu_k, \sigma^2, \omega_k,\ k = 1, \ldots, K\}$ is the vector of unknown parameters,
$C_k$ is the set of cells expressing the gene in group $k$ (i.e. $C_k = \{i : R_{i,k} = 1\}$),
$e_k = \sum_i R_{i,k}$ is the number of cells expressing the gene in group $k$, and $f$ is the density
function of the normal distribution with parameters $\mu_k$ and $\sigma^2$. Therefore we can write
\[
\begin{aligned}
\Lambda(Y, R) &= \frac{\sup_{\{\omega,\mu,\sigma^2\}} \omega^{\sum_k e_k}(1-\omega)^{\sum_k n_k-\sum_k e_k}\prod_k\prod_{i\in C_k} N(Y_{ik}\mid\mu,\sigma^2)}
{\sup_{\{\omega_k,\mu_k,\sigma^2;\,k=1,\ldots,K\}} \prod_k \omega_k^{e_k}(1-\omega_k)^{n_k-e_k}\prod_k\prod_{i\in C_k} N(Y_{ik}\mid\mu_k,\sigma^2)} \\
&= \frac{\sup_{\{\omega\}} \omega^{\sum_k e_k}(1-\omega)^{\sum_k n_k-\sum_k e_k}}
{\sup_{\{\omega_k,\,k=1,\ldots,K\}} \prod_k \omega_k^{e_k}(1-\omega_k)^{n_k-e_k}}
\times \frac{\sup_{\{\mu,\sigma^2\}} \prod_k\prod_{i\in C_k} N(Y_{ik}\mid\mu,\sigma^2)}
{\sup_{\{\mu_k,\sigma^2;\,k=1,\ldots,K\}} \prod_k\prod_{i\in C_k} N(Y_{ik}\mid\mu_k,\sigma^2)} \\
&= \prod_k \left[\left(\frac{\sum_k e_k/\sum_k n_k}{e_k/n_k}\right)^{e_k}
\left(\frac{1-\sum_k e_k/\sum_k n_k}{1-e_k/n_k}\right)^{n_k-e_k}\right]
\cdot \left(1+\frac{\sum_k e_k\left(\bar{Y}_k^{+}-\bar{Y}^{+}\right)^2}{\sum_k\sum_{i=1}^{e_k}\left(Y_{ik}^{+}-\bar{Y}_k^{+}\right)^2}\right)^{-\sum_k e_k/2} \\
&= \Lambda_b(R) \cdot \Lambda_n(Y^{+})
\end{aligned}
\]
where $\Lambda_b$ is a binomial LRT, $\Lambda_n$ is a normal LRT, $Y^{+}$ is the set of positive $Y$ values,
$\bar{Y}_k^{+}$ is the mean of the positive $Y$ values in the $k$th group and
$\bar{Y}^{+} = \frac{1}{K}\sum_{k=1}^{K}\bar{Y}_k^{+}$. Thus, it can be shown
that the combined LRT is the product of a binomial and a normal LRT statistic, $\Lambda_b$ and $\Lambda_n$,
both of which are derived using classical statistical theory. Applying classical asymptotic
results about LRTs, $-2\log\Lambda(Y, R)$ converges to a $\chi^2$ distribution with $(2K-2)$ degrees
of freedom under $H_0$. We note here that the sample size for the $\chi^2$ statistic is not $n$, but
$n_{+} = \sum_{k=1}^{K}\omega_k n_k$, and $n_{+}$ is sufficiently large for our simulation and real data
experiments.
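
The factorisation into a binomial and a normal part lends itself to a direct computation. The following Python sketch evaluates $-2\log\Lambda$ for one gene and converts it to a p-value using the closed-form $\chi^2$ survival function that is available because $2K-2$ is always even. It is an illustration under stated assumptions and not the scBT package code: the function names are hypothetical, every group is assumed to contain at least one expressing cell with non-zero within-group variation, and the overall mean is taken as the pooled mean of all positive values (the maximum likelihood estimate under $H_0$), which coincides with the average of the group means when all groups have the same number of expressing cells.

```python
from math import exp, log


def xlogy(x, y):
    """x * log(y), with the convention 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * log(y)


def chi2_sf_even_df(x, df):
    """Chi-square survival function for even df: exp(-x/2) * sum_{i<df/2} (x/2)^i / i!."""
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= (x / 2) / i
        total += term
    return exp(-x / 2) * total


def lrt_multiple(groups):
    """Return (-2 log Lambda, p-value) for one gene across K >= 2 dose groups."""
    num_groups = len(groups)
    pos = [[y for y in g if y > 0] for g in groups]
    n = [len(g) for g in groups]
    e = [len(p) for p in pos]
    n_tot, e_tot = sum(n), sum(e)
    # Binomial part: common expression rate versus group-specific rates.
    w0 = e_tot / n_tot
    ll0 = xlogy(e_tot, w0) + xlogy(n_tot - e_tot, 1 - w0)
    ll1 = sum(xlogy(ek, ek / nk) + xlogy(nk - ek, 1 - ek / nk) for ek, nk in zip(e, n))
    stat_b = -2 * (ll0 - ll1)
    # Normal part: common mean versus group-specific means, shared variance.
    grand = sum(sum(p) for p in pos) / e_tot
    means = [sum(p) / len(p) for p in pos]
    ss_between = sum(ek * (mk - grand) ** 2 for ek, mk in zip(e, means))
    ss_within = sum((y - mk) ** 2 for p, mk in zip(pos, means) for y in p)
    stat_n = e_tot * log(1 + ss_between / ss_within)
    stat = stat_b + stat_n
    return stat, chi2_sf_even_df(stat, 2 * num_groups - 2)


low = [0.0] * 10 + [1.8, 1.9, 2.0, 2.1, 2.2] * 2
high = [0.0] * 2 + [3.8, 3.9, 4.0, 4.1, 4.2, 4.0] * 3
stat_same, p_same = lrt_multiple([low, low])
stat_diff, p_diff = lrt_multiple([low, high])
```

Two identical groups give a statistic of zero, while groups that differ in both the proportion of expressing cells and the mean of the expressed values give a large statistic and a small p-value.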
2.2.4.4    Linear model-based Likelihood Ratio Test (LRT linear)
The generalized linear model approach MAST was identified as one of the top performing
tests for pairwise differential expression testing [32, 16]. Building on this approach, the
LRT multiple test is extended to a generalized linear model framework, where the mean and
the dropout proportions are modelled as linear functions of the dose groups (assumed to
be a continuous covariate). Using the same distributional assumptions defined in Equation 2.1,
we fit a logistic regression model for the discrete variable $R$ and a Gaussian linear model for
the continuous variable $Y$ conditional on $R = 1$ independently, as follows:
$E(Y_{ij} \mid R_{ij} = 1) = m_{0,j} + m_{1,j} d_i$ and $\mathrm{logit}\{P(R_{ij} = 1)\} = \psi_{0,j} + \psi_{1,j} d_i$,
where $d$ represents the continuous dose groups. Under this modelling approach, the null hypothesis
described in Equation 2.2 can be rewritten as $H_0: E(Y_{ij} \mid R_{ij} = 1) = m_{0,j}$ and
$\mathrm{logit}\{P(R_{ij} = 1)\} = \psi_{0,j}$. The regression models are fit using the lm and brglm
functions in the stats and brglm R packages, respectively. The likelihood
ratio test statistic is computed using the same statistical theory discussed for the
LRT multiple test, and it asymptotically follows a $\chi^2$ distribution under $H_0$.
2.2.5    Benchmarking method selection
Our fit-for-purpose tests were benchmarked against existing differential expression testing
methods or their multiple group equivalents, selected based on previously reported performance,
ability to consider multiple groups, or whether they served as the foundation for the scBT and
multiple group LRT (LRT multiple) tests developed here. Seurat Bimod served as the foundation
for the scBT and LRT multiple tests as previously outlined, and MAST was identified as one of the
top performing tests for two group comparisons [32]. Similarly, limma-trend performed well
for two sample comparisons and can consider multiple groups. The Wilcoxon Rank Sum
test was identified as providing an excellent balance between its ability to identify DE genes
and speed, and is the default test in the Seurat R package for scRNAseq analysis. It was
also reported that the t-test performed well, and therefore we included the ANOVA and
Kruskal-Wallis (KW) tests, parametric and non-parametric alternatives to the t-test for
multiple group comparisons, respectively. All tests were run without correction for batch
effects or other nuisance covariates. Multiplicity for each test was controlled using FDR
correction [77]. All tests, including scBT, LRT multiple and LRT linear, are available in our
scBT R package (github.com/satabdisaha1288/scBT). R session information is listed in
Supplementary Table S1. A flow diagram outlines our benchmarking approach (Figure 2.1).
2.2.5.1     Seurat Bimod
The Seurat Bimod test [14] is a pairwise differential gene expression testing approach developed
under the single cell RNA-seq hurdle model framework. The test is formulated as $H_0$:
the mean and the dropout parameters of the gene vector under two dose groups are equal
versus $H_a$: the mean and the dropout parameters differ over the two groups. The LRT-based
test statistic $-2\log\Lambda(y, r)$ converges to a $\chi^2$ distribution with two degrees of freedom
under $H_0$. The computations are carried out using the R package Seurat.
Figure 2.1 Flow diagram of the simulation, benchmarking, and experimental data
evaluation strategy presented in the manuscript. Briefly, SplattDR was developed to
simulate dose-response scRNAseq data and validated based on experimental dose-response
data. Simulated datasets were generated varying diverse parameters 10 times and then
used to assess the performance of each test method. Each test method was also assessed
using experimental data from the hepatic snRNAseq dose response dataset obtained from
male mice gavaged every 4 days for 28 days with 0.01, 0.03, 0.1, 0.3, 1, 3, 10, or 30 µg/kg
TCDD. Related figures for each analysis from the main body are noted.

2.2.5.2     MAST
MAST [16] proposes a two-part generalized linear model for differential expression analysis
of scRNAseq data. The first part models the rate of gene expression using logistic regression,
$\mathrm{logit}(\omega_{ij}) = X_i\beta_j^{\omega}$, and the second part uses a linear model to express the positive
gene expression $Y_{ij}$, conditional on $R_{ij}$, as $\mu_{ij} = X_i\beta_j^{\mu}$, where $\beta_j^{\omega}$ and
$\beta_j^{\mu}$ are the coefficients of the covariates used in the logistic and linear regression
models, respectively. A test with an asymptotic $\chi^2$ null distribution is employed for
identifying differentially expressed genes (DEGs) and multiplicity is controlled
using FDR correction [77]. Although LRT linear and MAST share the same hurdle
regression framework, the estimation processes of the two methods differ in some significant
respects. First, to achieve shrinkage of the continuous variance, MAST assumes a gamma prior
distribution on the precision (inverse of variance) parameter, estimates its posterior maximum
likelihood estimator (MLE), and uses that in place of the regular MLE of the precision
parameter. Second, it fits a Bayesian logistic regression model for the discrete component by
assuming Cauchy distribution priors centered at zero for the regression parameters. This is
done to deal with cases of “linear separation” where the parameter estimates diverge to ±∞
and the Fisher information matrix becomes singular. Finally, it considers the cellular
detection rate, defined as $CDR_i = \frac{1}{p}\sum_{j=1}^{p} R_{ij}$, to be a covariate in both the
logistic and linear regression models. LRT linear, on the other hand, simply fits the
non-Bayesian linear and logistic regression models without considering variance shrinkage or
adjustment for additional covariates.
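
For concreteness, the cellular detection rate used by MAST as a covariate is simply the fraction of genes detected in each cell. A minimal Python sketch follows; the function name and the cells-by-genes list layout are assumptions made for illustration.

```python
def cellular_detection_rate(expr):
    """CDR_i = (1/p) * sum_j R_ij, where expr[i][j] is the expression of gene j in cell i."""
    return [sum(1 for y in cell if y > 0) / len(cell) for cell in expr]


# Two cells measured on four genes.
cdr = cellular_detection_rate([[0.0, 1.2, 0.0, 3.1],
                               [0.0, 0.0, 0.0, 0.5]])
```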
2.2.5.3    Limma-trend
Limma-trend [20] proposes a linear model-based differential expression approach for modelling
RNA-seq experiments of arbitrary complexity. The framework models the mean gene
expression as a function of several continuous and categorical covariates. A separate linear
model is fitted for each gene, but the gene-wise models are linked by global parameters using
parametric empirical Bayes approaches [78]. The global variance estimated by the
empirical Bayes procedure also incorporates a mean-variance trend, allowing better modelling
of low abundance genes. Finally, a test of differential gene expression is carried out by testing
the significance of one or more coefficients of the fitted linear model.
2.2.5.4    Wilcoxon Rank Sum (WRS) Test
WRS [79] test is a non-parametric test commonly used for pairwise DGE testing. The test is
formulated as H0: the distributions of the gene vector under two dose groups are equal versus
Ha: the distributions are not equal. The test involves the calculation of the U statistic, which
for large samples is approximately normally distributed. Since this is a pairwise test, a union
is taken over all the genes found to be DE in each of the pairwise tests. The computations
are carried out using the wilcox.test function in R package stats and multiplicity is controlled
using FDR correction.
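The pairwise strategy described above can be sketched as follows. This is a minimal standard-library Python illustration, not the R code used in this chapter. The function names, the choice to compare each dose group against a control group, and the application of the FDR correction within each pairwise comparison are assumptions made for illustration. The p-value uses the large-sample normal approximation to U without the continuity and tie adjustments that wilcox.test can apply.

```python
import math
from statistics import NormalDist


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        avg = (start + end) / 2 + 1  # mean of the 1-based positions start+1..end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = avg
        start = end + 1
    return ranks


def wrs_pvalue(x, y):
    """Two-sided rank sum p-value from the normal approximation to U."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    return 2 * (1 - NormalDist().cdf(abs(z)))


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for rank_from_top, idx in enumerate(order):
        rank = m - rank_from_top  # rank of this p-value in ascending order
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def de_union(expr, control, doses, alpha=0.05):
    """Union of genes called DE in any dose-versus-control comparison.

    expr[gene][dose] is a list of expression values (hypothetical layout).
    """
    genes = sorted(expr)
    de = set()
    for dose in doses:
        pvals = [wrs_pvalue(expr[g][control], expr[g][dose]) for g in genes]
        adj = bh_adjust(pvals)
        de.update(g for g, q in zip(genes, adj) if q <= alpha)
    return de
```

Taking the union over comparisons means a gene is reported if it is significant in at least one pairwise test, which is why pairwise methods can accumulate false positives as the number of dose groups grows.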
2.2.5.5         ANOVA
Analysis of variance (ANOVA) [80] is very commonly used for testing the differences among
means in multiple groups. For a fixed gene j, it is assumed that the observed gene expression
$y_{k,i,j}$ for cell i is grouped by dose k. Assuming that $Y_{k,i,j} \sim \mathrm{Normal}(\mu_{k,j}, \sigma_j^2)$, ANOVA aims
to test the null hypothesis $H_0: \mu_{1,j} = \mu_{2,j} = \dots = \mu_{K,j} = \mu_j$ versus $H_a$: $\mu_{k,j}$ is different
for at least one k, where i = 1, . . . , n; j = 1, . . . , p; k = 1, . . . , K. The test statistic is computed using the
aov function in R package stats and it follows an F-distribution with (K − 1) and (n − K)
degrees of freedom. Multiplicity is controlled by applying FDR correction on the obtained
p-values.
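For a single gene, the F statistic and its degrees of freedom can be computed directly from the dose groups. The sketch below is a standard-library Python illustration of the textbook one-way ANOVA decomposition, not the aov call used in this chapter; the function name and the toy input layout are assumptions. The p-value would additionally require the upper tail of the F-distribution, which is omitted here.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of values."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: group size times squared deviation of the group mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: squared deviations from each group's own mean.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```

With K dose groups and n cells in total, the returned degrees of freedom are (K − 1) and (n − K), matching the reference distribution stated above.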
2.2.5.6         Kruskal-Wallis (KW) Test
KW [81] test extends the WRS test for multiple groups. It is also a non-parametric extension
of the ANOVA test. The test is formulated as H0: the distributions of the gene vector under
K dose groups are equal versus Ha: the distributions are not equal. The computation of
the KW test statistic is carried out using the kruskal.test function in R package stats and
it asymptotically follows a χ2 distribution with K − 1 degrees of freedom. Multiplicity is
controlled by applying FDR correction on the obtained p-values.
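The KW statistic for one gene can be sketched as below. This is a standard-library Python illustration of the usual rank-based formula with the standard tie correction, not the kruskal.test implementation itself; the function name and the convention of returning 0 when every value is tied (for example, an all-zero gene) are assumptions.

```python
from collections import Counter


def kruskal_wallis(groups):
    """Return (H, df) for a list of groups, applying the standard tie correction."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    df = len(groups) - 1
    # Average 1-based rank of each distinct value in the pooled sample.
    counts = Counter(pooled)
    rank_of, position = {}, 0
    for value in sorted(counts):
        t = counts[value]
        rank_of[value] = position + (t + 1) / 2
        position += t
    h = sum(sum(rank_of[v] for v in g) ** 2 / len(g) for g in groups)
    h = 12 / (n * (n + 1)) * h - 3 * (n + 1)
    # Tie correction: 1 - sum(t^3 - t) / (n^3 - n) over groups of tied values.
    correction = 1 - sum(t ** 3 - t for t in counts.values()) / (n ** 3 - n)
    if correction == 0:  # every observation is identical, so there is no evidence of a difference
        return 0.0, df
    return h / correction, df
```

Because scRNAseq counts contain many zeroes, ties are the rule rather than the exception, so the tie correction is not a negligible detail for this kind of data.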
2.2.6        Benchmarking and sensitivity analyses
Benchmarking of DE test methods was performed on simulated datasets based on initial
parameters derived from real dose-response snRNAseq data. The probability of differential
expression was set to 10% with a 50% probability of being down-regulated, equally dis-
tributed among the dose-response models in Table 1. Batch parameters were used to include
sample variation associated with data obtained from 3 individuals in each dose group. A
total of 5,000 genes were simulated for 4,500 cells (500 per dose group) using the same doses
as the real dataset. Sensitivity analyses varied each of the following parameters according
to values in Supplementary Table 1: cell abundance equally distributed among dose groups,
varying cell numbers in each dose group, percent DE genes, proportion of downregulated
DE genes, fold-change location or scale, and dropout rate. Each simulation was replicated
10 times using a different initial seed. Method concordance was determined as area under
the concordance curve (AUCC) for the top 100- or 500-ranked genes in simulated and real
datasets, respectively, as previously described [82].
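To make the concordance measure concrete, the sketch below computes an AUCC for two ranked gene lists. It is a standard-library Python illustration under stated assumptions: the concordance curve counts, for each i up to k, the genes shared by the top i of both rankings, and the area is scaled by k(k + 1)/2 so that identical rankings score 1. The exact normalization used in [82] may differ, and the function name is hypothetical.

```python
def aucc(ranked_a, ranked_b, k=100):
    """Area under the concordance curve for two ranked gene lists, scaled to [0, 1].

    Assumes each list contains no duplicate genes and k >= 1.
    """
    k = min(k, len(ranked_a), len(ranked_b))
    seen_a, seen_b = set(), set()
    shared = 0  # genes present in the top (i + 1) of both lists
    area = 0
    for i in range(k):
        a, b = ranked_a[i], ranked_b[i]
        if a == b:
            shared += 1
        else:
            if a in seen_b:
                shared += 1
            if b in seen_a:
                shared += 1
        seen_a.add(a)
        seen_b.add(b)
        area += shared
    return area / (k * (k + 1) / 2)
```

Two methods that rank the same genes in the same order give a value of 1, while two methods whose top k lists are disjoint give 0.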
2.3     Results
For benchmarking of DGEA methods, a ground truth is required. Existing simulation tools
such as PowSimR, SymSim, SPsimSeq and Splatter are commonly used for power analy-
ses, evaluating DE analysis methods, and testing cell clustering strategies [75, 83, 84, 85].
Tools such as SymSim and Splatter are also capable of simulating cell trajectories and model
differentiation processes. Trajectories which exhibit non-linear changes over time or across
different developmental stages are not unlike dose–response effects which change over a con-
tinuum of doses. However, dose-responsive changes commonly follow defined trajectories
such as Hill, exponential, power, and linear models [28]. To simulate dose–response scR-
NAseq data we developed a wrapper for the Splatter scRNAseq data simulation tool named
SplattDR. SplattDR modified the Splatter grouped data simulation strategy by adjusting
counts from means defined by one of the dose–response functions outlined in the Materials
and Methods.
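The four families of dose–response shapes named above can be illustrated with the following standard-library Python sketch. The parameterizations shown are common textbook forms and are assumptions made for illustration; they are not necessarily the exact functions or parameter names given in the Materials and Methods or implemented in SplattDR.

```python
import math


def hill(dose, baseline, vmax, k, n):
    """Hill model: saturating response with half-maximal dose k and steepness n."""
    return baseline + vmax * dose ** n / (k ** n + dose ** n)


def exponential(dose, baseline, rate):
    """Exponential model: response grows or decays multiplicatively with dose."""
    return baseline * math.exp(rate * dose)


def power(dose, baseline, slope, exponent):
    """Power model: response changes with dose raised to a fixed exponent."""
    return baseline + slope * dose ** exponent


def linear(dose, baseline, slope):
    """Linear model: constant change in response per unit dose."""
    return baseline + slope * dose


# Mean response for one hypothetical gene across the vehicle and 8 TCDD doses (µg/kg).
doses = [0, 0.01, 0.03, 0.1, 0.3, 1, 3, 10, 30]
hill_means = [hill(d, baseline=1.0, vmax=2.0, k=3.0, n=2.0) for d in doses]
```

Each function maps a dose to a mean expression level, so the simulated counts for a dose group can then be drawn around the corresponding mean.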
    To demonstrate the modeling capability of SplattDR, 10000 gene expression responses
were simulated with a 10% probability of being differentially expressed, equally distributed
across the dose–response models. Parameters used in Splatter were initially estimated from
our experimental single nuclei RNAseq (snRNAseq) dose–response dataset. The simulated
and experimental data showed consistent relationships between mean expression and
percentage of zeroes, and between mean expression and variance (Figures 2.2A, B). Estimation
of the normalized root mean square deviation (NRMSD) from a curve fit to the experimental
data indicated excellent concordance.
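As an illustration of the concordance measure, the NRMSD between simulated values and the values predicted by the fitted curve can be computed as below. This standard-library Python sketch assumes the root mean square deviation is normalized by the range of the observed values; other normalizations (for example, by the mean) are also in use, and the text above does not specify which was applied. The function name is hypothetical.

```python
import math


def nrmsd(observed, predicted):
    """Root mean square deviation divided by the range of the observed values."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    rmsd = math.sqrt(sum(squared) / len(squared))
    return rmsd / (max(observed) - min(observed))
```

Multiplying the result by 100 expresses the deviation as a percentage, the scale on which NRMSD values are reported in this section.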
    The distribution of log(fold-changes) between vehicle (dose 0) and the highest simulated
dose (dose 9; 30 µg/kg) showed a more even distribution within a similar range compared
to experimental data which was skewed towards induction (Figure 2.2C).
    However, the gene induction skew was captured by modulating the parameters affecting
the probability of differential expression and the proportion of differentially repressed genes
(Supplementary Figure A.1). Principal components analysis (PCA) of the simulated data
clearly showed the dose-dependent characteristics of scRNAseq data with distinct clusters
increasing in separation with increasing dose (Figure 2.2D) which was also resolved by PCA
within the experimental data (Supplementary Figure A.2).
    To our knowledge, no other published in vivo dose-response scRNAseq datasets are available,
which limits the number of datasets from which initial simulation parameters can be estimated. To
investigate whether existing datasets generated using a different study design (e.g. whole cells
or different tissue source) could be used to derive initial parameters, we also simulated 10 000
genes starting with parameters estimated from (i) a two-dose liver snRNAseq (GSE148339),
(ii) whole cell liver scRNAseq (GSE129516) and (iii) peripheral blood mononuclear cells
(PBMC; GSE108313) datasets. When compared to a model fit for experimental data to
determine the relation between mean expression and percent zeroes or mean variance, the
NRMSD for data simulated from these datasets were between 1 and 10% with data simulated
from whole cell data differing the most from the model fit (Figure 2.2E). We then explored
whether parameters estimated from distinct cell types could replicate the characteristics of
that same cell type (Figure 2.2F). Not surprisingly, using initial parameters derived from
individual cell types in the experimental dose–response data had lower NRMSD than those
derived from the whole cell dataset. Notably, when data derived from a less abundant
cell subtype were used to estimate starting parameters, the dose–response characteristics for
that cell subtype were also poorly modeled (Figures 2.2E, 2.2F).
Figure 2.2 Comparison of simulated and real dose-response data. (A) Relationship
between gene-wise mean expression and percent zeroes for simulated and real dose-response
data. Simulation data consisted of 10,000 genes and 9 dose groups based on parameters
derived from experimental dose-response snRNAseq data. Black line represents a fitted
model to the experimental data from which the normalized root mean square deviation
(NRMSD) of simulated data was determined. (B) Relationship between gene-wise mean
expression and variance for simulated and experimental data. NRMSD was calculated for
simulated data from the fitted model represented as a black line. (C) Distribution of
log(fold-changes) in experimental and simulated data showing the median and minimum
and maximum values. (D) Principal components analysis of simulated data colored
according to simulated dose groups. (E) NRMSD estimated relative to fitted model in A,B
for simulated data generated from initial parameters derived from published hepatic
snRNAseq (two dose; GSE148339), hepatic whole cell (whole cell; GSE129516), and
peripheral blood mononuclear cell (PBMC; GSE108313) datasets. (F) NRMSD estimated
relative to model fitted to cell-type specific experimental dose-response data when
simulated from initial parameters estimated from that same cell type. Box and whisker
plots show median NRMSD, 25th and 75th percentiles, and minimum and maximum values.
2.3.1    Performance accuracy of DE test methods
We evaluated the performance of several differential gene expression analysis methods on
simulated datasets consisting of nine dose groups of 500 cells each (4500 total) and 5000
Figure 2.3 Classification performance of DE analysis tests. (A) ROCs estimated from
simulated dose-response scRNAseq data for 9 DE test methods including all genes
expressed in at least 1 cell (unfiltered). (B) ROCs for 9 DE test methods after filtering
simulated dose-response scRNAseq data to retain genes expressed in ≥ 5% of cells in at
least one dose group, removing genes expressed at low levels. (C) Precision-recall curves (PRCs) for 9 DE test
methods on unfiltered simulated dose-response scRNAseq data. (D) PRCs for 9 DE test
methods on filtered simulated dose-response scRNAseq data. Lines represent the mean
values and shaded regions reflect the standard deviation for 10 independent simulations.
(E) Precision of DE test methods. (F) FPR of DE test methods. (G) MCC for test
methods. (E–G) Box and whisker plots show median values, 25th and 75th percentiles, and
minimum and maximum values for 10 independent simulations. Points reflect values for
each independent simulation. Panels display comparisons of unfiltered and filtered datasets.
genes with a 10% probability of being differentially expressed (500 differentially expressed
genes). Selection criteria for test inclusion are outlined in the Materials and Methods section.
The 9 test methods included were ANOVA [80], single-cell Bayes hurdle model test (scBT),
Kruskal–Wallis (KW) [81], limma-trend [20, 78], likelihood-ratio test (LRT) linear and multiple,
MAST [16], Seurat bimod [14] and WRS [79]. With ground truth from simulated data,
the sensitivity, specificity, and precision for each test method were computed. Area under the
receiver-operating characteristic curve (AUROC) was used to measure test performance for
correctly classified differentially expressed genes.
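With simulated ground truth, the AUROC can be computed without tracing the full curve, because it equals the probability that a randomly chosen DE gene receives a higher score than a randomly chosen non-DE gene, with ties counted as one half. The standard-library Python sketch below illustrates this identity; the function name and the use of a generic per-gene score (for example, one minus the adjusted P-value) are assumptions for illustration.

```python
def auroc(scores, labels):
    """AUROC as the fraction of (DE, non-DE) gene pairs ranked correctly.

    scores: larger values indicate stronger evidence of differential expression.
    labels: truthy for genes simulated as DE, falsy otherwise.
    """
    pos = [s for s, is_de in zip(scores, labels) if is_de]
    neg = [s for s, is_de in zip(scores, labels) if not is_de]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

A value of 0.5 corresponds to a ranking no better than chance, and 1 to a ranking that places every DE gene above every non-DE gene.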
    In unfiltered data, AUROC scores showed similar performance for most tests except scBT
which had the largest AUROC among all test methods (Figure 2.3A). To account for the
inherent class imbalance between differentially expressed and non-differentially expressed
classes the area under the precision-recall curves (AUPRC) was also calculated. Similar
to AUROCs, AUPRCs identified scBT as the best performing test (Figure 2.3C). In most
standard differential expression testing pipelines genes expressed at low levels are removed to
minimize false detection rates. After filtering to retain genes expressed in ≥ 5% of cells in any
dose group, scBT was consistently ranked as the best test based on AUROC and AUPRC
scores. The performance of LRT linear test also improved, with comparable AUROC and
AUPRC scores relative to scBT, suggesting LRT linear is poorly suited for genes expressed
at low levels (Figures 2.3B–D).
    AUROC and AUPRC reflect the performance of each test method with varying significance
(i.e. P-value) thresholds. In the standard pipeline a fixed threshold is used, typically a
P-value ≤ 0.05 after adjustment for multiple hypothesis testing (e.g. Bonferroni correction).
For each method except scBT, the performance at an adjusted P-value ≤ 0.05 significance
criterion was evaluated. In scBT analysis, a gene was considered differentially expressed when
the estimated posterior probability of the null hypothesis, $p(H_{0,j} \mid D_j)$, was less than a threshold
chosen to achieve a target FDR of 0.05. scBT significantly outperformed all
other tests in precision rates irrespective of low expression filtering (Figure 2.3E). However,
scBT was less effective in identifying true positives (Figure 2.3F). Applying the filtering criteria
improved the recall rates, but the precision rates remained largely unchanged (Figures
2.3E, 2.3F). Test method classification performance scores were estimated as the Matthews
correlation coefficient (MCC) which is well suited for unbalanced data [86]. We see that the
scBT and LRT linear tests performed best for this metric on both unfiltered and filtered
data (Figure 2.3G).
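One common way to turn posterior null probabilities into a gene list with a target FDR is sketched below. It assumes the threshold is chosen so that the average posterior null probability among the selected genes (a Bayesian estimate of the FDR) does not exceed the target; this is an assumption made for illustration, since the exact rule used by scBT is not restated in this section. The function name is hypothetical and the code uses only the Python standard library.

```python
def bayes_fdr_select(post_null, target_fdr=0.05):
    """Select genes so that the mean posterior null probability stays within target_fdr.

    Returns (indices of selected genes, largest posterior null probability selected).
    """
    order = sorted(range(len(post_null)), key=lambda j: post_null[j])
    selected, running_sum = [], 0.0
    for count, j in enumerate(order, start=1):
        running_sum += post_null[j]
        # Values are visited in ascending order, so the running mean never decreases.
        if running_sum / count > target_fdr:
            break
        selected.append(j)
    threshold = post_null[selected[-1]] if selected else None
    return selected, threshold
```

Because genes are admitted only while the estimated FDR stays below the target, a rule of this kind is conservative, in line with the high precision and lower recall observed for scBT.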
Figure 2.4 Evaluation of Type I and II error control. (A) False positive rate (FPR) of 9
differential expression test methods estimated from negative control (0% DE genes)
simulated dose-response scRNAseq data including all genes expressed in at least 1 cell
(unfiltered) and genes expressed in ≥ 5% of cells in at least one dose group (filtered).
(B,C) Logistic regression models were fitted to negative control data to predict the
probability of false positive identification using percent zeroes and mean expression as
covariates. Lines represent the predicted probability of false positive classification with the
shaded region representing the 95% confidence interval. (D) False negative rate (FNR) of 9
differential expression test methods estimated from positive control (100% DE genes)
simulated dose-response scRNAseq data including unfiltered and filtered datasets. (E,F)
Logistic regression models were fit to positive control data. Lines represent predicted
probability of false negative classification with shaded region representing the 95%
confidence interval.
2.3.2    Type I error control and power
To investigate test performance in controlling type I errors (false positives), DGEA methods
were examined on simulated datasets with 0% DE genes (i.e. negative control). Using the
threshold for the computed posterior null probabilities, scBT identified only one false positive
gene in 2 of 10 simulations (Figure 2.4A). ANOVA, scBT, KW, limma-trend and LRT linear
had false positive rates (FPRs) below 3% indicating better performance compared to two
group tests. After filtering for genes with low expression levels, scBT still correctly identified
all the non-differentially expressed genes and was the best performing test. These are the
same tests that had a better FPR control in initial simulations (Figure 2.4). To explore
whether mean expression or percentage of zeroes influenced type I error rates, a logistic
regression model was fit to negative control data. We predicted the probability for each gene
to be identified as differentially expressed in the negative control data. While the curve for
scBT is missing since few false positives were identified, the predicted FPR for all the other
tests except LRT linear was also high for highly expressed genes with few zeroes (Figures
2.4B, 2.4C). Next, a positive control dataset with 100% differentially expressed genes was
simulated to evaluate test performance for detecting true positives. All tests except scBT
exhibited a false negative rate (FNR) ≤ 40% (Figure 2.4D). The best performing tests for
FNR also had high FPR. Logistic model regression fitting for false negative classification of
genes shows that the false negative rates were highest when the mean expression was either
too high or too low for all tests (Figures 2.4E, 2.4F).
2.3.3    Parameter Sensitivity Analysis
Experimental scRNAseq datasets will vary between cell types, cell composition, and re-
sponses depending on the target tissue, treatment, number of cells sequenced, and more.
For example, some distinct cell types are very abundant (e.g. hepatocytes), with others
present at lower levels (e.g. portal fibroblasts) in hepatic scRNAseq datasets. Moreover,
treatments such as exposure to a xenobiotic, can elicit dose-dependent changes in relative
proportions of cell types such as the infiltration of immune cells (26). We investigated the
impact by changing cell abundance from 25 to 2000 cells per dose group and observed an
increase in the false positive rate (FPR) when increasing the number of cells. The scBT and
LRT linear tests were less sensitive to an increase in the FPR as cell abundance increased
while the total positive rates (TPR + FPR) increased with cell abundance for all methods.
Figure 2.5 Matthews correlation coefficient (MCC) from sensitivity analyses of differential
expression test methods. (A) MCC for 9 DGEA test methods determined from simulated
dose response data with varying number of cells per dose group. Simulations consisted of
5,000 genes with a probability of differential expression of 10% and 9 dose groups. (B)
MCC for simulated data varying the cell numbers by dose group. The number of cells in
each of the 9 dose groups is shown on the right. (C) MCC for varying proportion of
differentially expressed genes. (D) MCC when varying the mean fold-change (location) of
repressed differentially expressed genes. (E) MCC for varying distribution of fold-change
(scale) of differentially expressed genes. (F) MCC for varying dropout rates calculated as in
Table S3. Points represent median and error bars represent minimum to maximum values.
Boxplots represent median, 25th to 75th percentile, and minimum to maximum values.
Each analysis consisted of 10 replicate datasets including all genes expressed in at least 1
cell (unfiltered) and genes expressed in ≥ 5% of cells in at least one dose group (filtered).
Although all tests exhibited comparable performance at low cell numbers (≤500), as cell
numbers increased scBT outperformed all other tests in both precision and MCC score (Figure
2.5A). Comparison of AUROCs and AUPRCs across cell numbers showed that ANOVA,
KW, limma-trend, and LRT linear tests performed best for a small number of cells, but the
increase in AUROC was steeper for scBT.
    It was also evident from the experimental snRNAseq dataset that the number of cells
per dose group was not fixed. We evaluated the performance of the test methods when the
number of cells dose-dependently increased or decreased, and when the number of cells per
dose group were taken from experimental data. Notably, while scBT had the best MCC
for increasing number of cells per dose, LRT linear performed better than scBT when the
number of cells decreased before and after filtering for genes expressed at low levels (Figure
2.5B). The shift in MCC between increasing and decreasing cell numbers for scBT appears
to be driven by a concomitant decrease in TPR and increase in FNR.
    Increasing the proportion of differentially expressed genes led to an improvement in MCC
except for scBT and LRT linear, though these tests maintained the top MCC scores as well
as AUROC and AUPRC (Figure 2.5C). As the magnitude of the effect increased, LRT
linear performed best at the low end while scBT exhibited the greatest improvement in
MCC (Figure 2.5D). Conversely, while the MCC decreased for most tests when modulating
the fold-change scale of differentially expressed genes, scBT improved and was more stable
(Figure 2.5E). As the proportion of unexpressed genes increased, the FPR increased with
precision decreasing for all tests. However, scBT was least affected, and maintained the
highest MCC among all tests (Figure 2.5F).
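The MCC used throughout these sensitivity analyses, together with the related rates, can be computed from the confusion matrix of called versus truly DE genes. The sketch below is a standard-library Python illustration; the function names and the convention of returning 0 when the MCC denominator is zero are assumptions.

```python
import math


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def rates(tp, fp, tn, fn):
    """Precision, recall (TPR), FPR and FNR from the same counts."""
    return {
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "fpr": fp / (fp + tn) if fp + tn else 0.0,
        "fnr": fn / (fn + tp) if fn + tp else 0.0,
    }
```

Unlike precision or recall alone, the MCC uses all four cells of the confusion matrix, which is why it remains informative when only a small fraction of genes is differentially expressed.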
2.3.4    Test method agreement
To assess agreement between tests, the area under the concordance curve (AUCC) for each
pair of tests for the top 100 genes ranked by adjusted P-value was calculated as previously
described [32, 82]. All methods showed excellent concordance (AUCC ≥ 0.77) with LRT
linear showing the poorest consistency compared to all other tests while the limma-trend and
ANOVA tests showed perfect agreement with an AUCC of 1 (Supplementary Figure A.3).
Pairwise differential gene expression comparisons between Seurat Bimod, MAST and WRS
had AUCCs > 0.95, while the multiple group tests ANOVA, LRT multiple, KW, and
scBT clustered together with AUCC ranging between 0.9 and 1. In the absence of nuisance
covariates, MAST and Seurat Bimod provided similar results, as expected given their similar
mixture normal model structure. Likewise for ANOVA and limma-trend, both of which rely
on normality assumptions for testing differential gene expression.
2.4     Real dose–response dataset DE analysis
Without ground truth for experimental data, the performance of the differential expression
test methods was examined by first evaluating the agreement for each identified cell type
(Figure 2.6). Genes in the experimental dataset were considered differentially expressed
when expressed in ≥5% of cells in at least one dose group and had a |fold-change| ≥ 1.5.
In hepatocytes, the most abundant cell type, fewer than 5 genes were not detected in all
test methods, with the majority missed by the WRS test (Figure 2.6A). Upon closer exam-
ination, those genes were not expressed in control hepatocytes. Not surprisingly, for all cell
types, the largest intersection was between all tests indicating strong agreement within all
test methods. Only a few tests identified a subset of unique genes as differentially expressed,
which accounted for a very small fraction. For example, LRT linear identified 12 unique dif-
ferentially expressed genes in portal fibroblasts, one of the least abundant cell types (Figure
2.6B). LRT linear was the best performing test for low cell numbers indicating that the 12
unique differentially expressed genes may in fact be true positives. Consistent with simulations
of varying cell numbers (Figure 2.5B), 24 genes were not identified as differentially
expressed by the scBT method for stellate cells which exhibit a dose-dependent decrease in
numbers (Figures 2.6C, D). Although scBT outperformed other tests in most scenarios, it
underperformed in this scenario. Nevertheless, when ranking genes by significance level (i.e.
Figure 2.6 Agreement of differential expression test methods on experimental dose-response
data. (A) Upset plot showing the intersection size of genes identified as differentially
expressed by 9 different test methods in hepatocytes from the portal region of the liver
lobule. (B) Intersect of differentially expressed genes in portal fibroblasts. (C) Intersect
size in hepatic stellate cells. Vertical bars represent the intersect size for test methods
denoted by a black dot. Horizontal bars show the total number of differentially expressed
genes identified within each test (set sizes). Only intersects for which genes were identified
are shown. Genes were considered differentially expressed when (i) expressed in > 5% of
cells within any given dose group and (ii) exhibit a |fold-change| ≥ 1.5. A heatmap in the
upper left corner of each panel shows the pairwise AUCC comparisons for the 500 lowest
p-values. (D) Relative proportion of cell types identified in each dose group of the real
dataset for the cell types in A,B,C. Experimental snRNAseq data was obtained from male
mice gavaged with sesame oil vehicle (vehicle control) or 0.01 – 30 µg/kg TCDD every 4
days for 28 days. (E) Graph metrics for gene set enrichment analysis of portal fibroblasts
grouped by similarity in gene membership. Violin plots show distribution of node-wise
values for each test method. (F) Network visualization of significantly enriched (adjusted
p-value ≤ 0.05) gene sets using the Bayes factor ranked genes of portal fibroblasts. Groups
of ≥ 2 nodes were manually annotated following commonality in the gene set names. Each
node represents a gene set with the size of the node representing the number of genes in a
gene set, and edges connect nodes with ≥ 50% overlap.
P-values), AUCCs were high for all pairwise comparisons.
    To explore the biological insight gained by using the test methods, gene set enrichment
analysis was performed by ranking genes following significance values (adjusted P-value or
Bayes factor) on gene sets from BIOCARTA, KEGG, PANTHER and WIKIPATHWAYS.
Gene sets were grouped based on their similarity in gene membership into a network for
which centrality measures can be estimated. An examination of portal fibroblasts, which
exhibited the most disagreement among test methods (Figure 2.6B), showed that multiple
group test methods, particularly scBT had improved centrality metrics (centrality – number
of edges; closeness – steps required to access other nodes; and betweenness – number of
paths that go through a node) (Figure 2.6E). Visualization of significantly enriched terms
identified enriched functions associated with growth factor and immune cell signaling in
addition to expected terms such as xenobiotic metabolism and nuclear receptors involved in
lipid metabolism (Figure 2.6F). Alternatively, WRS which did not find as many connected
groups of functions, was largely limited to those identified by scBT except for the hormone
signaling and tryptophan metabolism clusters. While there is no ground truth from real data,
greater agreement between similar gene sets from disparate sources suggests that multiple
group tests such as scBT provide more reliable findings [1]. However, all the test methods
produce comparable gene set enrichment results as expected since the most robust changes
were identified by all the test methods.
2.5     Discussion
The goal of this study was to compare the performance of newly developed DGEA test
methods for dose–response experiments to existing analysis methods. Using simulated data
to generate ground truth, we evaluated the performance of nine differential expression testing
methods which were broadly classified as either fit-for-purpose, multiple group, or two group
tests. Criteria for test method selection were based on previous benchmarking efforts for two
group study designs identifying MAST, limma-trend, WRS, and t-test as the best performers
[32, 87]. ANOVA and KW tests were also included for evaluating multiple group comparisons,
and Seurat Bimod, for having the same modelling framework as scBT, LRT multiple and
LRT linear tests. The test methods were ranked from best to worst (1–9) based on type I
error rate, type II error rate, MCC, AUROC and AUPRC (Figure 7, Supplementary Table
S4).
    While several scRNAseq simulation tools have been developed [75, 83, 84, 85], none were developed
to simulate dose–response models commonly identified in toxicological and pharmacological
datasets [28, 88]. Our SplattDR wrapper for the Splatter package (28) was able to show
that simulated data can effectively emulate key experimental scRNAseq data characteristics
when simulation parameters were estimated from various Unique Molecular Identifier (UMI)-
based datasets. In agreement with a previous report, technical and biological factors, such
as cell type, do appear to influence gene dropout rates (18). We primarily focused on
10× Genomics UMI data given the unavailability of real experimental dose–response data
generated using other platforms.
    Overall, test method performance was consistent with their intended application. For
example, fit-for-purpose tests scBT and LRT linear consistently ranked higher followed by
multiple groups tests such as KW and LRT multiple. scBT exhibited the best overall perfor-
mance with excellent FPR control and top ranked MCC while LRT linear struck a balance
between type I and type II error rates. The scBT results are not surprising as Bayes factor-
based tests have proven to be conservative and consequently more appropriate when false
positives are of concern [69, 70]. In the context of investigating chemical or drug MoAs, false
positives have the potential to lead to wasted effort and resources in attempts to validate
and support findings [89]. Moreover, when assessing a large number of genes, a 5% FP rate
(P-value ≤ 0.05) can result in hundreds of FPs that skew MoA classifications [67].
    A single test method was not expected to outperform all other tests under all conditions as
previously demonstrated when comparing pairwise testing [29, 32, 87]. Therefore, we assessed
the strengths and limitations of each test method by varying parameters likely to change
within and across various experimental datasets. The number and relative abundance of cell
types is known to be affected by disease or treatment, and the distribution of differential
expression influenced by the chemical, drug, or food contaminant being evaluated [63, 73].
scBT consistently ranked at the top under most scenarios, particularly when the mean and
standard deviation of the fold-change for differentially expressed genes varied. However, scBT
underperformed in MCC when the number of cells decreased in a dose-dependent manner,
which would be expected in treatments which alter cell population sizes (e.g. inflammation).
Under these circumstances LRT linear outperformed all other tests, with scBT performing
similarly to the other test methods. This was evident when 24 differentially expressed genes were
not identified by scBT within experimental data for stellate cells, which experienced a dose-
dependent decrease in relative abundance following TCDD treatment. Although excluding
genes expressed at low levels generally improved the performance of all test methods, the
comparative performance of test methods did not significantly change in most cases. We
did not have access to experimental scRNAseq dose–response data; however, we expect that
scBT would perform as well as it did with experimental snRNAseq data, as the elevated
number of zeroes is common to both types of data. Major differences between these types
of data are (i) biases in gene detection and (ii) overall counts [73]. Given the higher overall
counts in scRNAseq data, test methods such as scBT may even perform better.
    DGEA provides biological information regarding the effects of exposure to chemicals,
drugs, and food contaminants. As expected, gene set enrichment analyses did not dramat-
ically differ in the enriched pathways which are driven by the most robust responses such
as xenobiotic metabolism. However, when integrating gene sets from disparate sources we
found gene sets that partially overlap in gene membership were consistently identified by
multiple group test methods. For example, several gene sets related to growth factors and
cell proliferation were identified by scBT but not WRS. Portal fibroblasts are implicated
in proliferation of cholangiocytes and the secretion of growth factors during development.
Enrichment of these terms suggests a functional role consistent with the induction of bile
duct proliferation by TCDD (45,46). In contrast, WRS identified enrichment associated
with tryptophan as well as oxytocin/thyrotropin-releasing-hormone pathways which have not
been linked to the effects of TCDD on portal fibroblasts. Although ground truth for the
complete experimental dataset is not available, the use of test methods such as scBT reduces
experimental noise to identify leads warranting further analysis.
2.6    Conclusion
Collectively, our findings suggest that scBT and LRT linear fit-for-purpose tests are bet-
ter suited for the differential expression analysis of dose–response studies and when false
positives are of greater concern than false negatives. Moreover, consistent with previous
benchmarking efforts, we show that common non-parametric tests such as KW out-perform
test methods developed for scRNAseq data when the study involves comparisons between
multiple groups. Ultimately, each test method performs optimally under different scenarios.
While the importance of controlling type I error rates is acknowledged, a balance must be
struck with type II error rates. The tradeoff should be determined based on the individual
research question being investigated. It may even be reasonable to apply different test meth-
ods to distinct cell types based on dropout rates, cell abundance, and changes in relative cell
proportions given the strengths and weaknesses of each test method.
2.7    Acknowledgements
This chapter is based on my published paper [1]. I would like to thank the joint first
author, Dr Rance Nault, and co-authors Dr Samiran Sinha, Dr Tapabrata Maiti, Dr Sudin
Bhattacharya, Jack Dodson and Dr Tim Zacharewski for their support and advice. This
work was funded by the National Human Genome Research Institute [R21 HG010789], the National
Institute of Environmental Health Sciences Superfund Research Program [P42 ES004911]
and the NSF [DMS 1945824].


                                          CHAPTER 3
                   SEMIPARAMETRIC DOSE RESPONSE CURVE
               ESTIMATION FOR SINGLE CELL DOSE RESPONSE
                                       EXPERIMENTS
3.1     Single Cell Dose Response Experiments
Gene expression profiling of single cells has led to unprecedented progress in understanding
normal physiology, disease progression and developmental processes. In contrast to bulk
RNA-sequencing (RNAseq), gene expression profiling of single cells allows investigation of
cell-specific heterogeneity and can be used to assess changes in response to drugs and
chemicals for individual cell populations. Dose-dependent gene expression profiling (also
known as genomic dose-response studies, GDRS) of drugs and chemicals has been proposed as an
alternative to rodent bioassays for assessing human health risks. Applying single-cell RNA
sequencing (scRNAseq) to the evaluation of chemicals, drugs, and food contaminants presents
the opportunity to account for cellular heterogeneity in pharmacological and toxicological
responses. Although dose response modelling of bulk RNA-seq datasets has long been used
for generating quantitative estimates of the risks associated with such responses, no
statistical study yet exists for the estimation of dose response curves for scRNA-seq datasets.
    Unlike microarray and bulk RNA-seq datasets, which record gene expression measurements
averaged over many cells, scRNAseq measures gene expression in individual cells and hence
enables biological investigation at the cellular level. However, despite many improvements
in high throughput sequencing, various technical factors including cell-cycle heterogeneity,
library size differences, amplification bias, and low RNA capture per cell lead to high
noise in scRNA-seq experiments. Recent technologies are capable of sequencing millions of
cells, but often generate highly sparse expression datasets due to shallow sequencing. These
issues result in substantial noise that often obscures the true biological signal and renders
traditional statistical models unsuitable for the analysis of single cell datasets. Several
statistical models have been proposed for the analysis of single cell datasets, with most
focusing on zero-inflated count distributions, such as the zero-inflated Poisson or negative
binomial, to account for the overdispersion. Continuous hurdle models like MAST have also
been proposed; these treat the process generating zeros as completely different from the
sampling process of true biological expression, which is generally positive. Several
statistical studies have been designed to investigate problems related to differential gene
expression, cell-type clustering, denoising or imputation, gene regulatory network
construction, trajectory inference and other statistical problems studied earlier in the
context of bulk and microarray datasets. However, no rigorous statistical framework has yet
been designed for dose response modelling of scRNA-seq datasets, even though such modelling
is invaluable for deriving cell-type specific efficacy and/or safety margins, such as the
effective dose and the point of departure (POD), of several toxicological and
pharmacological responses.
    Regulatory communities have suggested acute (< 14 days) and sub-acute (14–28 days)
transcriptomic studies as a viable alternative to the current standard 2-year rodent
bioassay, one that significantly reduces the time and resources needed to assess risk.
Furthermore, single cell transcriptomic datasets could bolster such investigations by
identifying cell-specific dose-dependent responses indicative of an adverse event. The U.S.
National Toxicology Program (NTP) recently reported that a robust DGEA approach is essential
to deriving biologically relevant PODs. In characterizing biologic and public health
significance, and the need for possible regulatory interventions, it is important to
efficiently estimate dose response functions while accounting for cell-specific
heterogeneity and known experimental confounders. Motivated by the lack of a single framework
for single cell dose–response estimation and trend testing that also accounts for covariates,
this chapter proposes a complete statistical framework for these tasks.


3.1.1    Motivating experimental study and hypothesis of interest
In this work we analyze a unique in vivo dose response hepatic scRNAseq dataset consisting of
9 dose groups with 3 biological replicates, 11 distinct liver cell types and more than 100K
cells. The 9 dose groups represent 9 dose levels of 2,3,7,8-tetrachlorodibenzo-p-dioxin
(TCDD). TCDD, a persistent organic pollutant and potent agonist of the aryl hydrocarbon
receptor (AHR), induces hepatic lipid accumulation (steatosis) that progresses to
steatohepatitis with fibrosis. In humans, TCDD and related compounds are associated
with dyslipidemia and inflammation, and in mice, AHR activation by TCDD elicits cell-
specific and spatially resolved histological and gene expression responses. Our interest lies
in estimating cell-type specific dose response (DR) curves for genes of interest in order to
derive finer points of departure (PODs) for informed decision-making on the safety and
toxicity of drugs and chemicals. We hypothesize that some genes of interest exhibit
different PODs for different cell types; our interest therefore lies not only in estimating
cell-specific DR curves but also in testing whether the cell-type specific PODs differ.
This would be particularly hard if a gene exhibits slight changes in POD over a handful
of cell populations while exhibiting the same POD over the others. In those cases we would
like to investigate how the POD estimates change in comparison to existing ones established
via bulk and microarray based DR studies. Further, cell types may exhibit completely
different DR curve shapes for some genes, whereas for other genes the cell-type specific DR
curves may differ only by slight shifts and slope changes. It is essential to allow the DR
curve of a particular gene to change with cell type in order to capture these varying curve
shapes. It is also essential to test whether the estimated differences between cell types are
statistically significant, in order to ensure accurate POD estimation. Therefore in this work
we propose 1) a semi-parametric regression model for flexible modelling of dose response
curves that accounts for the high percentage of zeroes and the effects of additional
covariates; 2) a scalable and computationally efficient MM algorithm for the estimation of
regression parameters; 3) an approach to incorporate monotonicity assumptions, if supported
by prior beliefs, within the estimation paradigm; 4) tests of hypothesis for the presence of
a dose response trend and for whether the trend varies across cell types; and 5) estimation
of cell-type aware PODs.
3.1.2       Literature on dose response curve estimation
The primary interest in dose response studies lies in assessing the association between a
continuous predictor, D, and a response variable, Y, while adjusting for covariates, X =
(X1, . . . , Xq)⊤. A number of papers in the existing literature address dose response
curve estimation, but none solves the problem underlying our motivating example.
Most of the existing literature focuses on analyzing data from bulk RNA-seq experiments,
microarrays or traditional environmental toxicity studies that focus on continuous or
dichotomous endpoints. Let us denote the data, a sample of n independent and identically
distributed (IID) pairs of variables, by B = {(Zi, Yi)}, i = 1, . . . , n. Zi can be further
partitioned as Zi = (Di, Xi), where Di represents the dose and Xi represents the vector of
other confounding covariates such as age and gender. The main interest of dose response
studies lies in modelling the expectation of Yi as a function of the dose, i.e.,
E(Yi|Di) = ζ(Di, θ), where θ is the vector of unknown parameters.
    In general, the methods for estimating dose response can be classified as parametric or
nonparametric. Parametric methods assume a model ζ(D, θ) for the DR curve, where θ is
the vector of unknown parameters. A considerable amount of statistical methodology exists
for parametric modelling of dose-response studies, employing models such as the Logistic,
Exponential, and Gompertz, among others (Holland-Letz and Kopp-Schneider 2015), and the
US EPA (US EPA 2012) gives guidelines on how to employ these methods.
These models are generally fit using standard techniques such as maximum likelihood (ML)
or restricted maximum likelihood (RML), and their functional forms are monotonic, allowing
for tractable estimation of relevant PODs. It is well known that if a parametric model
is correctly specified then the ML/RML estimators are efficient. However, in many cases
it is difficult to correctly specify the parametric form of the dose–response curve, because
the biological mechanism of drug action or toxicity may be complex and the form of the
dose–response curve is unknown a priori. When the parametric model is misspecified, the
corresponding curve estimate may be severely biased. In addition, fitting extremely flexible
high-order polynomial models to match difficult non-monotonic curve shapes can lead
to severe overfitting. The standard workaround is to fit multiple parametric models, each of
which may fit the data reasonably well, and to report the resulting range of POD estimates
as an account of model uncertainty in the estimation process. Accounting for this uncertainty
using model averaging (MA) has received recent attention in the literature. However, despite
simulation studies suggesting that model-averaged POD estimates have low bias
and nominal coverage properties, these estimates may fail to adequately describe the true
uncertainty for DR curves on the edge of the MA model space, resulting in inaccurate
inference. Further, because MA results are available for only a limited number of parametric
forms, models that make fewer assumptions about the parametric form may allow for better
estimation of dose response curves.
    To enhance the robustness of dose–response curve estimation, many nonparametric
methods have been proposed. [41] proposed maximum likelihood estimation procedures under the
assumption that the dose response curve is sigmoid and non-decreasing. [42] proposed a
method for non-parametric estimation of the dose response curve under monotonicity
constraints using B-spline regression. [43] used kernel estimators to obtain estimates of the
dose–response curve under general shape restrictions. In the Bayesian setup, several
approaches have been proposed for non-parametric estimation of dose response curves under
monotonicity constraints. From a Bayesian perspective, one specifies a suitable prior on the
regression function that induces monotonicity, and inference is then based on the posterior
distribution. [44] employed an additive model with a prior imposed on the slope of the
piecewise-linear functions. [45] adopted mixture modelling of shifted and scaled probability
distribution functions. [46] proposed a Gaussian process model with a posterior projection
approach for shape-constrained curves. In contrast to the parametric methods, all the
nonparametric approaches are flexible and learn the shape of the dose response curve from
the data. To our knowledge, none of the existing literature has considered building a
semi-parametric regression model for both unconstrained and monotonicity-constrained
estimation of dose response curves for single cell experiments while accounting for technical
confounders; developing such a model is the primary goal of this chapter.
3.1.3    Literature on MM algorithms
A large number of statistical and machine learning problems require the computation of
\[
\min_{\theta \in \Theta} R(\theta, B) \quad \text{or} \quad \hat{\theta} = \operatorname*{arg\,min}_{\theta \in \Theta} R(\theta, B), \tag{3.1}
\]
where R(θ, B) is a risk function defined over the observed data B that depends on
some parameter θ ∈ Θ. A common risk function in practice is the negative
log-likelihood, which can be expressed as
\[
R(\theta, B) = -\frac{1}{n} \sum_{i=1}^{n} \log f(z_i, y_i; \theta),
\]
where f(zi, yi; θ) is a density function over the support of Z and Y. Simple maximum
likelihood estimation problems (or, equivalently, minimization of the negative log-likelihood)
can be solved analytically, but most practical maximum likelihood and least squares
estimation problems must be solved numerically. The task of computing (3.1) may be complicated
by various factors, including the lack of differentiability of R or the difficulty of obtaining
closed-form solutions to the first-order condition ∇θ R = 0, where ∇θ is the gradient
operator with respect to θ and 0 is a zero vector. The acronym MM does double duty: it
stands for majorize-minimize in minimization problems and minorize-maximize in maximization
problems. The MM principle provides a unifying approach for simplifying a difficult instance
of (3.1) via iterative minimization of surrogate functions [90]. Simplification is attained by
(a) avoiding computationally expensive inversion of large matrices, (b) linearizing the
optimization problem, (c) uncoupling parameters, (d) cleverly dealing with equality and
inequality constraints or (e) turning a non-differentiable surface into a smooth one.
    Let Θ(r) represent a fixed value of the parameter Θ, and let h(Θ|Θ(r)) denote a real-valued
function of Θ whose form depends on Θ(r). The function h(Θ|Θ(r)) is said to majorize a real-
valued function f(Θ) at the point Θ(r) provided
\[
h(\Theta|\Theta^{(r)}) \ge f(\Theta) \ \text{for all } \Theta, \qquad
h(\Theta^{(r)}|\Theta^{(r)}) = f(\Theta^{(r)}). \tag{3.2}
\]
Therefore the surface h(Θ|Θ(r)) lies above the surface f(Θ) and is tangent to it at the point
Θ = Θ(r). The function h(Θ|Θ(r)) is said to minorize f(Θ) at Θ(r) if −h(Θ|Θ(r))
majorizes −f(Θ) at Θ(r). Here Θ(r) represents the current iterate in optimizing the surface
f(Θ). A majorize-minimize MM algorithm minimizes the majorizing function
h(Θ|Θ(r)) rather than the actual function f(Θ). If Θ(r+1) denotes the minimizer of
h(Θ|Θ(r)), then the MM procedure forces f(Θ) towards a minimum. The inequality
\[
\begin{aligned}
f(\Theta^{(r+1)}) &= h(\Theta^{(r+1)}|\Theta^{(r)}) + f(\Theta^{(r+1)}) - h(\Theta^{(r+1)}|\Theta^{(r)}) \\
 &\le h(\Theta^{(r)}|\Theta^{(r)}) + f(\Theta^{(r)}) - h(\Theta^{(r)}|\Theta^{(r)}) \\
 &= f(\Theta^{(r)})
\end{aligned} \tag{3.3}
\]
follows directly from the fact that h(Θ(r+1)|Θ(r)) ≤ h(Θ(r)|Θ(r)) and from definition (3.2).
The descent property (3.3) lends the MM algorithm remarkable numerical stability, and the
algorithm also applies to maximization rather than minimization with straightforward changes.
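
    As an illustration of the descent property (3.3), the following minimal Python sketch
applies a majorize-minimize iteration to a toy problem that is not part of this chapter's
model: minimizing a smoothed sum of absolute deviations, f(θ) = Σ sqrt{(x_i − θ)² + ε}.
Because the square root is concave, a quadratic surrogate majorizes f at the current iterate,
and the minimizer of that surrogate is a weighted mean. The function names, the data and the
smoothing constant eps are hypothetical choices made only for this sketch.

```python
import math


def objective(theta, xs, eps=1e-6):
    """f(theta): smoothed sum of absolute deviations."""
    return sum(math.sqrt((x - theta) ** 2 + eps) for x in xs)


def mm_step(theta_r, xs, eps=1e-6):
    """Minimize the quadratic majorizer h(theta | theta_r) in closed form.

    sqrt(u) <= sqrt(u_r) + (u - u_r) / (2 sqrt(u_r)) for u, u_r > 0, so with
    u = (x - theta)^2 + eps the surrogate is a weighted least squares problem
    whose minimizer is a weighted mean with weights 1 / sqrt(u_r).
    """
    weights = [1.0 / math.sqrt((x - theta_r) ** 2 + eps) for x in xs]
    return sum(w * x for w, x in zip(weights, xs)) / sum(weights)


def mm_minimize(xs, theta0, eps=1e-6, tol=1e-10, max_iter=500):
    """Iterate MM steps; also return the objective value at every iterate."""
    theta = theta0
    trace = [objective(theta, xs, eps)]
    for _ in range(max_iter):
        theta_new = mm_step(theta, xs, eps)
        trace.append(objective(theta_new, xs, eps))
        converged = abs(theta_new - theta) < tol
        theta = theta_new
        if converged:
            break
    return theta, trace


xs = [1.0, 2.0, 4.0, 7.0, 50.0]          # hypothetical data
theta_hat, trace = mm_minimize(xs, theta0=20.0)
```

The list trace records f at every iterate, so the descent property can be checked directly
by confirming that the sequence never increases. For this toy objective the iterates settle
close to the sample median of xs.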
    MM algorithms, which generalize the EM (expectation–maximization) algorithm [91],
have been shown to effectively solve a variety of optimization problems in
machine learning, statistical estimation and signal processing. A comprehensive treatment
of the theory and implementation of MM algorithms can be found in [92]. Summaries and
tutorials on MM algorithms for various problems can be found in [93, 90, 94, 95, 92]. Some
theoretical analyses of MM algorithms can be found in [96, 97, 98].


3.2         Methods
3.2.1        Model, notations and assumptions
It is understood that scRNA-seq data are collected from many cells of different types when
cells are exposed to different dose levels. Since the data for every gene will be analyzed
separately, let us denote the data from a given gene by {Yi,j, Di,j, Xi,j | i = 1, . . . , I, j =
1, . . . , ni}. Here i indexes the cell type and j indexes a cell within that type: Yi,j
denotes the response (scRNA-seq expression) of the jth cell of the ith cell type, and Di,j
and Xi,j denote the dose level and a set of covariates for the corresponding cell. Since
scRNA-seq expression data contain an excessive number of zeros, we adopt a zero-inflated
negative binomial distribution:
\[
Y_{i,j} \sim \omega_{i,j}\, NB(m_{i,j} = s_{i,j}\mu_{i,j}, \phi) + (1 - \omega_{i,j})\, I(Y_{i,j} = 0). \tag{3.4}
\]
Model (3.4) implies that the response is assumed to come from a two-component mixture.
One component has its mass fully concentrated on zero, and the other component is the
negative binomial distribution with mean mi,j and scale exp(ϕ), where ϕ is on the real
line. We define the mean mi,j = si,j µi,j, where si,j represents the cell-specific biases
and µi,j the expected transcript count. The scale factors si,j are computed
using scran [99], an approach that uses pooling across single cells to normalize scRNA-seq
data with a high number of zeroes. Further, µi,j is assumed to be a function of the dose Di,j
and the covariates Xi,j. Specifically, we assume that
\[
\mu_{i,j} = \exp\{\zeta_i(D_{i,j}) + X_{i,j}^{\top}\beta\}, \tag{3.5}
\]
where ζi is a nonparametric function of the dose specific to the ith cell type. Our goal is
to estimate this function ζi. The covariate Xi,j is assumed to exert a linear effect through
the regression parameter β. Our method implicitly assumes that ζi can be well approximated
between boundary points a and b by a cubic B-spline basis with some number of knots.
We will denote knot configurations by pairs (K, τ), where the number of knots K is a
non-negative integer and the knot locations are given by the K-dimensional vector τ =
(τ1, . . . , τK), with a < d(1) < τ1 ≤ · · · ≤ τK < d(n) < b. Let Bm(·), m = 1, . . . , M, denote
the mth function in a cubic B-spline basis with natural boundary constraints, i.e., linear
outside [a, b].
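     Therefore the dose response curve is approximated as
\[
\zeta_i(D) = \sum_{m=1}^{M} \gamma_{i,m} B_m(D),
\]
with M = d + K + 1, where d is the degree of the spline basis (d = 3 for a cubic basis).
With the above model for the dose-response curve, we can write
\[
\mu_{i,j} = \exp\Big\{ \sum_{m=1}^{M} \gamma_{i,m} B_m(D_{i,j}) + X_{i,j}^{\top}\beta \Big\}. \tag{3.6}
\]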
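
    To make the spline approximation concrete, the following Python sketch evaluates a cubic
B-spline basis by the Cox-de Boor recursion and forms ζ(D) = Σ γm Bm(D). It is only a
sketch under stated assumptions: it uses a clamped knot vector on [a, b] and does not
impose the natural boundary constraints described above, and the function names, knot
locations and coefficient values are hypothetical. With K interior knots and degree d = 3 it
produces M = d + K + 1 basis functions.

```python
def make_knots(a, b, interior, degree=3):
    """Clamped knot vector: each boundary knot is repeated degree + 1 times."""
    return [a] * (degree + 1) + sorted(interior) + [b] * (degree + 1)


def bspline_basis(x, knots, degree=3):
    """Evaluate all B-spline basis functions at x (Cox-de Boor recursion)."""
    n_spans = len(knots) - 1
    # Degree 0: indicator of the half-open knot span that contains x.
    basis = [1.0 if knots[k] <= x < knots[k + 1] else 0.0 for k in range(n_spans)]
    if x == knots[-1]:
        # Close the last non-empty span on the right so that x = b is covered.
        last = max(k for k in range(n_spans) if knots[k] < knots[k + 1])
        basis[last] = 1.0
    for p in range(1, degree + 1):
        new = []
        for k in range(n_spans - p):
            left_den = knots[k + p] - knots[k]
            right_den = knots[k + p + 1] - knots[k + 1]
            # Terms with a zero denominator (repeated knots) are taken as zero.
            left = (x - knots[k]) / left_den * basis[k] if left_den > 0 else 0.0
            right = (knots[k + p + 1] - x) / right_den * basis[k + 1] if right_den > 0 else 0.0
            new.append(left + right)
        basis = new
    return basis  # length K + degree + 1


def zeta(dose, gamma, knots, degree=3):
    """Spline approximation zeta(D) = sum_m gamma_m * B_m(D)."""
    return sum(g * b for g, b in zip(gamma, bspline_basis(dose, knots, degree)))


knots = make_knots(0.0, 10.0, interior=[3.0, 6.0])   # K = 2 interior knots
gamma = [0.0, 0.2, 0.9, 1.5, 1.8, 2.0]               # M = 3 + 2 + 1 = 6 coefficients
curve = [zeta(d, gamma, knots) for d in (0.0, 1.0, 3.0, 5.0, 10.0)]
```

The basis functions are non-negative and sum to one on [a, b], so a constant coefficient
vector reproduces a constant curve. In the model above one such coefficient vector is used
for each cell type i.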
Furthermore, lab experiments indicate that the chance of zero inflation tends to decrease
with increasing dose level. This is because many genes are induced, which increases
the mean while simultaneously reducing the dropouts. Therefore we propose to model the
inflation parameter as a function of the mean of the negative binomial component in (3.4);
that is, we model ωi,j in terms of si,j µi,j:
\[
\operatorname{logit}(\omega_{i,j}) = \psi_0 + \psi_1 \log(s_{i,j}\mu_{i,j}), \tag{3.7}
\]
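where ψ0 and ψ1 are two unknown parameters to be estimated. Define Θ = (ϕ, β, γ1,1, . . . ,
γ1,M, . . . , γI,1, . . . , γI,M, ψ0, ψ1)⊤. Then the log-likelihood function is
\[
\begin{aligned}
\ell(\Theta) = \sum_{i=1}^{I}\sum_{j=1}^{n_i} \Bigg[ & I(Y_{i,j} > 0) \Big\{ Y_{i,j}\log(s_{i,j}\mu_{i,j}) - \{Y_{i,j} + \exp(\phi)\}\log\{s_{i,j}\mu_{i,j} + \exp(\phi)\} \\
 & \quad + \phi\exp(\phi) + \log\Gamma\{Y_{i,j} + \exp(\phi)\} - \log(Y_{i,j}!) - \log\Gamma\{\exp(\phi)\} + \log(\omega_{i,j}) \Big\} \\
 & + I(Y_{i,j} = 0)\log\Big\{ (1 - \omega_{i,j}) + \omega_{i,j}\Big( \frac{\exp(\phi)}{s_{i,j}\mu_{i,j} + \exp(\phi)} \Big)^{e^{\phi}} \Big\} \Bigg],
\end{aligned}
\]
where Γ{·} denotes the gamma function.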
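
    The contribution of a single cell to this log-likelihood can be written down directly. The
Python sketch below is a hedged illustration rather than the implementation used in this
work: it assumes that exp(ϕ) enters as the size parameter of the negative binomial, as read
from the expression above, and that the mixing weight follows (3.7). The function name and
the numerical values are hypothetical.

```python
import math


def zinb_loglik_cell(y, s, mu, phi, psi0, psi1):
    """Log-likelihood contribution of one cell under (3.4) with weight (3.7).

    y   : observed count for the cell
    s   : size (scale) factor of the cell
    mu  : expected transcript count, e.g. from (3.6)
    phi : log of the negative binomial size parameter
    """
    size = math.exp(phi)
    mean = s * mu
    eta = psi0 + psi1 * math.log(mean)          # logit of omega, as in (3.7)
    omega = 1.0 / (1.0 + math.exp(-eta))        # weight of the NB component
    if y > 0:
        return (
            math.log(omega)
            + y * math.log(mean)
            - (y + size) * math.log(mean + size)
            + phi * size
            + math.lgamma(y + size)
            - math.lgamma(y + 1)                # log(y!)
            - math.lgamma(size)
        )
    nb_zero = (size / (mean + size)) ** size    # NB probability of a zero
    return math.log((1.0 - omega) + omega * nb_zero)


# Hypothetical parameter values; the implied probabilities should sum to one.
total_mass = sum(
    math.exp(zinb_loglik_cell(y, s=1.0, mu=2.0, phi=0.0, psi0=0.3, psi1=-0.5))
    for y in range(400)
)
```

Summing the exponentiated contributions over a long range of counts, as in total_mass, is a
simple sanity check that the mixture defines a proper probability distribution. The full
log-likelihood is the sum of these contributions over all cell types and cells.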
The goal is to estimate Θ by maximizing ℓ(Θ). This maximization is difficult because the
dimension of Θ could be large. Therefore, we develop an MM algorithm to
ease the complexity of this optimization. The first step of the MM algorithm is to develop
a minorization function that is easier to optimize than ℓ(Θ); it is presented in the
following theorem. Define
\[
\Theta^{(0)} = (\phi^{(0)}, \beta^{(0)}, \gamma_{1,1}^{(0)}, \ldots, \gamma_{1,M}^{(0)}, \ldots, \gamma_{I,1}^{(0)}, \ldots, \gamma_{I,M}^{(0)}, \psi_0^{(0)}, \psi_1^{(0)})^{\top}.
\]


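Theorem 1. The minorizing function of ℓ(Θ) is ℓ†(Θ|Θ(0)), such that
\[
\ell^{\dagger}(\Theta|\Theta^{(0)}) = \ell_1^{\dagger}(\phi|\Theta^{(0)}) + g_1(\psi_0|\Theta^{(0)}) + g_2(\psi_1|\Theta^{(0)})
 + \sum_{\iota=1}^{I} \ell_{3,\iota}^{\dagger}(\gamma_{\iota}|\Theta^{(0)}) + \ell_4^{\dagger}(\beta|\Theta^{(0)}) + \ell_5^{\dagger}(\Theta^{(0)}), \tag{3.8}
\]
ℓ(Θ) ≥ ℓ†(Θ|Θ(0)) with equality holding when Θ = Θ(0), and the terms of (3.8) are
\[
\begin{aligned}
\ell_1^{\dagger}(\phi|\Theta^{(0)}) = \sum_{i=1}^{I}\sum_{j=1}^{n_i} \Bigg[
 & \Big\{ I(Y_{i,j} > 0)\{Y_{i,j} + \exp(\phi)\} + \exp(\phi)\Gamma_{i,j}^{(0)} \Big\}
   \Bigg\{ \frac{\exp(\phi^{(0)}) - \exp(\phi)}{s_{i,j}\mu_{i,j}^{(0)} + \exp(\phi^{(0)})} \Bigg\} \\
 & + \Big\{ I(Y_{i,j} > 0) + \Gamma_{i,j}^{(0)} \Big\}
   \times \Bigg( \phi\exp(\phi) - \frac{0.5\, s_{i,j}}{s_{i,j}\mu_{i,j}^{(0)} + \exp(\phi^{(0)})}\{\exp(\phi) - \exp(\phi^{(0)})\}^2 \\
 & \qquad\qquad - \exp(\phi)\log\{s_{i,j}\mu_{i,j}^{(0)} + \exp(\phi^{(0)})\} \Bigg) \\
 & + I(Y_{i,j} > 0)\Big( \log\Gamma\{Y_{i,j} + \exp(\phi)\} - \log\Gamma\{\exp(\phi)\} \Big) \Bigg],
\end{aligned}
\]
\[
g_1(\psi_0|\Theta^{(0)}) = \sum_{i=1}^{I}\sum_{j=1}^{n_i} \Bigg[ \Big\{ I(Y_{i,j} > 0) + \Gamma_{i,j}^{(0)} \Big\}\psi_0
 - \frac{\omega_{i,j}^{(0)}}{2M+5}\exp\{(\psi_0 - \psi_0^{(0)})(2M+5)\} \Bigg],
\]
\[
\begin{aligned}
g_2(\psi_1|\Theta^{(0)}) = \sum_{i=1}^{I}\sum_{j=1}^{n_i} \Bigg[
 & \Big\{ I(Y_{i,j} > 0) + \Gamma_{i,j}^{(0)} \Big\} \Bigg( \psi_1\log(s_{i,j}) - 0.5(M+1)(\psi_1 - \psi_1^{(0)})^2 \\
 & \qquad + \psi_1\Big\{ \sum_{m=1}^{M} \gamma_{i,m}^{(0)} B_m(D_{i,j}) + X_{i,j}^{\top}\beta^{(0)} \Big\} \Bigg) \\
 & - \frac{\omega_{i,j}^{(0)}}{2M+5} \Bigg( \exp\{\log(s_{i,j})(\psi_1 - \psi_1^{(0)})(2M+5)\}
   + 0.5(M+1)\exp\{(\psi_1 - \psi_1^{(0)})^2\} \\
 & \qquad + \exp\Big[ (\psi_1 - \psi_1^{(0)})\Big\{ \sum_{m=1}^{M} \gamma_{i,m}^{(0)} B_m(D_{i,j}) + X_{i,j}^{\top}\beta^{(0)} \Big\}(2M+5) \Big] \Bigg) \Bigg],
\end{aligned}
\]
\[
\begin{aligned}
\ell_{3,i}^{\dagger}(\gamma_i|\Theta^{(0)}) ={}
 & \sum_{j=1}^{n_i} \Bigg[ \Big\{ I(Y_{i,j} > 0) + \Gamma_{i,j}^{(0)} \Big\}
   \Bigg( -0.5\sum_{m=1}^{M} (\gamma_{i,m} - \gamma_{i,m}^{(0)})^2 B_m^2(D_{i,j})
   + \psi_1^{(0)}\sum_{m=1}^{M} (\gamma_{i,m} - \gamma_{i,m}^{(0)}) B_m(D_{i,j}) \Bigg) \\
 & \quad - \frac{\omega_{i,j}^{(0)}}{2M+5} \Bigg( 0.5\sum_{m=1}^{M} \exp\big[ \{(\gamma_{i,m} - \gamma_{i,m}^{(0)}) B_m(D_{i,j})(2M+5)\}^2 \big] \\
 & \qquad\qquad + \sum_{m=1}^{M} \exp\{(\gamma_{i,m} - \gamma_{i,m}^{(0)})\psi_1^{(0)} B_m(D_{i,j})(2M+5)\} \Bigg) \Bigg] \\
 & + \sum_{j=1}^{n_i} \Bigg[ I(Y_{i,j} > 0) Y_{i,j} \sum_{m=1}^{M} \gamma_{i,m} B_m(D_{i,j}) \\
 & \quad - \frac{s_{i,j}}{s_{i,j}\mu_{i,j}^{(0)} + \exp(\phi^{(0)})}
   \Big\{ I(Y_{i,j} > 0) Y_{i,j} + I(Y_{i,j} > 0)\exp(\phi^{(0)}) + \Gamma_{i,j}^{(0)}\exp(\phi^{(0)}) \Big\} \\
 & \qquad \times \frac{\mu_{i,j}^{(0)}}{M+1} \sum_{m=1}^{M} \exp\{(\gamma_{i,m} - \gamma_{i,m}^{(0)}) B_m(D_{i,j})(M+1)\} \\
 & \quad - \frac{0.5\, s_{i,j}}{s_{i,j}\mu_{i,j}^{(0)} + \exp(\phi^{(0)})} \Big\{ I(Y_{i,j} > 0) + \Gamma_{i,j}^{(0)} \Big\} \\
 & \qquad \times \Bigg( -2(\mu_{i,j}^{(0)})^2 \sum_{m=1}^{M} (\gamma_{i,m} - \gamma_{i,m}^{(0)}) B_m(D_{i,j})
   + \frac{(\mu_{i,j}^{(0)})^2}{M+1} \sum_{m=1}^{M} \exp\{2(M+1)(\gamma_{i,m} - \gamma_{i,m}^{(0)}) B_m(D_{i,j})\} \Bigg) \Bigg],
\end{aligned}
\]
\[
\begin{aligned}
\ell_4^{\dagger}(\beta|\Theta^{(0)}) ={}
 & \sum_{i=1}^{I}\sum_{j=1}^{n_i} \Bigg[ \Big\{ I(Y_{i,j} > 0) + \Gamma_{i,j}^{(0)} \Big\}
   \Big( -0.5\{X_{i,j}^{\top}(\beta - \beta^{(0)})\}^2 + \psi_1^{(0)} X_{i,j}^{\top}(\beta - \beta^{(0)}) \Big) \\
 & \quad - \frac{\omega_{i,j}^{(0)}}{2M+5} \Big( 0.5\exp\big[ \{X_{i,j}^{\top}(\beta - \beta^{(0)})(2M+5)\}^2 \big]
   + \exp\{X_{i,j}^{\top}\psi_1^{(0)}(\beta - \beta^{(0)})(2M+5)\} \Big) \Bigg] \\
 & + \sum_{i=1}^{I}\sum_{j=1}^{n_i} \Bigg[ I(Y_{i,j} > 0) Y_{i,j} X_{i,j}^{\top}\beta \\
 & \quad - \frac{s_{i,j}}{s_{i,j}\mu_{i,j}^{(0)} + \exp(\phi^{(0)})}
   \Big\{ I(Y_{i,j} > 0) Y_{i,j} + I(Y_{i,j} > 0)\exp(\phi^{(0)}) + \Gamma_{i,j}^{(0)}\exp(\phi^{(0)}) \Big\} \\
 & \qquad \times \frac{\mu_{i,j}^{(0)}}{M+1} \exp\{X_{i,j}^{\top}(\beta - \beta^{(0)})(M+1)\} \\
 & \quad - \frac{0.5\, s_{i,j}}{s_{i,j}\mu_{i,j}^{(0)} + \exp(\phi^{(0)})} \Big\{ I(Y_{i,j} > 0) + \Gamma_{i,j}^{(0)} \Big\} \\
 & \qquad \times \Bigg( -2(\mu_{i,j}^{(0)})^2 X_{i,j}^{\top}(\beta - \beta^{(0)})
   + \frac{(\mu_{i,j}^{(0)})^2}{M+1} \exp\{2(M+1) X_{i,j}^{\top}(\beta - \beta^{(0)})\} \Bigg) \Bigg],
\end{aligned}
\]
where
\[
\Gamma_{i,j}^{(0)} = \frac{I(Y_{i,j} = 0)\,\omega_{i,j}^{(0)}\, G(\phi^{(0)}, \mu_{i,j}^{(0)})}{(1 - \omega_{i,j}^{(0)}) + \omega_{i,j}^{(0)}\, G(\phi^{(0)}, \mu_{i,j}^{(0)})}, \qquad
\omega_{i,j}^{(0)} = \frac{\exp\{\psi_0^{(0)} + \psi_1^{(0)}\log(s_{i,j}\mu_{i,j}^{(0)})\}}{1 + \exp\{\psi_0^{(0)} + \psi_1^{(0)}\log(s_{i,j}\mu_{i,j}^{(0)})\}},
\]
and
\[
G(\phi, \mu_{i,j}) = \Big[ \frac{\exp(\phi)}{s_{i,j}\mu_{i,j} + \exp(\phi)} \Big]^{e^{\phi}}.
\]
Here Γ{·} with an argument denotes the gamma function, whereas Γ(0)i,j denotes the weight
defined above.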
     A notable feature of the minorizing function is that it separates all the parameters. Consequently, the subsequent calculations are much simpler than a joint update of non-separated parameters in an iterative procedure. Next, the parameters are estimated by applying the Newton-Raphson method to the minorizing function. This technique of updating the parameters is known as the gradient MM algorithm [93]. Let Θ(t) and Θ(t+1) be the parameter values at the tth and (t + 1)th iterations, respectively. In the gradient MM approach,
\[
\Theta^{(t+1)} = \Theta^{(t)} - \left\{\frac{\partial^2 \ell^{\dagger}(\Theta|\Theta^{(t)})}{\partial\Theta\,\partial\Theta^{\top}}\bigg|_{\Theta=\Theta^{(t)}}\right\}^{-1}
\frac{\partial \ell^{\dagger}(\Theta|\Theta^{(t)})}{\partial\Theta}\bigg|_{\Theta=\Theta^{(t)}}.
\]
Starting with an initial value of Θ, this update is repeated until the parameters converge within a specified tolerance. Note that the above update uses the minorizing function instead of the log-likelihood. Because of the separation of the parameters, the matrix ∂²ℓ†(Θ|Θ(t))/∂Θ∂Θ⊤ turns out to be a diagonal matrix. Specifically,
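\[
\begin{aligned}
\frac{\partial \ell^{\dagger}(\Theta|\Theta^{(0)})}{\partial\phi}\bigg|_{\Theta=\Theta^{(0)}}
&= \sum_{i=1}^{I}\sum_{j=1}^{n_i}\Bigg(-\frac{\exp(\phi^{(0)})}{s_i\mu_{i,j}^{(0)}+\exp(\phi^{(0)})}
\Big[Y_{i,j}I(Y_{i,j}>0)+\exp(\phi^{(0)})\big\{I(Y_{i,j}>0)+\Gamma_{i,j}^{(0)}\big\}\Big] \\
&\qquad + I(Y_{i,j}>0)\Big[\frac{\partial}{\partial\phi}\log\Gamma\{Y_{i,j}+\exp(\phi)\}\Big|_{\phi=\phi^{(0)}}
- \frac{\partial}{\partial\phi}\log\Gamma\{\exp(\phi)\}\Big|_{\phi=\phi^{(0)}}\Big] \\
&\qquad + \big\{I(Y_{i,j}>0)+\Gamma_{i,j}^{(0)}\big\}\Big[\exp(\phi^{(0)})+\exp(\phi^{(0)})\phi^{(0)}
- \exp(\phi^{(0)})\log\big\{s_j\mu_{i,j}^{(0)}+\exp(\phi^{(0)})\big\}\Big]\Bigg), \\
\frac{\partial \ell^{\dagger}(\Theta|\Theta^{(0)})}{\partial\psi_0}\bigg|_{\Theta=\Theta^{(0)}}
&= \sum_{i=1}^{I}\sum_{j=1}^{n_i}\big\{I(Y_{i,j}>0)+\Gamma_{i,j}^{(0)}-\omega_{i,j}^{(0)}\big\}, \\
\frac{\partial \ell^{\dagger}(\Theta|\Theta^{(0)})}{\partial\psi_1}\bigg|_{\Theta=\Theta^{(0)}}
&= \sum_{i=1}^{I}\sum_{j=1}^{n_i}\Big\{\log(s_i)+\sum_{m=1}^{d+K+1}\gamma_{i,m}^{(0)}B_m(D_{i,j})+X_{i,j}^{\top}\beta^{(0)}\Big\}
\big\{I(Y_{i,j}>0)+\Gamma_{i,j}^{(0)}-\omega_{i,j}^{(0)}\big\}, \\
\frac{\partial \ell^{\dagger}(\Theta|\Theta^{(0)})}{\partial\beta}\bigg|_{\Theta=\Theta^{(0)}}
&= \sum_{i=1}^{I}\sum_{j=1}^{n_i}X_{i,j}\Bigg[\psi_1^{(0)}\big\{I(Y_{i,j}>0)+\Gamma_{i,j}^{(0)}-\omega_{i,j}^{(0)}\big\}+I(Y_{i,j}>0)Y_{i,j} \\
&\qquad - \frac{s_j\mu_{i,j}^{(0)}}{s_i\mu_{i,j}^{(0)}+\exp(\phi^{(0)})}
\Big\{I(Y_{i,j}>0)Y_{i,j}+I(Y_{i,j}>0)\exp(\phi^{(0)})+\Gamma_{i,j}^{(0)}\exp(\phi^{(0)})\Big\}\Bigg], \\
\frac{\partial \ell^{\dagger}(\Theta|\Theta^{(0)})}{\partial\gamma_{i,m}}\bigg|_{\Theta=\Theta^{(0)}}
&= \sum_{j=1}^{n_i}B_m(D_{i,j})\Bigg[\psi_1^{(0)}\big\{I(Y_{i,j}>0)+\Gamma_{i,j}^{(0)}-\omega_{i,j}^{(0)}\big\}+I(Y_{i,j}>0)Y_{i,j} \\
&\qquad - \frac{s_j\mu_{i,j}^{(0)}}{s_i\mu_{i,j}^{(0)}+\exp(\phi^{(0)})}
\Big\{I(Y_{i,j}>0)Y_{i,j}+I(Y_{i,j}>0)\exp(\phi^{(0)})+\Gamma_{i,j}^{(0)}\exp(\phi^{(0)})\Big\}\Bigg].
\end{aligned}
\]
Here Γ(·), written without subscripts, denotes the gamma function, whereas Γi,j with superscript (0) is the quantity defined above.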
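Also,
\[
\begin{aligned}
\frac{\partial^2}{\partial\phi^2}\ell^{\dagger}(\Theta|\Theta^{(0)})\bigg|_{\Theta=\Theta^{(0)}}
&= \sum_{i=1}^{I}\sum_{j=1}^{n_i}\Bigg(I(Y_{i,j}>0)\Big[-\frac{Y_{i,j}\exp(\phi^{(0)})}{s_i\mu_{i,j}^{(0)}+\exp(\phi^{(0)})}
+ \frac{\partial^2}{\partial\phi^2}\log\Gamma\{Y_{i,j}+\exp(\phi)\}\Big|_{\Theta=\Theta^{(0)}}
- \frac{\partial^2}{\partial\phi^2}\log\Gamma\{\exp(\phi)\}\Big|_{\Theta=\Theta^{(0)}}\Big] \\
&\qquad + \big\{I(Y_{i,j}>0)+\Gamma_{i,j}^{(0)}\big\}\Big[2\exp(\phi^{(0)})+\exp(\phi^{(0)})\phi^{(0)}
- \exp(\phi^{(0)})\log\big\{s_j\mu_{i,j}^{(0)}+\exp(\phi^{(0)})\big\} \\
&\qquad\qquad - \frac{3\exp(2\phi^{(0)})+s_j\exp(2\phi^{(0)})}{s_j\mu_{i,j}^{(0)}+\exp(\phi^{(0)})}\Big]\Bigg), \\
\frac{\partial^2}{\partial\psi_0^2}\ell^{\dagger}(\Theta|\Theta^{(0)})\bigg|_{\Theta=\Theta^{(0)}}
&= (2M+5)\sum_{i=1}^{I}\sum_{j=1}^{n_i}\omega_{i,j}^{(0)}, \\
\frac{\partial^2}{\partial\psi_1^2}\ell^{\dagger}(\Theta|\Theta^{(0)})\bigg|_{\Theta=\Theta^{(0)}}
&= \sum_{i=1}^{I}\sum_{j=1}^{n_i}\Bigg[-(M+1)\big\{I(Y_{i,j}>0)+\Gamma_{i,j}^{(0)}\big\}
- \frac{\omega_{i,j}^{(0)}}{2M+5}\Big((2M+5)^2\{\log(s_i)\}^2+M+1 \\
&\qquad\qquad + \Big\{\sum_{m=1}^{M}\gamma_{i,m}^{(0)}B_m(D_{i,j})+X_{i,j}^{\top}\beta^{(0)}\Big\}^2(2M+5)^2\Big)\Bigg], \\
\frac{\partial^2}{\partial\gamma_{i,r}^2}\ell^{\dagger}(\Theta|\Theta^{(0)})\bigg|_{\Theta=\Theta^{(0)}}
&= -\sum_{j=1}^{n_i}B_r^2(D_{i,j})\Bigg[I(Y_{i,j}>0)+\Gamma_{i,j}^{(0)}+(2M+5)\big\{1+(\psi_1^{(0)})^2\big\}\omega_{i,j}^{(0)} \\
&\qquad + \frac{s_j\mu_{i,j}^{(0)}(M+1)}{s_i\mu_{i,j}^{(0)}+\exp(\phi^{(0)})}
\Big\{I(Y_{i,j}>0)Y_{i,j}+I(Y_{i,j}>0)\exp(\phi^{(0)})+\Gamma_{i,j}^{(0)}\exp(\phi^{(0)})\Big\} \\
&\qquad + 2\mu_{i,j}^{(0)}\big\{I(Y_{i,j}>0)+\Gamma_{i,j}^{(0)}\big\}\Bigg].
\end{aligned}
\]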
A detailed proof of the theorem is provided in the Appendix for Chapter 3.
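As an illustration only, the gradient MM update can be sketched in Python under the assumption that the second-derivative matrix of the minorizing function is diagonal, so that each parameter is updated by a scalar Newton step. The function names and the toy quadratic surrogate are hypothetical stand-ins, not the minorizing function of this chapter.

```python
def gradient_mm(theta0, grad, diag_hess, tol=1e-8, max_iter=200):
    """Iterate theta_k <- theta_k - grad_k / hess_kk until the largest change is below tol.

    grad(theta) and diag_hess(theta) return the first derivatives and the diagonal
    second derivatives of the surrogate built at the current iterate theta.
    """
    theta = list(theta0)
    for _ in range(max_iter):
        g = grad(theta)
        h = diag_hess(theta)
        new_theta = [t - gk / hk for t, gk, hk in zip(theta, g, h)]
        if max(abs(a - b) for a, b in zip(new_theta, theta)) < tol:
            return new_theta
        theta = new_theta
    return theta


# Hypothetical separable concave surrogate: -sum_k w_k * (theta_k - c_k)^2.
TARGET = [1.0, -2.0, 0.5]
WEIGHT = [1.0, 3.0, 0.2]


def toy_grad(theta):
    return [-2.0 * w * (t - c) for t, c, w in zip(theta, TARGET, WEIGHT)]


def toy_diag_hess(theta):
    return [-2.0 * w for w in WEIGHT]


estimate = gradient_mm([0.0, 0.0, 0.0], toy_grad, toy_diag_hess)
print(estimate)
```

Because the Hessian is diagonal, no matrix inversion is needed; in the actual model the two callables would return the first and second derivatives of the minorizing function evaluated at the current iterate.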
3.2.2     Penalized Estimation
In our proposed spline model, regression is performed by choosing a set of knots and finding the spline defined over these knots. The number of knots has an important influence on the resulting fit: too few knots lead to an underfitted regression, and too many knots lead to an overfitted model. The position of the knots also matters, since uniformly distributed knots can lead to overfitting in regions with few points and underfitting in regions with many points. The most widely used spline regression methods overcome these difficulties with a penalization approach. In smoothing splines, knots are set at each data point and the wiggliness of the spline is controlled by penalizing its integrated squared second derivative, $\int \psi''(x)^2\,dx$. Substantial statistical methodology exists in the literature for finding the best number and location of knots; we refer to [100] for a review. To this effect, one can choose a large number of equally spaced initial knots and estimate the model by maximizing a penalized likelihood function. We follow [101] Chapter 5.5.3 and, for k = 1, . . . , K, set τk to be the {(k + 1)/(K + 2)}th sample quantile of the unique dose levels. The default choice is K = min(0.25 × the number of unique dose levels, 35). This knot placement procedure has been shown to perform well in many scenarios. Alternatively, the statistician may choose K based on visual inspection of the scatterplot of the data, to better reflect the complexity of the regression function relative to the noise in the data. The penalized log-likelihood function is
\[
\ell_p(\Theta) = \ell(\Theta) - \sum_{i}\lambda_i P(\gamma_i),
\]
where λi > 0 is the smoothing parameter and P(γi) denotes the penalty function for the ith dose-response curve. The minorizing function for ℓp(Θ) is simply $\ell^{\dagger}(\Theta|\Theta^{(0)}) - \sum_i \lambda_i P(\gamma_i)$. This leads to a minor adjustment to the expressions obtained when taking derivatives of the minorizing function with respect to γi. Procedures such as cross-validation and generalized cross-validation are used to select the smoothing parameter λ ([101] Chapter 5).
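The default knot rule described in this section can be sketched as follows. This is a minimal illustration that rests on two assumptions not stated in the text: K is rounded down to an integer (with at least one knot), and sample quantiles are computed by linear interpolation between order statistics. The helper names and the example dose levels are hypothetical.

```python
import math


def sample_quantile(sorted_values, p):
    """Quantile by linear interpolation between order statistics (assumed convention)."""
    position = p * (len(sorted_values) - 1)
    lower = math.floor(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])


def default_knots(doses, max_knots=35):
    """Place tau_k at the (k+1)/(K+2)-th quantile of the unique doses, k = 1..K."""
    unique_doses = sorted(set(doses))
    # K = min(0.25 * number of unique doses, 35); rounding down is an assumption.
    n_knots = max(1, min(int(0.25 * len(unique_doses)), max_knots))
    return [sample_quantile(unique_doses, (k + 1) / (n_knots + 2))
            for k in range(1, n_knots + 1)]


doses = [0, 0.01, 0.03, 0.1, 0.3, 1, 3, 10, 30]  # example dose levels
knots = default_knots(doses)
print(knots)
```

With nine unique dose levels the rule gives K = 2, and the knots fall at the 2/4 and 3/4 sample quantiles of the unique doses.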
3.2.3    Monotonicity constraints
Our primary focus is inference on the unknown regression function log(µi). Equation (1) works well when we have no prior information on the shape of the regression function. However, a high percentage of genes are expected to have a higher chance of an adverse response at increasing dose levels, after adjusting for important confounding factors such as age and sex. Additionally, incorporating monotonicity constraints while assessing the dose-response function has been shown to improve estimation efficiency and the power to detect trends [102]. To impose monotonicity constraints on Equation (1), we assume that ψi ∈ Θ+ is an isotonic regression function, with
\[
\Theta^{+} = \{\zeta_i(d_1) \le \zeta_i(d_2)\ \ \forall (d_1,d_2)\in\mathbb{R}^2,\ d_1<d_2\}.
\]
Currently we express the log-mean function as
\[
\log(\mu_{i,j}) = \sum_{m=1}^{M}\gamma_{i,m}B_m(D_{i,j}) + X_{i,j}^{\top}\beta.
\]
To make this function monotone increasing with respect to dose d, we need γi,m > 0 for m = 1, . . . , M [103]. Therefore we write
\[
\gamma_{i,m} = \exp(\eta_{i,m}), \qquad
\log(\mu_{i,j}) = \sum_{m=1}^{M}\exp(\eta_{i,m})B_m(D_{i,j}) + X_{i,j}^{\top}\beta.
\]
According to Theorem 5.9 of [104], $\zeta_i(D) = \sum_{m=1}^{M}\gamma_{i,m}B_m(D)$ is a class of monotone nondecreasing splines, since the monotonicity of the B-splines is guaranteed by the nondecreasing order of the coefficients.
    Thus our goal now is estimation of the parameters
\[
\Theta = (\phi^{(0)}, \beta^{(0)}, \eta_{1,1}^{(0)}, \ldots, \eta_{1,M}^{(0)}, \ldots, \eta_{J,1}^{(0)}, \ldots, \eta_{J,M}^{(0)}, \psi_0^{(0)}, \psi_1^{(0)})^{\top}.
\]
Accounting for this parameter transformation in the minorization function changes only the first and second derivative calculations with respect to γi,m for m = 2, . . . , M. The first derivative of the minorization function with respect to ηi,m is
\[
\frac{\partial}{\partial\eta_{i,m}} g_{5,i}(\eta_i|\Theta^{(0)}) =
\left\{\frac{\partial}{\partial\gamma_{i,m}} g_{3,i}(\exp(\eta_i)|\Theta^{(0)}) +
\frac{\partial}{\partial\gamma_{i,m}} \ell^{\ddagger}_{3,i}(\exp(\eta_i)|\Theta^{(0)})\right\}\times\exp(\eta_{i,m}).
\]
Define
\[
H_{m,m'}(\gamma_{i,m}) = \frac{\partial^2}{\partial\gamma_{i,m}^2} g_{3,i}(\gamma_i|\Theta^{(0)}) +
\frac{\partial^2}{\partial\gamma_{i,r}^2} \ell^{\ddagger}_{3,i}(\gamma_i|\Theta^{(0)}).
\]
Then the second derivatives with respect to ηi,m can be written as
\[
H_{m,m'}(\eta_{i,m}) = H_{m,m'}(\exp(\eta_{i,m}))\times\exp(\eta_{i,m})\times\exp(\eta_{i,m'})
+ \frac{\partial}{\partial\eta_{i,m}} g_{5,i}(\eta_i|\Theta^{(0)})\times I(m=m').
\]
The rest of the computations remain exactly the same as before.
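The chain rule behind the reparametrization γ = exp(η) can be checked numerically. The sketch below uses a hypothetical one-dimensional objective f(γ) = −(γ − 2)² in place of the minorizing function, and compares the chain-rule derivatives with respect to η against central finite differences; all names are illustrative.

```python
import math


def eta_derivatives(eta, d1_gamma, d2_gamma):
    """Derivatives w.r.t. eta = log(gamma), given derivatives w.r.t. gamma.

    first  = f'(gamma) * exp(eta)
    second = f''(gamma) * exp(eta)^2 + first   (the m = m' case)
    """
    gamma = math.exp(eta)
    first = d1_gamma(gamma) * gamma
    second = d2_gamma(gamma) * gamma * gamma + first
    return first, second


# Hypothetical objective and its derivatives with respect to gamma.
def toy_f(gamma):
    return -(gamma - 2.0) ** 2


def toy_d1(gamma):
    return -2.0 * (gamma - 2.0)


def toy_d2(gamma):
    return -2.0


eta0 = 0.3
first, second = eta_derivatives(eta0, toy_d1, toy_d2)

# Central finite differences of f(exp(eta)) for comparison.
step = 1e-4


def f_eta(eta):
    return toy_f(math.exp(eta))


fd_first = (f_eta(eta0 + step) - f_eta(eta0 - step)) / (2 * step)
fd_second = (f_eta(eta0 + step) - 2 * f_eta(eta0) + f_eta(eta0 - step)) / step ** 2
print(first, fd_first, second, fd_second)
```

The additional first-derivative term in the second derivative appears only on the diagonal, which is why the indicator I(m = m′) enters the expression.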
3.2.4    Confidence interval estimation
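The gradient of the penalized log-likelihood function is
\[
\frac{\partial \ell_p(\Theta)}{\partial\Theta} = \frac{\partial \ell(\Theta)}{\partial\Theta} - \sum_{i}\lambda_i\frac{\partial}{\partial\Theta}P(\gamma_i)
= \sum_{i=1}^{I}\sum_{j=1}^{n_i}U_{i,j},
\]
where $U_{i,j} = \{\partial \ell_{i,j}(\Theta)/\partial\Theta - (\lambda_i/n_i)\,\partial P(\gamma_i)/\partial\Theta\}|_{\Theta=\hat\Theta}$.
Let $H = \partial^2 \ell_p(\Theta)/\partial\Theta\,\partial\Theta^{\top}|_{\Theta=\hat\Theta}$, and define $A = -(H/\sum_{i=1}^{I}n_i)^{-1}$. Then the variance-covariance matrix of $\sqrt{\sum_{i=1}^{I}n_i}\;\hat\Theta$ can be estimated by
\[
V_0 = A\left(\frac{1}{\sum_{i=1}^{I}n_i}\sum_{i=1}^{I}\sum_{j=1}^{n_i}U_{i,j}U_{i,j}^{\top}\right)A^{\top}.
\]
The square root of each diagonal element of V0, divided by the square root of the total sample size $\sum_i n_i$, is the standard error of the estimator of the corresponding component of the Θ vector.
An important component of the model is estimation of the dose-response relation for a gene in a specific cell type. Suppose the interest is in the ith cell type, so that the dose-response relation estimator for the ith cell type is $\hat G_i(D^*) = \sum_{m=1}^{M}B_m(D^*)\hat\gamma_{i,m}$ at the dose level D∗. We can rewrite $\hat G_i(D^*) = L^{\top}\hat\Theta$, where
\[
L^{\top} = (0, 0, 0, 0, B_1(D^*), \ldots, B_M(D^*), 0, \ldots, 0).
\]
Therefore,
\[
\mathrm{var}(\hat G_i(D^*)) = \mathrm{var}(L^{\top}\hat\Theta) = L^{\top}\left(\frac{V_0}{\sum_{i=1}^{I}n_i}\right)L.
\]
Then the 95% pointwise CI for Gi(D∗) is $\hat G_i(D^*) \pm 1.96\times\sqrt{\mathrm{var}(\hat G_i(D^*))}$.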
3.2.5    Model Selection
If the user is unsure whether imposing monotonicity constraints improves the model fit, we suggest fitting both the unconstrained and the constrained versions of the model and selecting the one with the lowest AIC, where AIC = 2θ − 2 log(l(Θ̂)), θ is the number of estimated parameters and log(l(Θ̂)) is the maximized value of the log-likelihood function for the model. In the case of penalized estimation, the optimal λ can be chosen by cross-validation, and the log-likelihood can then be evaluated at the estimates obtained via the MM algorithm for penalized estimation.
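The selection rule amounts to a one-line computation per model. In the sketch below the log-likelihood values and parameter counts are hypothetical placeholders, not results from this chapter.

```python
def aic(log_likelihood, n_params):
    """AIC = 2 * (number of estimated parameters) - 2 * (maximized log-likelihood)."""
    return 2 * n_params - 2 * log_likelihood


# Hypothetical fitted models: (maximized log-likelihood, number of parameters).
fits = {
    "unconstrained": (-1520.4, 26),
    "monotone": (-1522.1, 26),
}

aic_values = {name: aic(loglik, k) for name, (loglik, k) in fits.items()}
selected = min(aic_values, key=aic_values.get)
print(aic_values, selected)
```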
3.3    Hypothesis Testing
There are two tests of hypothesis that we wish to conduct for the model described above:
   1. Test whether there is any effect of dose on the response for a given cell type. Suppose that
      we are interested in the ith cell type; we then set H0,1 : γi,2 = γi,3 = · · · = γi,M versus the
      alternative Ha,1 : at least one of the components of γi is different from the rest of the
      components.
   2. Test whether the dose-response curves differ across cell types. We set H0,2 : γ1 = · · · = γI
      versus the alternative Ha,2 : at least one of the γi is different from the rest.
For both tests, we use the likelihood ratio test. The likelihood ratio test statistic is given by
\[
\Lambda_{\psi} = 2\left\{\ell(\hat\Theta) - \ell(\hat\Theta_{H_0})\right\},
\]
where Θ̂, calculated via the MM algorithm, denotes the unconstrained MLE of Θ, while Θ̂H0 is the constrained MLE of Θ under H0, which can be obtained by
\[
\Theta_{H_0}^{(t+1)} = \Theta_{H_0}^{(t)} -
\left\{\frac{\partial^2 \ell^{\dagger}(\Theta_{H_0}|\Theta_{H_0}^{(t)})}{\partial\Theta_{H_0}\,\partial\Theta_{H_0}^{\top}}\bigg|_{\Theta_{H_0}=\Theta_{H_0}^{(t)}}\right\}^{-1}
\frac{\partial \ell^{\dagger}(\Theta_{H_0}|\Theta_{H_0}^{(t)})}{\partial\Theta_{H_0}}\bigg|_{\Theta_{H_0}=\Theta_{H_0}^{(t)}},
\]
where
\[
\begin{aligned}
\ell^{\dagger}(\Theta_{H_0}|\Theta_{H_0}^{(0)}) &= \ell^{\dagger}_1(\phi_{H_0}|\Theta_{H_0}^{(0)}) + g_1(\psi_{0,H_0}|\Theta_{H_0}^{(0)}) + g_2(\psi_{1,H_0}|\Theta_{H_0}^{(0)})
+ \sum_{i=1}^{I}\ell^{\dagger}_{3,i}(\gamma_{i,H_0}|\Theta_{H_0}^{(0)}) \\
&\quad + \ell^{\dagger}_4(\beta_{H_0}|\Theta_{H_0}^{(0)}) + \ell^{\dagger}_5(\Theta_{H_0}^{(0)}).
\end{aligned}
\]
    For problem 1, under the null model ψi(D) does not vary with D and is therefore a constant. Hence µi,j = exp(γ∗ + X⊤i,j β). Under the null hypothesis H0,1, Λψ asymptotically follows a chi-squared distribution with (M − 1) degrees of freedom. Under the null hypothesis H0,2, γ1 = · · · = γI = γ (say), so Λψ asymptotically follows a chi-squared distribution with (I − 1)M degrees of freedom.
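For illustration, the likelihood ratio test can be carried out with the standard library alone. The sketch below computes the chi-squared upper tail probability from the series expansion of the regularized lower incomplete gamma function, which is adequate for illustration but less accurate for extremely small p-values than a dedicated statistical library. The log-likelihood values and the choice M = 7, I = 3 are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper tail probability P(chi2_df > x) via the series for P(df/2, x/2)."""
    if x <= 0:
        return 1.0
    a, half_x = df / 2.0, x / 2.0
    total, n = 0.0, 0
    while n < 10000:
        # n-th series term: exp(-x/2) * (x/2)^(a+n) / Gamma(a+n+1)
        term = math.exp(-half_x + (a + n) * math.log(half_x) - math.lgamma(a + n + 1))
        total += term
        if n > half_x and term < 1e-16 * total:
            break
        n += 1
    return max(0.0, 1.0 - total)


def likelihood_ratio_test(loglik_full, loglik_null, df):
    """Return the LRT statistic 2 * (l_full - l_null) and its asymptotic p-value."""
    statistic = 2.0 * (loglik_full - loglik_null)
    return statistic, chi2_sf(statistic, df)


# Hypothetical example for Test 2 with I = 3 cell types and M = 7 basis functions.
n_cell_types, n_basis = 3, 7
statistic, p_value = likelihood_ratio_test(-1500.0, -1512.5, df=(n_cell_types - 1) * n_basis)
print(statistic, p_value)
```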
3.4      Results
3.4.1      Simulation Design
To study the behavior of the procedure, we conducted a simulation study. We generated the log-mean dose-response curve as
\[
\log(\mu_{ij}) = \zeta_i(D_{ij}^{(s)}) + \beta X_{ij},
\]
where $D_{i,j} \in \{0, 0.01, 0.03, 0.1, 0.3, 1, 3, 10, 30\}$ and $D_{ij}^{(s)} = \log(D_{i,j}+1)/\log(30)$. The dose-response function $\zeta_i(D_{ij}^{(s)})$ is assumed to follow one of four patterns:
    • Case 1:
\[
\begin{aligned}
\zeta_{1,1}(D_{1,j}^{(s)}) &= 1 + 0.1D_{1,j}^{(s)} + 0.1(D_{1,j}^{(s)})^2 + 3(D_{1,j}^{(s)} - 0.5)_+^2 \\
\zeta_{1,2}(D_{2,j}^{(s)}) &= 1 + 0.1D_{2,j}^{(s)} + 0.8(D_{2,j}^{(s)})^2 + 0.8(D_{2,j}^{(s)} - 0.5)_+^2 \\
\zeta_{1,3}(D_{3,j}^{(s)}) &= 0.5 + 0.2D_{3,j}^{(s)}
\end{aligned}
\]
    • Case 2:
\[
\begin{aligned}
\zeta_{2,1}(D_{1,j}^{(s)}) &= 0.15\exp\{2\times D_{1,j}^{(s)}\} \\
\zeta_{2,2}(D_{2,j}^{(s)}) &= 0.15\exp[\{2 + (0.5/300)\}\times D_{2,j}^{(s)}] \\
\zeta_{2,3}(D_{3,j}^{(s)}) &= 0.15\exp[\{2 + (0.6/300)\}\times D_{3,j}^{(s)}]
\end{aligned}
\]
    • Case 3:
\[
\zeta_i(D_{ij}^{(s)}) = \frac{1.5\sin(D_{i,j}^{(s)})}{1 + 1.5\sin(D_{i,j}^{(s)})} + 2, \quad i = 1, 2, 3
\]
    • Case 4:
\[
\zeta_i(D_{ij}^{(s)}) = 3, \quad i = 1, 2, 3
\]
We generated a single covariate Xi,j ∼ N(0, 1) and set β = 0.5. The B-spline basis matrix Bm(Di,j) is then generated using three knots at 0.008, 0.075 and 0.4, with the degree of the B-spline basis fixed at d = 3. We chose ψ0 = 0.5 and ψ1 = 0.1, and computed ωi,j using Equation 3.7. The binary random variable Ri,j is then generated from a Bernoulli distribution with mean ωi,j. Finally, the expression data Yi,j are generated as $Y_{i,j} = Y'_{i,j}R_{i,j}$, where $Y'_{i,j}$ is generated from a negative binomial distribution with mean si,j µi,j and dispersion exp(ϕ), where ϕ ∼ N(0.5, 0.1). Datasets with three cell types are generated. The analysis is repeated with cell-type specific sample sizes ni = 100, 300.
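The data-generating mechanism for Case 1 can be sketched with the standard library. Several details are assumptions made only for this illustration: the size factor is set to 1, each cell's dose is drawn uniformly from the dose levels, ϕ is drawn once per dataset with 0.1 read as a standard deviation, the logit of ωi,j is taken to be ψ0 + ψ1 log(si,j µi,j), and the negative binomial draw uses a gamma-Poisson mixture. All function names are hypothetical.

```python
import math
import random

DOSES = [0, 0.01, 0.03, 0.1, 0.3, 1, 3, 10, 30]


def scaled_dose(dose):
    return math.log(dose + 1) / math.log(30)


def zeta_case1(cell_type, d):
    hinge = max(d - 0.5, 0.0) ** 2
    if cell_type == 1:
        return 1 + 0.1 * d + 0.1 * d ** 2 + 3 * hinge
    if cell_type == 2:
        return 1 + 0.1 * d + 0.8 * d ** 2 + 0.8 * hinge
    return 0.5 + 0.2 * d


def poisson_draw(rng, lam):
    """Knuth's multiplication method, applied in chunks so exp(-lam) cannot underflow."""
    n_chunks = int(lam // 30) + 1
    threshold = math.exp(-lam / n_chunks)
    count = 0
    for _ in range(n_chunks):
        product = rng.random()
        while product > threshold:
            count += 1
            product *= rng.random()
    return count


def simulate_cell(rng, cell_type, phi, beta=0.5, psi0=0.5, psi1=0.1, size_factor=1.0):
    dose = rng.choice(DOSES)                      # assumed dose allocation
    x = rng.gauss(0.0, 1.0)
    mu = math.exp(zeta_case1(cell_type, scaled_dose(dose)) + beta * x)
    linear = psi0 + psi1 * math.log(size_factor * mu)
    omega = 1.0 / (1.0 + math.exp(-linear))       # mean of the Bernoulli variable R
    r = 1 if rng.random() < omega else 0
    theta = math.exp(phi)                         # negative binomial dispersion
    rate = rng.gammavariate(theta, size_factor * mu / theta)
    return poisson_draw(rng, rate) * r            # Y = Y' * R


rng = random.Random(2022)
phi = rng.gauss(0.5, 0.1)                         # 0.1 treated as a standard deviation
dataset = [simulate_cell(rng, cell_type, phi) for cell_type in (1, 2, 3) for _ in range(100)]
print(sum(1 for y in dataset if y == 0), "zeros out of", len(dataset))
```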
3.4.2     Analysis
For each dose-response pattern and for ni = 100, 300, we simulated 500 data sets. Tables 3.1, 3.2, 3.3 and 3.4 summarize the estimated mean, SE, bias, mean squared error and coverage probabilities of the parameters β, ϕ, ψ0 and ψ1 for Cases 1-4. The results are compared with an intercept model having ψ1 = 0. In general, the full model performs better than the intercept model, indicating a dependence of the zero-inflation parameter on the log-mean. Figures 3.1, 3.3, 3.5, 3.6, 3.9, 4.0, 4.3 and 4.4 show the fits of the estimated curves along with the 95% confidence intervals. From the figures, it can be seen that the fitted values for both sample sizes are close to the true values, and the 95% confidence bands in general cover the true curves. As expected, the confidence intervals are wider for the smaller sample size.
    To further demonstrate the performance of our proposed ZINB spline model (ZINB-SPL), we benchmarked it against three competing methods: (a) a zero-inflated negative binomial generalized additive model (ZINB-GAM-Dose) with the zero-inflation parameter dependent on the dose as logit(ωij) = γ0 + γ1 × dose, (b) a zero-inflated negative binomial generalized additive model (ZINB-GAM-Int) with an intercept-only model for the zero-inflation parameter, logit(ωij) = γ0, and (c) a negative binomial GAM (NB-GAM). We use the R packages mgcv [105] for fitting the NB-GAM and zigam [106] for fitting models (a) and (b). The benchmarking curves for the simulation scenarios are reported in Figures 3.3, 3.7, 3.11 and 3.15. The performance of the four models is very close for the linear and constant models in simulation scenarios 3 and 4. However, ZINB-SPL yields almost perfect curve estimates for the non-linear curve shapes in simulation scenarios 1 and 2. This is further demonstrated by the RMSE boxplots in Figures 3.4, 3.8, 3.12 and 3.16. The RMSE boxplots for ZINB-SPL in Figures 3.4 and 3.8 lie markedly lower than those of the other three models. Note also that the NB-GAM model has the worst performance in scenarios 1 and 2, which justifies the need for a zero-inflation structure to model the excess zeroes in the data-generating process.
    Under Case 1 and Case 2, we had 500 positive control datasets in which the true mean curve varies across the three cell types. Therefore, Case 1 and Case 2 were used to study the power of Test 2 (Section 3.3) in detecting true positive instances. In this scenario, the LRT statistic is expected to take large values, with p-values close to 0. The expected true positive rate is computed as
\[
\mathrm{TPR} = \frac{1}{500}\sum_{l=1}^{500} I\left[P(\Lambda_{\zeta,l} > \chi^2_{M(I-1)} \mid H_a) < \alpha\right].
\]
Case 2 embodies a local alternative testing scenario in which the slope parameters of the DR functions for cell types 2 and 3 are written as α2 = α1 + g1/nk and α3 = α1 + g2/nk, where α1 is the slope parameter of the DR function for cell type 1, and g1 = 0.5 and g2 = 0.6 are sufficiently close. Therefore, the power of Test 2 in detecting true positives for Case 2 is expected to be much smaller than the TPR in Case 1, where there is a larger discrepancy in the shape of the true mean functions. In Case 1, for ni = 300, the null hypothesis was rejected in 490 out of 500 simulations, giving an expected TPR of 0.98. For ni = 100, 476 instances rejected the null hypothesis, with an expected TPR of 0.952. For Case 2 with ni = 300, 298 out of 500 instances yielded a rejection of the null hypothesis with an expected TPR of 0.578, and for ni = 100, 196 out of 500 simulations yielded a rejection of the null hypothesis with an expected TPR of 0.392. The decrease in TPR for smaller sample sizes and local alternatives is an expected result.
    Next, under Cases 3 and 4, the true log-mean response $\eta_i(D_{ij}^{(s)})$ followed the same DR model for all the cell types. The only difference is that in Case 3 all cell types have a monotonically increasing DR function, whereas in Case 4 the DR function is a constant. The datasets generated from these cases therefore serve as negative control datasets, in which the true dose-response curves do not change with the cell type. We conducted Test 2 on these datasets to study the ability of the test to control false positives. Since all the datasets in this scenario are generated from a null model, most p-values are expected to exceed the significance level, and we determine the false positive rate as
\[
\mathrm{FPR} = \frac{1}{500}\sum_{l=1}^{500} I\left[P(\Lambda_{\eta,l} > \chi^2_{M(I-1)} \mid H_0) < \alpha\right],
\]
where α = 0.05. For Case 3 with ni = 300, 39 out of 500 simulations yielded a rejection of the null hypothesis, with expected FPR = 0.076. For ni = 100, 54 instances rejected the null hypothesis, with an expected FPR of 0.108. In Case 4 with ni = 300, 22 out of 500 instances are rejected, giving an expected FPR of 0.044. For ni = 100, 76 out of 500 instances are rejected at a level of 0.05, with an expected FPR of 0.152. This indicates that a higher sample size is preferable for controlling the false positive rate.
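Both the TPR and the FPR defined in this section are empirical rejection rates: the proportion of simulated datasets whose LRT p-value falls below α. A minimal sketch follows, with hypothetical p-values standing in for the 500 simulation replicates.

```python
def rejection_rate(p_values, alpha=0.05):
    """Proportion of replicates whose p-value is below alpha."""
    return sum(1 for p in p_values if p < alpha) / len(p_values)


# Hypothetical p-values from four replicates.
example_p_values = [0.001, 0.2, 0.03, 0.6]
rate = rejection_rate(example_p_values)
print(rate)
```

Applied to datasets generated under an alternative, this quantity estimates the TPR; applied to datasets generated under the null, it estimates the FPR.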


                                 n_i = 300                                      n_i = 100
Scenario  Model        β̂       SE       Bias      MSE      Cov       β̂       SE       Bias      MSE      Cov
Case 1    Full         0.507   0.049    0.007     0.003    0.95      0.491   0.083    -0.008    0.008    0.92
          Intercept    0.515   0.049    0.015     0.002    0.94      0.500   0.0824   0.007     0.007    0.93
Case 2    Full         0.498   0.112    -0.002    0.004    0.96      0.502   0.123    0.02      0.015    0.97
          Intercept    0.512   0.045    0.012     0.003    0.998     0.5079  0.0795   -0.007    0.011    0.992
Case 3    Full         0.500   0.037    -0.0015   0.0014   0.942     0.499   0.066    -0.001    0.004    0.938
          Intercept    0.500   0.037    0.001     0.001    0.938     0.498   0.067    0.002     0.004    0.965
Case 4    Full         0.495   0.031    -0.004    0.001    0.952     0.504   0.060    0.004     0.003    0.964
          Intercept    0.501   0.033    0.001     0.001    0.931     0.503   0.060    0.003     0.003    0.964
Table 3.1 Parameter estimate, asymptotic standard error (SE), bias, mean squared error (MSE) and coverage probability (Cov) of parameter β under the full model and the intercept model. The results are averaged over 500 replicates and reported for n_i = 100, 300.
                                 n_i = 300                                      n_i = 100
Scenario  Model        ϕ̂       SE       Bias      MSE      Cov       ϕ̂       SE       Bias      MSE      Cov
Case 1    Full         0.618   0.168    0.118     0.043    0.902     0.891   0.303    -0.008    0.008    0.74
          Intercept    0.601   0.171    0.101     0.041    0.904     0.888   0.308    0.389     0.263    0.75
Case 2    Full         0.621   0.243    0.221     0.127    0.856     1.283   0.325    0.7836    1.136    0.532
          Intercept    0.687   0.264    0.187     0.119    0.776     1.317   0.314    0.817     1.158    0.525
Case 3    Full         0.554   0.089    0.054     0.011    0.922     0.684   0.155    0.185     0.080    0.764
          Intercept    0.555   0.0907   0.055     0.011    0.901     0.682   0.153    0.182     0.159    0.752
Case 4    Full         0.529   0.070    0.029     0.006    0.851     0.625   0.120    0.126     0.032    0.796
          Intercept    0.540   0.069    0.040     0.007    0.892     0.635   0.120    0.135     0.034    0.762
Table 3.2 Parameter estimate, asymptotic standard error (SE), bias, mean squared error (MSE) and coverage probability (Cov) of parameter ϕ under the full model and the intercept model. The results are averaged over 500 replicates and reported for n_i = 100, 300.
                    ni = 300                                     ni = 100
Scenario Model      ψ̂0       SE       Bias     MSE      Cov         ψ̂0       SE       Bias     MSE      Cov
Case 1   Full       0.512    0.166    0.012    0.050    0.97        0.488    0.427    -0.012   0.173    0.978
         Intercept  0.619    0.268    0.119    0.029    1           0.609    0.480    0.109    0.064    1
Case 2   Full       0.455    0.223    -0.045   0.069    0.96        0.044    0.592    0.101    0.229    0.91
         Intercept  0.552    0.206    0.052    0.048    0.988       0.5218   0.383    0.021    0.146    0.976
Case 3   Full       0.4744   0.353    -0.025   0.0267   1           0.431    0.598    -0.068   0.190    0.972
         Intercept  0.725    0.352    0.0554   0.011    1           0.728    0.602    0.227    0.072    1
Case 4   Full       0.482    0.462    -0.018   0.016    0.961       0.468    0.795    -0.031   0.120    1
         Intercept  0.796    0.502    0.296    0.093    1           0.797    0.788    0.136    0.034    1
Table 3.3 Parameter estimate, asymptotic standard error (SE), bias and mean squared
error (MSE) of parameter ψ0 under the full model and the intercept model. The results are
averaged over 500 replicates and reported for ni = 100, 300.
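The summary columns of Tables 3.1 to 3.3 can be computed from replicate-level output along the following lines. This is a hedged sketch rather than the code used for the study: it assumes that Cov denotes the empirical coverage of Wald-type 95% intervals, and the function name, the true value of 0.5 and the simulated inputs are hypothetical.

```python
import numpy as np


def summarize_replicates(estimates, std_errors, true_value, z=1.96):
    """Average estimate, average SE, bias, MSE and interval coverage over replicates."""
    estimates = np.asarray(estimates, dtype=float)
    std_errors = np.asarray(std_errors, dtype=float)
    lower = estimates - z * std_errors
    upper = estimates + z * std_errors
    return {
        "estimate": float(estimates.mean()),
        "se": float(std_errors.mean()),
        "bias": float(estimates.mean() - true_value),
        "mse": float(np.mean((estimates - true_value) ** 2)),
        # Assumed meaning of "Cov": share of intervals containing the true value.
        "cov": float(np.mean((lower <= true_value) & (true_value <= upper))),
    }


# Hypothetical replicate output: 500 estimates scattered around a true value of 0.5.
rng = np.random.default_rng(1)
est = 0.5 + 0.05 * rng.standard_normal(500)
se = np.full(500, 0.05)
summary = summarize_replicates(est, se, true_value=0.5)
print(summary)
```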


Figure 3.1 Results of the simulation study, illustrating the performance of Model 1 (see
Simulation Design) in 500 replicates with sample size 300. The columns correspond to the
three different cell-types. Continuous red and blue lines and the shaded grey region
represent the log-mean curve averaged across 500 random replicates, the true simulated
curves, and the 95% pointwise confidence intervals, respectively.
Figure 3.2 Results of the simulation study, illustrating the performance of Model 1 (see
Simulation Design) in 500 replicates with sample size 100. The columns correspond to the
three different cell-types. Continuous red and blue lines and the shaded grey region
represent the log-mean curve averaged across 500 random replicates, the true simulated
curves, and the 95% pointwise confidence intervals, respectively.


Figure 3.3 Results of the simulation study, illustrating the performance of Model 1 (see
Simulation Design) in 500 replicates for sample size 300. The columns correspond to the
three different cell-types. Continuous cyan, peach, darkblue, darkgreen and red lines
represent the true simulated curves and the estimated log-mean curves, averaged across
500 random replicates, for the ZINB-SPL, ZINB-GAM-Dose, ZINB-GAM-Int and NB-GAM
models, respectively.
Figure 3.4 Results of the simulation study, illustrating the estimated RMSE of Model 1
(see Simulation Design) in 500 replicates for sample size 300. The columns correspond to
RMSE boxplots of the NB-GAM, ZINB-GAM-Dose, ZINB-GAM-Int and ZINB-SPL
models, respectively, for the three different cell-types, plotted over 500 replicates.


Figure 3.5 Results of the simulation study, illustrating the performance of Model 2 (see
Simulation Design) in 500 replicates with sample size 300. The columns correspond to the
three different cell-types. Continuous red and blue lines and the shaded grey region
represent the log-mean curve averaged across 500 random replicates, the true simulated
curves, and the 95% pointwise confidence intervals, respectively.
Figure 3.6 Results of the simulation study, illustrating the performance of Model 2 (see
Simulation Design) in 500 replicates with sample size 100. The columns correspond to the
three different cell-types. Continuous red and blue lines and the shaded grey region
represent the log-mean curve averaged across 500 random replicates, the true simulated
curves, and the 95% pointwise confidence intervals, respectively.


Figure 3.7 Results of the simulation study, illustrating the performance of Model 2 (see
Simulation Design) in 500 replicates for sample size 300. The columns correspond to the
three different cell-types. Continuous cyan, peach, darkblue, darkgreen and red lines
represent the true simulated curves and the estimated log-mean curves, averaged across
500 random replicates, for the ZINB-SPL, ZINB-GAM-Dose, ZINB-GAM-Int and NB-GAM
models, respectively.
Figure 3.8 Results of the simulation study, illustrating the estimated RMSE of Model 2
(see Simulation Design) in 500 replicates for sample size 300. The columns correspond to
RMSE boxplots of the NB-GAM, ZINB-GAM-Dose, ZINB-GAM-Int and ZINB-SPL
models, respectively, for the three different cell-types, plotted over 500 replicates.


Figure 3.9 Results of the simulation study, illustrating the performance of Model 3 (see
Simulation Design) in 500 replicates with sample size 300. The columns correspond to the
three different cell-types. Continuous red and blue lines and the shaded grey region
represent the log-mean curve averaged across 500 random replicates, the true simulated
curves, and the 95% pointwise confidence intervals, respectively.
Figure 3.10 Results of the simulation study, illustrating the performance of Model 3 (see
Simulation Design) in 500 replicates with sample size 100. The columns correspond to the
three different cell-types. Continuous red and blue lines and the shaded grey region
represent the log-mean curve averaged across 500 random replicates, the true simulated
curves, and the 95% pointwise confidence intervals, respectively.


Figure 3.11 Results of the simulation study, illustrating the performance of Model 3 (see
Simulation Design) in 500 replicates for sample size 300. The columns correspond to the
three different cell-types. Continuous cyan, peach, darkblue, darkgreen and red lines
represent the true simulated curves and the estimated log-mean curves, averaged across
500 random replicates, for the ZINB-SPL, ZINB-GAM-Dose, ZINB-GAM-Int and NB-GAM
models, respectively.
Figure 3.12 Results of the simulation study, illustrating the estimated RMSE of Model 3
(see Simulation Design) in 500 replicates for sample size 300. The columns correspond to
RMSE boxplots of the NB-GAM, ZINB-GAM-Dose, ZINB-GAM-Int and ZINB-SPL
models, respectively, for the three different cell-types, plotted over 500 replicates.


Figure 3.13 Results of the simulation study, illustrating the performance of Model 4 (see
Simulation Design) in 500 replicates with sample size 100. The columns correspond to the
three different cell-types. Continuous red and blue lines and the shaded grey region
represent the log-mean curve averaged across 500 random replicates, the true simulated
curves, and the 95% pointwise confidence intervals, respectively.
Figure 3.14 Results of the simulation study, illustrating the performance of Model 4 (see
Simulation Design) in 500 replicates with sample size 300. The columns correspond to the
three different cell-types. Continuous red and blue lines and the shaded grey region
represent the log-mean curve averaged across 500 random replicates, the true simulated
curves, and the 95% pointwise confidence intervals, respectively.


Figure 3.15 Results of the simulation study, illustrating the performance of Model 4 (see
Simulation Design) in 500 replicates for sample size 300. The columns correspond to the
three different cell-types. Continuous cyan, peach, darkblue, darkgreen and red lines
represent the true simulated curves and the estimated log-mean curves, averaged across
500 random replicates, for the ZINB-SPL, ZINB-GAM-Dose, ZINB-GAM-Int and NB-GAM
models, respectively.
Figure 3.16 Results of the simulation study, illustrating the estimated RMSE of Model 4
(see Simulation Design) in 500 replicates for sample size 300. The columns correspond to
RMSE boxplots of the NB-GAM, ZINB-GAM-Dose, ZINB-GAM-Int and ZINB-SPL
models, respectively, for the three different cell-types, plotted over 500 replicates.


                    ni = 300                                     ni = 100
Scenario Model      ψ̂1       SE       Bias     MSE      Cov         ψ̂1       SE       Bias     MSE      Cov
Case 1   Full       0.081    0.166    -0.019   0.025    0.956       0.101    0.294    0.099    0.009    0.964
         Intercept  0.000    0.1658   -0.099   0.094    1           0.000    0.291    -0.09    0.009    1
Case 2   Full       0.109    0.2607   0.0089   0.010    0.96        0.0438   0.572    -0.056   0.324    0.892
         Intercept  0.000    0.148    -0.100   0.057    0.998       0.000    0.556    -0.100   0.0100   0.992
Case 3   Full       0.113    0.152    0.013    0.006    0.956       0.135    0.262    0.034    0.038    0.988
         Intercept  0.000    0.150    -0.100   0.010    1           0.000    0.318    -0.100   0.010    1
Case 4   Full       0.107    0.154    0.007    0.002    0.952       0.111    0.264    0.012    0.014    0.964
         Intercept  0.000    0.164    -0.100   0.010    1           0.000    0.288    -0.100   0.0100   1
Table 3.4 Parameter estimate, asymptotic standard error (SE), bias and mean squared
error (MSE) of parameter ψ1 under the full model and the intercept model. The results are
averaged over 500 replicates and reported for ni = 100, 300.
3.5     Discussion
In this chapter we present the first statistically rigorous approach for modelling zero-inflated
single cell dose response data. Our method proposes a semi-parametric regression framework
for the estimation of several non-linear cell-type specific curves arising naturally in
single cell dose response experiments. We discuss a flexible penalized regression
approach along with extensions to estimation under monotonicity constraints. Our
technique flexibly models cell-type specific heterogeneity, and its use of a ZINB-Bspline
model allows it to capture diverse DR functions and accommodate zero inflation in the
data. Further, to deal with the large number of parameters arising from the joint modelling
of multiple cell-type specific DR curves, we propose a scalable and computationally efficient
MM algorithm, which significantly reduces the computational complexity of the proposed
optimization problem. Finally, we propose two hypothesis testing approaches that test for the
presence of a DR signal and for differences in the shape of the functional form across cell-types.
    Comprehensive studies on simulated datasets confirm that our proposed model, ZINB-SPL,
yields lower RMSE than the competing methods (ZINB-GAM-Dose, ZINB-GAM-Int
and NB-GAM). Further, ZINB-SPL displayed greater estimation accuracy for constant, linear
and non-linear curve shapes. For the highly non-linear shapes, ZINB-SPL showed a clear
improvement over the competing methods. The consistent performance of ZINB-SPL in four
realistic simulation scenarios highlights the robustness of our proposed approach in modelling
dose response relationships under varying characteristics of scRNA-seq datasets. Additionally,
the hypothesis testing results for the simulation scenarios demonstrated effective false positive
rate control while maintaining considerable power for testing against local alternatives.
    An advantage of our proposed MM framework is that it is highly scalable and provides
computationally efficient estimation for a large number of model parameters, as arises
when many cell-types are modelled jointly. In addition, it is straightforward to
generalize the proposed framework to monotonicity constrained estimation, which may be of
interest in several biomedical problems. Thus we provide the first framework for statistically
modelling single cell dose response datasets, along with appropriate machinery for testing
differences between cell-type specific toxicological responses.


                                          CHAPTER 4
                 KERNELIZED SIGNED GRAPH LEARNING FOR
                 SINGLE CELL GENE REGULATORY NETWORK
                                         INFERENCE
4.1     Single Cell Gene Regulatory Networks
Gene regulatory networks (GRNs) represent fundamental molecular regulatory interactions
among genes that establish and maintain all required biological functions characterizing a
certain physiological state of a cell in an organism [107]. Cell type identity in an organism is
determined by how active transcription factors interact with a set of cis-regulatory regions
in the genome and control the activity of genes by either activation or repression of
transcription [108]. Usually, the relationships between these active transcription factors and
their target genes characterize GRNs. Due to the inherent causality captured by these
meaningful biological interactions in GRNs, genome-wide inference of these networks holds great
promise in enhancing the understanding of normal cell physiology, and also in characterizing
the molecular compositions of complex diseases [109, 110].
    GRNs can be mathematically characterized as graphs where the nodes represent genes and
the edges quantify the regulatory relations. GRN reconstruction attempts to infer this
regulatory network from high-throughput data using statistical and computational approaches.
Multiple methods encompassing varying mathematical concepts have been proposed during
the last decade to infer GRNs using gene expression data from bulk population sequencing
technologies, which aggregate the expression profiles of all cells in a tissue. These methods
can be broadly classified into two groups: the first group infers a static GRN, considering
the steady state of gene expression, while the second group uses temporal measurements to
capture the expression profile of the genes in a dynamic process. A thorough evaluation of the
static and dynamic models used in bulk GRN reconstruction can be found in [47, 48].
    Recent advances in RNA-sequencing technologies have enabled the measurement of gene
expression in single cells. This has led to the development of several computational
approaches aimed at quantifying the expression of individual cells for cell-type labelling and
estimation of cellular lineages. Several algorithms have been developed to arrange cells in
a projected temporal order (pseudotime trajectory) based on similarities in their
transcriptional states. In parallel, several dynamic models for single cell GRN reconstruction
have also been developed that take the estimated pseudotimes into account. Since single cell
network reconstruction algorithms try to establish functional relationships between genes using
the entire population of cells, it is debatable whether additional knowledge
regarding cell state transitions provides any added benefit [111, 3]. In summary, direct
application of bulk GRN reconstruction methods may not be adequate for single cell network
inference.
    The complex nature of single-cell transcriptomics data poses unique challenges in GRN
inference. Changes in gene expression due to cell-cell stochastic variation, cell-cycle
heterogeneity, and high sparsity due to insufficient sequencing depths and capture inefficiency
for genes with low expression are some of the unique characteristics of these datasets [112, 113].
Most importantly, the high sparsity (high proportion of zero values) of single cell datasets has
garnered a lot of attention, and several statistical methods have been designed specifically to
model this phenomenon [10, 16, 114]. Recent research has indicated that these zero values,
referred to as "dropouts", most likely result from biological variation and may be indicative
of heterogeneity in gene expression for varying cell types [21, 22].
    To account for these unique challenges, a variety of algorithms for network reconstruction
in scRNAseq data have recently been proposed, but most of these methods fail to outperform
network estimation methods developed for bulk data or microarrays [3, 111]. To that end, we
propose a network reconstruction algorithm that learns the co-expression between genes by
borrowing ideas from graph signal processing (GSP) literature. GSP provides a framework
for analyzing signals defined on graphs by extending classical signal processing tools and
concepts [115]. In many applications of GSP, the graph topology is not always available,
thus it must be inferred or learned from the observed data. The major approaches to graph
learning (GL) include smoothness based methods [116], where the graph is learned with the
assumption that graph signals vary smoothly with respect to graph structure; and diffusion
process based models, where the graph is learned from signals that are assumed to be graph
filtered versions of random processes [117]. In this work, we focus on learning graphs with
the smoothness assumption for the following reasons. First, smooth signals admit low-pass
and sparse representations in the graph Fourier domain. Thus, the GL problem is equivalent
to finding efficient information processing transforms for graph signals. Second, many graph-
based machine learning tasks, such as spectral clustering, graph regularized learning etc.,
are developed based on the smoothness of the graph signals. Finally, smooth graph signals
are observed ubiquitously in real-world applications [117].
     Smoothness based GL was first considered in [118] by modeling graph signals using
factor analysis, where the transformation from factors to observed signals exploits the graph
topology. By imposing a suitable prior on the factors, the graph signals are modelled to have
a low-frequency representation in the graph Fourier domain. This analysis results in an
optimization problem where a graph is learned such that the variation of signals over the learned
graph is minimized. Different variations of this framework with constraints on the learned
topology and for handling noisy graph signals were considered in [119, 120, 121, 122, 123].
All of the previous works learn unsigned graphs with the exception of [124], where a signed
graph is learned by employing the signed graph Laplacian defined by [125]. By using the signed
Laplacian, [124] aims to learn positive edges between nodes whose signal values are similar
and negative edges between nodes whose signal values have opposite signs with similar
absolute values. However, this approach is not suitable when graph signals are either all
positive-valued or all negative-valued, as in the case of gene expression data.
     Considering the advantages of GL approaches in learning graph topologies that are
consistent with the observed signals, in this chapter we propose a novel GL algorithm for the
reconstruction of GRNs. In particular, we assume that gene expression data obtained from cells
are graph signals residing on an unknown graph structure, which corresponds to the GRN. One
important characteristic of GRNs is that they are signed graphs, where positive and negative
edges correspond to activating and inhibitory regulations between genes. To this end, we
propose a novel and computationally efficient signed GL approach, scSGL, that reconstructs
the GRN under the assumption that graph signals admit a low-frequency representation over
activating edges and a high-frequency representation over inhibitory edges. Biologically,
this modelling implies that two genes connected by an activating edge have similar
expressions, while two genes connected by an inhibitory edge have dissimilar
expressions. In Figure 4.1, we show how these assumptions hold for the curated datasets studied
in Section 4.4. The figure shows that Euclidean distances between expressions are smaller for
gene pairs connected by activating edges than for those connected by inhibitory edges. The
figure also reports correlations between expressions, which indicate that expressions of gene
pairs connected by activating edges are positively correlated, i.e., similar, while those of
gene pairs connected by inhibitory edges are negatively correlated, i.e., dissimilar.
[Figure 4.1: two panels, "Euclidean Distances" (vertical axis: Distance) and "Correlations"
(vertical axis: Correlation); horizontal axis: Dataset (GSD, HSC, mCAD, VSC); legend:
Activating, Inhibitory.]
Figure 4.1 Euclidean distances (left, normalized to [0, 1]) and correlations (right) between
expressions of gene pairs in curated datasets studied in Section 4.4. Values are calculated
only for gene pairs that are connected in the ground truth GRNs and they are reported
separately for activating and inhibitory edges. Only inhibitory edges are reported for VSC,
since its GRN includes only inhibitory edges.
     We also performed a Wilcoxon rank sum test to determine whether, for Euclidean distances,
the associations calculated for the positive ground truth connections were significantly lower
than those calculated for the negative ground truth connections. We test the null hypothesis
H0: the distributions of both populations are equal, versus the alternative hypothesis Ha: the
distribution of the negative associations is stochastically greater than the distribution of the
positive associations. For correlations, we instead test Ha: the distribution of the positive
associations is stochastically greater than the distribution of the negative associations. The
calculated p-values were all less than 0.01, supporting our assumptions for all curated datasets
except VSC, which only has negative associations.
     Another important characteristic of scRNAseq data is the high proportion of dropouts. We
address this issue by employing kernel functions to map graph signals to a higher dimensional
space and assuming low- and high-frequency representations for these high dimensional graph
signals. This mapping allows us to use kernels that are appropriate for modelling single cell
data structures.
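The one-sided rank sum test described above can be sketched with SciPy's `mannwhitneyu`, which implements the Wilcoxon rank sum (Mann-Whitney U) test. The distances below are simulated placeholders, not the values behind Figure 4.1, and the variable names are illustrative.

```python
import numpy as np
from scipy.stats import mannwhitneyu

# Hypothetical normalized distances for gene pairs joined by activating
# and inhibitory ground-truth edges (simulated, for illustration only).
rng = np.random.default_rng(2)
dist_activating = rng.uniform(0.0, 0.4, size=20)
dist_inhibitory = rng.uniform(0.5, 1.0, size=20)

# H0: both distributions are equal.
# Ha: activating distances are stochastically smaller than inhibitory ones.
result = mannwhitneyu(dist_activating, dist_inhibitory, alternative="less")
p_value = float(result.pvalue)
print(f"one-sided p-value: {p_value:.2e}")
```

For correlations the direction is reversed, so the same call would use `alternative="greater"` with the activating correlations as the first sample.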
4.2     Background
4.2.1    Notations
Define an undirected graph as $G = (V, E)$, where $V$ is the set of nodes with cardinality
$|V| = n$ and $E \subseteq V \times V$ denotes the edge set with cardinality $|E| = m$. The edge
between nodes $i$ and $j$ is denoted by $e_{ij}$ and has weight $w_{ij}$. If the weights $w_{ij}$
are strictly positive then the graph is unsigned, whereas if $w_{ij}$ can take both positive and
negative values then the graph is referred to as signed. Define $\mathbf{W}$ to be the
$n \times n$ symmetric adjacency matrix of graph $G$, where $W_{ij} = W_{ji} = w_{ij}$ if
$e_{ij} \in E$ and 0 otherwise. In the case of signed graphs, $\mathbf{W}$ can be decomposed as
$\mathbf{W} = \mathbf{W}^+ - \mathbf{W}^-$, where $W^+_{ij} = w_{ij}$ if $w_{ij} > 0$ and
$W^-_{ij} = |w_{ij}|$ if $w_{ij} < 0$, with both entries equal to 0 otherwise. Further, the
Laplacian matrix $\mathbf{L}$ is defined as $\mathbf{L} = \mathbf{D} - \mathbf{W}$, where
$\mathbf{D}$ is the degree matrix of $G$. Similar to the decomposition of the adjacency matrix,
the Laplacian can be decomposed for estimation of signed Laplacians. All-one and all-zero
vectors and matrices are represented by $\mathbf{1}$ and $\mathbf{0}$, respectively. Finally,
the $i$th row and column of a matrix $\mathbf{X}$ are represented by $\mathbf{X}_{i\cdot}$
and $\mathbf{X}_{\cdot i}$, respectively.
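To make the notation concrete, the following sketch splits a small signed adjacency matrix into its positive and negative parts and forms a Laplacian for each part. The example matrix and the helper names are hypothetical, and treating the Laplacians of the two parts as the signed decomposition is an assumption consistent with the graphs G+ and G− used later in Section 4.3.1.

```python
import numpy as np


def split_signed_adjacency(W):
    """Return (W_pos, W_neg) with W = W_pos - W_neg and both parts non-negative."""
    W = np.asarray(W, dtype=float)
    W_pos = np.where(W > 0, W, 0.0)
    W_neg = np.where(W < 0, -W, 0.0)
    return W_pos, W_neg


def laplacian(W):
    """Combinatorial Laplacian L = D - W of a non-negative adjacency matrix."""
    W = np.asarray(W, dtype=float)
    return np.diag(W.sum(axis=1)) - W


# Hypothetical signed graph on 3 nodes: edge (0, 1) is positive, edge (0, 2) is negative.
W = np.array([[0.0, 2.0, -1.0],
              [2.0, 0.0, 0.0],
              [-1.0, 0.0, 0.0]])
W_pos, W_neg = split_signed_adjacency(W)
L_pos, L_neg = laplacian(W_pos), laplacian(W_neg)
```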
4.2.2    Low- and High-frequency Signals on Unsigned Graphs
A graph signal $\mathbf{x} \in \mathbb{R}^n$ is a vector whose entries reside on the nodes of an
unsigned graph $G$. The graph Fourier transform (GFT) of $\mathbf{x}$ is defined as the expansion
of $\mathbf{x}$ in terms of the eigenbasis of the graph Laplacian [126]. This representation
allows us to characterize $\mathbf{x}$ in terms of its graph spectral content as either low- or
high-frequency, where low (high)-frequency graph signals have small (large) variation with
respect to the graph [127].
     Let $\mathbf{L} = \mathbf{V} \boldsymbol{\Lambda} \mathbf{V}^\top$ be the eigendecomposition
of $\mathbf{L}$, where $\boldsymbol{\Lambda}$ is the diagonal matrix of eigenvalues with
$\Lambda_{ii} = \lambda_i$ and $\mathbf{V}_{\cdot i}$ is the eigenvector corresponding to
$\lambda_i$. The GFT of $\mathbf{x}$ is then $\hat{\mathbf{x}} = \mathbf{V}^\top \mathbf{x}$ and
the inverse GFT is [126]:
$$\mathbf{x} = \mathbf{V} \hat{\mathbf{x}} = \sum_{i=1}^{n} \hat{x}_i \mathbf{V}_{\cdot i}. \tag{4.1}$$
     Thus, $\mathbf{x}$ is a linear combination of the eigenvectors of $\mathbf{L}$ with
coefficients equal to the entries of $\hat{\mathbf{x}}$. Eigenvectors of $\mathbf{L}$
corresponding to small eigenvalues have small variation over the graph. Thus, if most of the
energy of $\hat{\mathbf{x}}$ lies in the $\hat{x}_i$'s corresponding to the small eigenvalues,
then $\mathbf{x}$ varies little over $G$. On the other hand, if most of the energy of
$\hat{\mathbf{x}}$ lies in the $\hat{x}_i$'s corresponding to the large eigenvalues,
$\mathbf{x}$ has high variation over $G$. The total variation of $\mathbf{x}$ over $G$ is then
quantified as [126]:
$$\mathrm{trace}(\hat{\mathbf{x}}^\top \boldsymbol{\Lambda} \hat{\mathbf{x}}) = \mathrm{trace}(\mathbf{x}^\top \mathbf{V} \boldsymbol{\Lambda} \mathbf{V}^\top \mathbf{x}) = \mathrm{trace}(\mathbf{x}^\top \mathbf{L} \mathbf{x}), \tag{4.2}$$
which is small for low-frequency graph signals and large for high-frequency ones.
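The following sketch illustrates (4.1) and (4.2) numerically on a small path graph. The graph, the two signals and the function names are illustrative choices and do not come from the datasets studied in this chapter.

```python
import numpy as np

# Path graph on 4 nodes, 0 - 1 - 2 - 3, with unit weights (an illustrative choice).
W = np.zeros((4, 4))
for i in range(3):
    W[i, i + 1] = W[i + 1, i] = 1.0
L = np.diag(W.sum(axis=1)) - W

# Eigendecomposition L = V diag(lam) V^T; eigh returns eigenvalues in ascending order.
lam, V = np.linalg.eigh(L)


def gft(x):
    """Graph Fourier transform: coefficients of x in the Laplacian eigenbasis."""
    return V.T @ x


def inverse_gft(x_hat):
    """Inverse GFT as in (4.1)."""
    return V @ x_hat


def total_variation(x):
    """Total variation x^T L x as in (4.2)."""
    return float(x @ L @ x)


x_low = np.array([1.0, 1.0, 1.0, 1.0])     # constant signal: no variation over edges
x_high = np.array([1.0, -1.0, 1.0, -1.0])  # alternating signal: changes sign on every edge

tv_low = total_variation(x_low)
tv_high = total_variation(x_high)
print(tv_low, tv_high)
```

On a graph with non-negative weights, the quantity in (4.2) also equals the weighted sum of squared differences across edges, which is why the alternating signal scores much higher than the constant one.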
4.2.3    Unsigned Graph Learning
An unknown unsigned graph $G$ can be learned from a set of observed graph signals defined
over it with the assumption that the graph signals have a low-frequency representation in the
graph spectral domain, i.e., their total variation is small. Using this assumption, [118]
proposes to learn $G$ by minimizing (4.2) with respect to $\mathbf{L}$ given a set of graph
signals $\{\mathbf{x}_i\}_{i=1}^{p}$ as follows:
$$\min_{\mathbf{L} \in \mathcal{L}} \ \mathrm{trace}(\mathbf{X}^\top \mathbf{L} \mathbf{X}) + \alpha \|\mathbf{L}\|_F^2 \quad \text{s.t.} \quad \mathrm{trace}(\mathbf{L}) = 2n, \tag{4.3}$$
where $\mathcal{L} = \{\mathbf{L} : L_{ij} = L_{ji} \le 0 \ \forall i \ne j, \ \mathbf{L}\mathbf{1} = \mathbf{0}\}$
is the set of Laplacian matrices and $\mathbf{X} \in \mathbb{R}^{n \times p}$ is the data matrix
whose columns are graph signals. The constraint $\mathrm{trace}(\mathbf{L}) = 2n$ ensures
that trivial solutions are avoided. The term $\|\mathbf{L}\|_F^2$ is employed to control the
sparsity of the learned graph as discussed in [119]. In particular, the Frobenius norm penalizes
$\mathbf{L}$ for having entries with large absolute values, and the non-positivity constraint
($L_{ij} = L_{ji} \le 0$) and degree constraint ($\mathrm{trace}(\mathbf{L}) = 2n$) threshold
edges with small weights. Thus, as $\alpha \to \infty$, the edges of the learned graph have very
similar weights and no thresholding can be performed to sparsify the learned graph. On the
other hand, small values of $\alpha$ lead $\mathbf{L}$ to have entries with varying values, and
the imposed constraints threshold the small values, yielding a sparse graph.
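Problem (4.3) is a small quadratic program once the Laplacian is parametrized by its non-negative edge weights. The sketch below solves a toy instance with SciPy's general-purpose SLSQP solver. It is only an illustration under stated assumptions: the solver choice, the function names and the simulated signals are not taken from [118] or from this dissertation.

```python
import numpy as np
from scipy.optimize import minimize


def weights_to_laplacian(w, n):
    """Build L = D - W from the upper-triangular edge weights w (length n(n-1)/2)."""
    W = np.zeros((n, n))
    W[np.triu_indices(n, k=1)] = w
    W = W + W.T
    return np.diag(W.sum(axis=1)) - W


def learn_unsigned_graph(X, alpha):
    """Toy solver for (4.3): rows of X are nodes, columns are graph signals."""
    n = X.shape[0]
    m = n * (n - 1) // 2

    def objective(w):
        L = weights_to_laplacian(w, n)
        return np.trace(X.T @ L @ X) + alpha * np.sum(L ** 2)

    # trace(L) equals twice the sum of the edge weights, so trace(L) = 2n <=> sum(w) = n.
    constraints = [{"type": "eq", "fun": lambda w: np.sum(w) - n}]
    w0 = np.full(m, n / m)
    res = minimize(objective, w0, method="SLSQP",
                   bounds=[(0.0, None)] * m, constraints=constraints)
    return weights_to_laplacian(res.x, n)


# Hypothetical signals: nodes 0 and 1 share one pattern, nodes 2 and 3 share another.
rng = np.random.default_rng(3)
base_a, base_b = rng.standard_normal(50), rng.standard_normal(50)
X = np.vstack([base_a, base_a, base_b, base_b]) + 0.1 * rng.standard_normal((4, 50))
L_hat = learn_unsigned_graph(X, alpha=0.1)
print(np.round(-L_hat, 2))  # off-diagonal entries are the learned edge weights
```

With a small α the learned weights concentrate on the two pairs of similar nodes, in line with the sparsity discussion above.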
4.2.4    Kernels
Traditional machine learning and signal processing applications are mostly developed based
on linear modelling due to its simplicity. However, real world problems often require nonlinear
estimation that can detect more complex patterns in the data. For this purpose, kernels are
introduced to capture the nonlinearity by mapping signals to a high-dimensional space [128].
Kernels correspond to dot products in a higher dimensional feature space and avoid
explicit construction of the feature space, thus providing the simplicity of linear methods in
nonlinear estimation. Given data from an input space $\mathcal{X}$ and a mapping function
$\phi : \mathcal{X} \to \mathcal{H}$, where $\mathcal{H}$ is a Hilbert space, a kernel function
can be expressed as an inner product in the corresponding feature space, i.e.,
$\kappa(x_i, x_i') = \langle \phi(x_i), \phi(x_i') \rangle$, where
$\kappa : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ is a finitely positive semi-definite
kernel function [129]. An explicit representation of the feature map $\phi$ is not necessary,
and the dimension of the mapped feature vectors can be high and even infinite. By using
different kernels, learning algorithms can be augmented to exploit various
(nonlinear) associations between input data. For example, the first term in (4.3) can be
rewritten as $\mathrm{trace}(\mathbf{X}\mathbf{X}^\top \mathbf{L}) = \sum_{i,j} \langle \mathbf{X}_{i\cdot}, \mathbf{X}_{j\cdot} \rangle L_{ij}$,
where the dot product between the rows of $\mathbf{X}$ can be replaced by
$\kappa(\mathbf{X}_{i\cdot}, \mathbf{X}_{j\cdot})$. In the next section, this observation is used to develop a
graph learning framework that is able to capture nonlinear relations between graph signals.
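The rewriting of the first term of (4.3) can be checked numerically, and the same expression can then be evaluated with a nonlinear kernel in place of the dot product. In the sketch below the random data, the complete-graph Laplacian and the Gaussian kernel are illustrative assumptions. The Gaussian kernel in particular is a generic example and is not one of the kernels considered in this chapter.

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 5, 30
X = rng.standard_normal((n, p))  # rows are nodes (genes), columns are graph signals

# Any symmetric Laplacian works for the identity; here a complete graph with unit weights.
W = np.ones((n, n)) - np.eye(n)
L = np.diag(W.sum(axis=1)) - W

# Linear kernel: K_ij = <X_i., X_j.> reproduces the first term of (4.3).
K_linear = X @ X.T
smoothness_direct = np.trace(X.T @ L @ X)
smoothness_kernel = np.sum(K_linear * L)  # sum_ij K_ij L_ij = trace(K L)


def rbf_kernel(X, gamma=0.1):
    """Gaussian kernel between the rows of X (illustrative choice only)."""
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))


# Replacing the dot product by a kernel only changes the Gram matrix.
K_rbf = rbf_kernel(X)
smoothness_rbf = np.sum(K_rbf * L)
```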
4.3      Methods
4.3.1     Signed Graph Learning
In (4.3), an unsigned graph is learned with the assumption that the observed graph signals
have a low-frequency representation in the graph spectral domain. In order to learn a signed
graph G, one needs to make additional assumptions about the graph signals X. In this work,
we make the following assumptions:
   1. Signal values on nodes connected by positive edge values are similar to each other, i.e.,
       variation over positive edges is small.
   2. Signal values on nodes connected by negative edge values are dissimilar to each other,
       i.e., variation over negative edges is large.
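From a GSP perspective, these assumptions correspond to graph signals being low- and
high-frequency over positive and negative edges, respectively. Let $G^+$ be the graph
corresponding to the positive edges of $G$ and let $G^-$ be the graph corresponding to the
negative edges of $G$, with edge weights equal to the absolute values of the original edge
weights. Assumption 1 implies that the graph signals have a low-frequency representation in
the graph Fourier domain of $G^+$. On the other hand, assumption 2 implies that the graph
signals have a high-frequency representation in the graph Fourier domain of $G^-$. We use (4.2)
to quantify how well the graph signals fit these assumptions. Thus, to learn an unknown signed
graph, we minimize $\mathrm{trace}(\mathbf{X}^\top \mathbf{L}^+ \mathbf{X})$ with respect to
$\mathbf{L}^+$ while maximizing $\mathrm{trace}(\mathbf{X}^\top \mathbf{L}^- \mathbf{X})$ with
respect to $\mathbf{L}^-$, where $\mathbf{L}^+$ and $\mathbf{L}^-$ denote the Laplacians of
$G^+$ and $G^-$:
$$\begin{aligned}
\min_{\mathbf{L}^+, \mathbf{L}^- \in \mathcal{L}} \quad & \mathrm{trace}(\mathbf{X}^\top \mathbf{L}^+ \mathbf{X}) - \mathrm{trace}(\mathbf{X}^\top \mathbf{L}^- \mathbf{X}) + \alpha_1 \|\mathbf{L}^+\|_F^2 + \alpha_2 \|\mathbf{L}^-\|_F^2 \\
\text{s.t.} \quad & \mathrm{trace}(\mathbf{L}^+) = 2n, \quad \mathrm{trace}(\mathbf{L}^-) = 2n, \\
& L^+_{ij} = 0 \ \text{if} \ L^-_{ij} \ne 0 \ \text{and} \ L^-_{ij} = 0 \ \text{if} \ L^+_{ij} \ne 0 \quad \forall i \ne j,
\end{aligned} \tag{4.4}$$
where the Frobenius norms and the first two constraints play the same role as in (4.3), and the
last constraint ensures that $\mathbf{L}^+$ and $\mathbf{L}^-$ are never both non-zero at the
same off-diagonal index, i.e., no pair of nodes is joined by both a positive and a negative edge.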
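To illustrate how (4.4) scores a candidate signed graph, the sketch below evaluates the objective and checks the constraints for a hand-built pair of Laplacians. It does not solve the optimization problem. The function names, the toy signals and the values of α1 and α2 are hypothetical.

```python
import numpy as np


def laplacian_from_edges(n, edges):
    """Laplacian of an unsigned graph given (i, j, weight) triples."""
    W = np.zeros((n, n))
    for i, j, w in edges:
        W[i, j] = W[j, i] = w
    return np.diag(W.sum(axis=1)) - W


def signed_gl_objective(X, L_pos, L_neg, alpha1, alpha2):
    """Objective of (4.4): smooth over positive edges, non-smooth over negative edges."""
    smooth = np.trace(X.T @ L_pos @ X)
    rough = np.trace(X.T @ L_neg @ X)
    return smooth - rough + alpha1 * np.sum(L_pos ** 2) + alpha2 * np.sum(L_neg ** 2)


def in_laplacian_set(L, tol=1e-8):
    """Symmetric, non-positive off-diagonal entries, rows summing to zero."""
    off = ~np.eye(L.shape[0], dtype=bool)
    return bool(np.allclose(L, L.T, atol=tol)
                and np.all(L[off] <= tol)
                and np.allclose(L.sum(axis=1), 0.0, atol=tol))


def is_feasible(L_pos, L_neg, tol=1e-8):
    """Check all constraints of (4.4), including the no-shared-edge condition."""
    n = L_pos.shape[0]
    off = ~np.eye(n, dtype=bool)
    traces_ok = np.isclose(np.trace(L_pos), 2 * n) and np.isclose(np.trace(L_neg), 2 * n)
    overlap = np.any((np.abs(L_pos[off]) > tol) & (np.abs(L_neg[off]) > tol))
    return bool(in_laplacian_set(L_pos, tol) and in_laplacian_set(L_neg, tol)
                and traces_ok and not overlap)


# Toy signals: nodes 0 and 1 agree, nodes 2 and 3 agree, and the two groups are opposite.
X = np.array([[1.0, 2.0], [1.0, 2.0], [-1.0, -2.0], [-1.0, -2.0]])
L_pos = laplacian_from_edges(4, [(0, 1, 2.0), (2, 3, 2.0)])  # positive edges within groups
L_neg = laplacian_from_edges(4, [(0, 2, 2.0), (1, 3, 2.0)])  # negative edges across groups

obj_good = signed_gl_objective(X, L_pos, L_neg, alpha1=0.1, alpha2=0.1)
obj_swapped = signed_gl_objective(X, L_neg, L_pos, alpha1=0.1, alpha2=0.1)
```

The assignment that places positive edges between similar nodes and negative edges between dissimilar nodes attains a much lower objective than the swapped assignment, which is the behaviour the two assumptions are meant to encode.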
4.3.2    Kernelized Signed Graph Learning
As mentioned in Section 4.2.4, kernels are used in learning algorithms to exploit various
(nonlinear) relations between input data. This is especially crucial in GRN inference, as
shown in [130], where 17 different association measures between gene expressions are
compared in terms of their performance in GRN inference and various other tasks on single-cell
transcriptomic datasets. In this chapter, we consider three kernels: the correlation
coefficient, r; the measure of proportionality, ρ [131]; and a modification of Kendall's tau,
τzi, for zero inflated non-negative continuous data [132]. These kernels are selected because
r is a commonly used measure for network inference, ρ performs the best in [130] and τzi can
handle the high ratio of dropouts in scRNAseq data.
    As mentioned in Section 4.2.4, kernels provide an efficient way of capturing the non-
linear associations between input data samples. This is especially crucial in building single
cell GRN learning algorithms due to the unique statistical properties exhibited by single
cell data. To determine optimal measures of association in single cell network learning,
[130] evaluated 17 association measures in terms of their ability to reconstruct gene and
cellular networks from single cell transcriptomic datasets. Two measures, ρ [131], a measure
of association for compositional data and τzi , a measure of association for zero inflated
non-negative continuous data [132] are shown to perform consistently better in all learning
scenarios investigated in [130]. The strong performance of ρ can be explained on the basis
that scRNA-seq captures only a small proportion of messenger RNA in each cell and therefore
gene expression measurements can be viewed as relative measures of abundance (as seen in
compositional data). On the other hand, τzi , a modification of Kendall’s rank correlation
coefficient, is expected to provide less biased estimates of association in the setting of zero-
inflated continuous data, a characteristic of single cell transcriptomic datasets [132]. To
compare and contrast these two measures, the correlation kernel r is additionally investigated
since it’s widely used in GRN reconstruction algorithms.
    In its current form, (4.3) cannot be used directly for different associations. Thus, the
optimization problem in (4.4) is extended using kernels. The first term in (4.4) can be
written as $\operatorname{trace}(XX^\top L^+) = \sum_{i,j} \langle X_{i\cdot}, X_{j\cdot} \rangle L^+_{ij}$ and the second term can be written similarly.
By replacing dot products with a given kernel function, i.e., κ(Xi· , Xj· ), the problem in (4.4)
can be extended to incorporate the different associations as:
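\[
\begin{aligned}
\min_{L^+,\,L^- \in \mathcal{L}} \quad & \operatorname{trace}(K L^+) - \operatorname{trace}(K L^-) + \alpha_1 \|L^+\|_F^2 + \alpha_2 \|L^-\|_F^2 \\
\text{s.t.} \quad & \operatorname{trace}(L^+) = 2n, \quad \operatorname{trace}(L^-) = 2n, \\
& L^+_{ij} = 0 \ \text{if} \ L^-_{ij} \neq 0 \ \text{and} \ L^-_{ij} = 0 \ \text{if} \ L^+_{ij} \neq 0 \quad \forall i \neq j,
\end{aligned}
\tag{4.5}
\]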
where K ∈ Rn×n is the kernel matrix with Kij = κ(Xi· , Xj· ). From GSP perspective, this
modification implies that graph signals on each node, i.e., Xi· , are first mapped to a (higher
dimensional) Hilbert space and the signed graph is learned in this new space. Namely,
let $\Phi \in \mathbb{R}^{n \times \hat{p}}$ be the matrix constructed from mapping Xi· ’s to the Hilbert space H with
dimension $\hat{p}$ where rows of Φ are ϕ(Xi· ). When learning unknown signed graph G with a
kernel, each column of Φ is a graph signal over G and they are assumed to have low- and
high-frequency representation with respect to G+ and G− , respectively. Extending signed
graph learning problem in (4.4) using kernels brings flexibility and any association metric
in [130] can be implemented in this framework if it is a positive semi-definite kernel. The
optimization procedure for (4.5) is given in the Supplementary Material.
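
As a small numerical illustration of the kernel substitution, the sketch below checks that the linear (dot-product) kernel recovers the first term of (4.4), and builds the correlation kernel r from the rows of X. The function names and the random toy data are assumptions made for this example; the kernels ρ and τzi are not implemented here.

```python
import numpy as np

def linear_kernel(X):
    """Dot-product kernel: K[i, j] = <X[i, :], X[j, :]>."""
    return X @ X.T

def correlation_kernel(X):
    """Pearson correlation between rows of X (one row per node/gene)."""
    return np.corrcoef(X)

def kernel_data_term(K, L_pos, L_neg):
    """Data-fidelity part of (4.5): trace(K L_pos) - trace(K L_neg)."""
    return float(np.trace(K @ L_pos) - np.trace(K @ L_neg))

def is_psd(K, tol=1e-8):
    """Numerical positive semi-definiteness check for a symmetric matrix."""
    return bool(np.linalg.eigvalsh((K + K.T) / 2).min() >= -tol)

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 30))             # 5 nodes, 30 graph signals (toy data)
W = np.triu(np.abs(rng.normal(size=(5, 5))), 1)
W = W + W.T                              # arbitrary symmetric non-negative weights
L = np.diag(W.sum(axis=1)) - W           # corresponding Laplacian

lhs = np.trace(X.T @ L @ X)              # first term of (4.4)
rhs = np.trace(linear_kernel(X) @ L)     # the same term written through the kernel matrix
K_r = correlation_kernel(X)              # kernel matrix for the correlation kernel r
```

The two traces agree up to floating-point error, and `is_psd(K_r)` gives a quick way to confirm that a candidate association matrix satisfies the positive semi-definiteness requirement mentioned above.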
4.3.3     Hyperparameter Selection
The optimization problem in (4.5) requires the selection of two regularization parameters α1
and α2 , which determine the density of the learnt graph, i.e., large values of α1 (α2 ) result
in denser L+ (L− ). Their values can be set to obtain a graph with desired positive and
negative edge densities. Next we describe the procedure used to estimate these densities
from the data and to select α1 and α2 accordingly.
   1. Given a matrix X ∈ Rn×p whose columns are graph signals, we randomly shuffle each
       column of the matrix k times creating k surrogate data matrices.
   2. Association between rows of the surrogate data matrices are calculated by the kernel
       employed in (4.5).
   3. Thresholds λ1 and λ2 are selected as the pth and (100 − p)th percentiles of the values
       in the association matrix calculated in Step 2.
   4. Steps (1-3) are repeated k times to construct the empirical distribution of the thresholds
       λ1 and λ2 .
   5. Finally, λ̂1 and λ̂2 are selected to be the medians of the empirical distributions con-
       structed in Step 4.
   6. The association matrix for the original data X is constructed.
   7. The number of entries in the association matrix that are smaller than λ̂1 is determined
       and normalized by the total number of entries in the association matrix to obtain the
       density of L− . Similarly, the number of entries in the association matrix greater than λ̂2
       is used to determine the density of L+ .
   8. Values of α1 and α2 are then selected to learn graphs with the estimated graph densities
       found in Step 7. Since the density of the positive (negative) graph increases monotonically
       with the value of α1 (α2 ), bisection search is used to determine the values of α1
       and α2 that give the desired densities.
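
The steps above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the implementation used in this chapter: only off-diagonal entries of the association matrix are used, the bisection is carried out on a logarithmic scale, and `density_of` is a hypothetical callable standing in for "solve (4.5) with this α and return the density of the learned graph".

```python
import numpy as np

def off_diagonal(A):
    """Return the off-diagonal entries of a square matrix as a flat array."""
    return A[~np.eye(A.shape[0], dtype=bool)]

def estimate_densities(X, kernel, p=5.0, k=50, seed=0):
    """Steps 1-7: permutation-based thresholds, then target densities of L+ and L-."""
    rng = np.random.default_rng(seed)
    lam1, lam2 = [], []
    for _ in range(k):
        # Shuffle every column independently to break associations between rows.
        surrogate = np.column_stack([rng.permutation(col) for col in X.T])
        values = off_diagonal(kernel(surrogate))
        lam1.append(np.percentile(values, p))
        lam2.append(np.percentile(values, 100.0 - p))
    lam1_hat, lam2_hat = np.median(lam1), np.median(lam2)
    values = off_diagonal(kernel(X))
    density_neg = float(np.mean(values < lam1_hat))  # fraction below the lower threshold
    density_pos = float(np.mean(values > lam2_hat))  # fraction above the upper threshold
    return density_pos, density_neg

def bisect_alpha(density_of, target, lo=1e-3, hi=1e3, iters=60):
    """Step 8: bisection for alpha, assuming density_of(alpha) is non-decreasing."""
    for _ in range(iters):
        mid = np.sqrt(lo * hi)          # geometric midpoint (assumed log-scale search)
        if density_of(mid) < target:
            lo = mid
        else:
            hi = mid
    return float(np.sqrt(lo * hi))
```

A call such as `estimate_densities(X, np.corrcoef)` would give the target densities for the correlation kernel, after which `bisect_alpha` is run once for α1 and once for α2.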
4.3.4    Generation of simulated datasets from zero-inflated negative
         binomial distribution
In this section, we outline the algorithm that was used to generate the simulated datasets
for parameter sensitivity analysis.
   1. For each simulation setting, we first generated a binary association graph
      Gj1 j2 , ∀j1 , j2 ∈ {1, . . . , p} using one of three graph topologies: random, hub or
      cluster.
   2. A binary indicator Ij1 j2 was next sampled for each entry of the association graph with
      Ij1 j2 ∼ Bernoulli(0.5).
   3. Given the binary association graph, a weight matrix was generated as:
      \[
      W_{j_1 j_2} =
      \begin{cases}
      G_{j_1 j_2}\,\mathrm{Unif}(0.3,\, 0.7) & \text{if } I_{j_1 j_2} = 0, \\
      G_{j_1 j_2}\,\mathrm{Unif}(-0.7,\, -0.3) & \text{if } I_{j_1 j_2} = 1.
      \end{cases}
      \]
   4. Random samples of n multivariate Gaussian random variables were then generated,
      with known weight matrix Wj1 j2 . The random sample was denoted as (X1 , . . . , Xp ),
      where each variable (gene vector) Xj = (Xj1 , . . . , Xjn )T consisted of n realizations.
   5. To mimic the dropout phenomenon present in real scRNAseq datasets, we next introduced
      additional zeros to the gene expression matrix. Following [13], the dropout
      probability for each row (gene vector) in the gene expression matrix X was calculated
      as: $p_{ji} = \exp(-\rho X_{ji}^2)$, where ρ represents the exponential decay parameter that
      controls the dependence between the dropout probability and gene expression.
    6. A binary indicator was next sampled for each entry: ηji ∼ Bernoulli(pji ), with ηji = 1
       indicating that the corresponding entry of Xji would be replaced by 0. The dropout
       probability for each gene vector was calculated as $\omega_j = \sum_{i=1}^{n} \eta_{ji}$.
    7. Using a modification of the NORTA (Normal to Anything) method [133] we generated
       samples from a multivariate zero inflated negative binomial distribution based on the
       multivariate normal samples generated in Step 4 using mean, dispersion and zero-
       inflation parameters λ, k and ωj s.
    8. To mirror real scRNA-seq gene expression data behaviour, the gene expression mean
       λ and dispersion k were estimated from a real scRNA-seq dataset, Peripheral
       Blood Mononuclear Cells (PBMC), freely available from 10X Genomics.
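
Steps 1 to 6 of the procedure above can be sketched as below for the random topology. Several choices here are assumptions that the text does not specify: W is turned into a diagonally dominant precision matrix to obtain a valid Gaussian model, the graph is Erdős–Rényi with a hypothetical `edge_prob`, and ωj is divided by n so that it is a proportion. The NORTA transformation of Step 7 and the PBMC-based parameter estimation of Step 8 are not covered.

```python
import numpy as np

def simulate_dropout_gaussian(p=20, n=200, edge_prob=0.1, rho=0.5, seed=0):
    """Sketch of Steps 1-6: signed graph, Gaussian samples, expression-dependent dropout."""
    rng = np.random.default_rng(seed)
    # Step 1: symmetric binary association graph (random topology).
    upper = np.triu(rng.random((p, p)) < edge_prob, k=1)
    G = (upper | upper.T).astype(float)
    # Step 2: symmetric sign indicator, I ~ Bernoulli(0.5).
    I_upper = np.triu(rng.random((p, p)) < 0.5, k=1)
    I = I_upper | I_upper.T
    # Step 3: weights in (0.3, 0.7) if I = 0 and in (-0.7, -0.3) if I = 1.
    mag = np.triu(rng.uniform(0.3, 0.7, size=(p, p)), k=1)
    mag = mag + mag.T
    W = G * np.where(I, -mag, mag)
    # Step 4 (assumption): use -W as off-diagonal precision entries and make the
    # matrix strictly diagonally dominant, hence positive definite.
    precision = -W
    np.fill_diagonal(precision, np.abs(W).sum(axis=1) + 1.0)
    cov = np.linalg.inv(precision)
    cov = (cov + cov.T) / 2.0                       # remove numerical asymmetry
    X = rng.multivariate_normal(np.zeros(p), cov, size=n).T   # p genes x n cells
    # Step 5: dropout probability decays with squared expression.
    drop_prob = np.exp(-rho * X ** 2)
    # Step 6: eta = 1 marks entries that are replaced by zero.
    eta = rng.random(X.shape) < drop_prob
    X_obs = np.where(eta, 0.0, X)
    omega = eta.mean(axis=1)                        # assumption: normalised by n
    return W, X_obs, eta, omega
```

The returned `omega` would then play the role of the zero-inflation parameters passed to the NORTA-based step.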
4.3.5     Performance Metrics
4.3.5.1     AUPRC and AUROC:
The area under the precision-recall curve (AUPRC) and the area under the receiver operating
characteristic curve (AUROC) are calculated by comparing inferred graphs to ground truth gene regulations. During this
calculation, signs of the learned edges are ignored as the AUPRC and AUROC are perfor-
mance metrics restricted to binary classification. In particular, we first take the absolute
value of edge weights and then compare them to ground truth edges. Thus, these metrics
indicate how well methods detect edges without considering the signs of the inferred edges.
Ground truth networks are considered as undirected and self-loops are ignored. Following
[3], we also defined AUPRC ratio and AUROC ratio as the ratio of AUPRC (AUROC) value
of the methods to AUPRC (AUROC) of the random estimator.
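
A minimal sketch of the AUPRC ratio is given below. It uses average precision as a step-wise stand-in for the area under the precision-recall curve and assumes that the AUPRC of the random estimator equals the fraction of node pairs that are true edges; both choices, and the function names, are assumptions made for illustration.

```python
def average_precision(scores, labels):
    """Average precision of a ranking (ties are broken by input order)."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i]:
            hits += 1
            total += hits / rank      # precision at each recalled true edge
    return total / hits if hits else 0.0

def auprc_ratio(weights, truth):
    """AUPRC ratio for an undirected graph, ignoring signs and self-loops."""
    n = len(weights)
    scores, labels = [], []
    for i in range(n):
        for j in range(i + 1, n):     # upper triangle only: undirected, no self-loops
            scores.append(abs(weights[i][j]))
            labels.append(bool(truth[i][j] or truth[j][i]))
    density = sum(labels) / len(labels)   # assumed AUPRC of a random estimator
    return average_precision(scores, labels) / density

# Toy example with three genes and a single true edge between genes 0 and 1.
truth = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
learned = [[0.0, -0.9, 0.1], [-0.9, 0.0, 0.2], [0.1, 0.2, 0.0]]
ratio = auprc_ratio(learned, truth)
```

Because the absolute value is taken before ranking, the strongly negative learned edge between genes 0 and 1 counts as a correct detection.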
4.3.5.2     AUPRC Activating/Inhibitory:
One of our goals is to learn whether the edges are activating or inhibitory. AUPRC as defined
above cannot evaluate the sign information. Thus, for curated datasets, whose ground truth
gene regulations include signed edge information, we calculate AUPRC for activating and
inhibitory edges separately. In particular, for methods that learn signed graphs we compare
the learned positive edges to activating edges in ground truth and learned negative edges
to inhibitory edges in the ground truth. For methods that do not learn signed edges, we
evaluate the inferred edges with respect to the ground truth activating and inhibitory edges
separately to calculate two AUPRC values.
4.3.5.3    EPR
Early precision ratio is the fraction of true positives in the top-k edges in the inferred graphs
where k is the number of edges in the ground truth network [3]. For methods that return
signed edges, we found top-k edges after taking absolute value of the edge weights, thus this
metric is used for edge detection performance rather than the detection of edge signs. Ground
truth networks are considered to be directed graphs and self-loops are ignored. Finally, EPR
ratio is defined as the ratio of the EPR values of the methods to the EPR value of the
random estimator.
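
The early precision computation can be sketched as follows. The edge dictionary layout is hypothetical, and the random-estimator baseline is assumed to equal the density of the directed ground truth network, i.e., k / (n(n − 1)) for n genes.

```python
def early_precision(weights, true_edges):
    """Fraction of true edges among the top-k edges ranked by absolute weight."""
    k = len(true_edges)
    candidates = [e for e in weights if e[0] != e[1]]          # ignore self-loops
    ranked = sorted(candidates, key=lambda e: -abs(weights[e]))
    return sum(e in true_edges for e in ranked[:k]) / k

def epr_ratio(weights, true_edges, n_nodes):
    """Early precision divided by the assumed value for a random estimator."""
    random_ep = len(true_edges) / (n_nodes * (n_nodes - 1))
    return early_precision(weights, true_edges) / random_ep

# Toy directed example with three genes and two true regulations.
true_edges = {(0, 1), (1, 2)}
weights = {(0, 1): 0.9, (1, 0): -0.8, (1, 2): 0.1,
           (2, 1): 0.2, (0, 2): 0.05, (2, 0): 0.0}
ep = early_precision(weights, true_edges)
ratio = epr_ratio(weights, true_edges, n_nodes=3)
```

Here the top two edges by absolute weight are (0, 1) and (1, 0), of which only the first is in the ground truth.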
4.4     Results
In this section, performance of scSGL is evaluated and compared to state-of-the-art GRN
inference methods on various simulated and experimental scRNAseq datasets. We selected
GENIE3 [52], GRNBOOST2 [53], PIDC [134] and PPCOR [50] for comparison as they are
the top performing methods in [3]. GENIE3, GRNBOOST2 and PPCOR were originally
developed for bulk analysis, while PIDC was developed for single cell gene expression data.
Among these methods, GENIE3 and GRNBOOST2 return fully connected directed networks,
while the remaining two infer undirected networks. Finally, only the PPCOR algorithm returns
signed graphs. Given the inherent sparsity of gene networks, we used the area under the
precision-recall curves (AUPRC) ratio as the primary evaluation metric. Supplementary
Material also includes results using the area under the receiver operating characteristic curve
(AUROC) ratio and the early precision ratio (EPR), whose overall results are consistent
with the following observations made with the AUPRC ratio.

             GSD Activating   GSD Inhibitory   HSC Activating   HSC Inhibitory   mCAD Activating  mCAD Inhibitory  VSC Inhibitory
Method       0%   50%  70%    0%   50%  70%    0%   50%  70%    0%   50%  70%    0%   50%  70%    0%   50%  70%    0%   50%  70%
GENIE3       1.71 1.78 1.92   1.18 1.16 0.98   3.19 3.24 3.18   2.71 2.23 1.90   1.69 1.88 2.33   1.05 1.03 0.98   2.78 2.14 1.46
GRNBOOST2    1.60 1.65 1.50   1.43 1.42 1.29   2.93 3.10 3.02   2.93 2.54 2.24   1.72 1.64 2.00   1.00 0.99 0.99   2.73 2.24 1.77
PIDC         1.85 1.83 1.73   1.27 1.25 1.22   3.12 3.12 3.18   2.84 2.53 2.11   1.72 1.70 1.90   1.09 1.05 1.06   2.69 2.81 2.59
PPCOR        2.49 2.22 1.76   1.45 1.33 1.50   3.66 3.61 3.37   3.23 2.93 2.43   2.20 2.17 2.24   1.34 1.37 1.56   2.62 2.49 2.52
scSGL-r      2.64 2.43 2.26   2.08 2.06 2.02   3.57 3.53 3.51   3.12 3.28 3.06   2.50 2.50 2.48   1.53 1.50 1.50   2.72 2.84 2.87
scSGL-ρ      2.42 2.61 2.36   1.81 2.03 2.16   3.77 3.76 3.75   2.54 2.87 2.23   3.06 2.88 2.48   1.67 1.63 1.72   2.70 2.57 2.61
scSGL-τzi    2.74 2.57 2.27   2.23 2.30 2.32   3.83 3.76 3.80   3.02 3.31 2.75   2.75 2.69 2.48   1.46 1.48 1.55   2.69 2.67 2.78
Figure 4.2 Performance of scSGL and state-of-the-art methods on curated datasets as
measured by AUPRC for activating and inhibitory edges. x-axis indicates dropout ratio in
the dataset.
4.4.1            Synthetic Datasets
Curated Datasets From BEELINE: The first simulation datasets we consider are curated
from "published Boolean models of GRNs" [3]. These datasets were generated using
the recently proposed single cell GRN simulator BoolODE [3]. BoolODE converts Boolean
functions specifying a GRN directly to ordinary differential equations (ODEs) using
GeneNetWeaver [135, 136], a widely used method to simulate bulk transcriptomic data
from GRNs. These datasets are generated from four literature-curated Boolean models:
mammalian cortical area development (mCAD), ventral spinal cord (VSC) development,
hematopoietic stem cell (HSC) differentiation and gonadal sex determination (GSD). These
models represent different types of graph structures, with varying numbers of positive and
negative edges; thus serving as good examples for illustrating the robustness of the proposed
method in modelling signed graph topologies. BoolODE is used to create ten random sim-
ulations of the synthetic gene expression datasets with 2,000 cells for each model. For each
dataset, one version with a dropout rate of 50% and another with a rate of 70% are also
considered to evaluate the performance of the methods under missing values.
       AUPRC ratios are calculated separately for activating and inhibitory edges and their
Parameter Sensitivity Analysis: To mimic the zero-inflated and overdispersed nature
of most scRNAseq datasets, we simulated gene expression data from a multivariate
zero-inflated negative binomial (ZINB) distribution for our second simulation. These datasets
were then used to conduct parameter sensitivity analysis for the proposed methods. Given
a known graph structure, synthetic datasets are generated from a ZINB distribution by
adapting an algorithm developed by [133]. The three parameters of the ZINB distribution,
λ, k and ω, which control its mean, dispersion and degree of zero-inflation, respectively, were
determined from a real scRNAseq dataset to make the simulations mirror the properties of
real datasets.
    The ZINB simulator is then used to generate expression data from three different network
structures: random networks, networks with a given community structure and networks with
hubs. Random networks are generated using the Erdős–Rényi model with a desired edge density.
Since the Erdős–Rényi model is not realistic due to its binomial degree distribution, we also
consider networks with hubs. These networks are generated using a Barabási–Albert model
whose degree distribution follows a power-law function. Finally, networks with community
structure, also known as modular networks, are generated using a disjoint union of random
graphs. The accuracy of the scSGL inferred graphs was then evaluated for all three graph
structures. To investigate the robustness of scSGL, we simulated datasets from the afore-
mentioned network topologies by varying the following parameters: (i) number of genes (10,
50, 100 and 250), (ii) number of cells (100, 300, 500 and 1000) and (iii) dropout probabilities
(0.26-0.36). To account for the inherent randomness of the simulations, 10 independent data
replicates were generated for every parameter combination and the mean AUPRC ratios
obtained by averaging over the replicates are reported in Figure 4.3.
    Recent investigations of scRNAseq datasets have revealed that dropout rates are primarily
driven by a combination of technical and biological factors [108]. Consequently, while mean
gene expression and proportion of zeros are linked, this may vary based on cell type, sex,
and other biological and technical factors. While investigating the impact of dropout rates
on network estimation accuracy, we found a steady decline in AUPRC ratios for all methods
with an increase in the number of zeroes. scSGL, irrespective of the kernel choice, maintained
the highest AUPRC ratios across all network topologies. Gene expression in scRNAseq
datasets can be interpreted as relative measures of abundance owing to the datasets being a
combination of gene expression derived from several cell-types. This could be the reason why
proportionality measures perform well [130]. The strong performance of τzi can be explained
on the basis that it explicitly models the dropouts present in scRNAseq datasets. Despite the
poor performance of regularized correlation networks (PPCOR), we see a strong performance
of scSGL when using the correlation kernel. This suggests that gene-gene relationships are
non-linear in nature, a belief that is also strengthened by the above average performance
of tree-based machine learning algorithms like GENIE3 and GRNBOOST2. It is to be noted
that PIDC, the only other method capable of modelling excess zeroes while accounting for
non-linear relationships, fails to achieve a top-ranking AUPRC ratio.
    Next, we evaluated the impact of the number of cells on network reconstruction. Figure 4.3
demonstrates a clear rise in AUPRC ratios when the number of cells is increased. PIDC, the only
other single cell network estimation technique, achieves a below average performance at the
lowest sample size of 100. This could be due to the fact that PIDC requires large sample
sizes for accurate estimation of pairwise joint probability distributions for calculating mutual
information. In general, PPCOR has the worst performance among all methods. It should
also be noted that the performance of GRNBOOST2 was equivalent to scSGL for all the
network topologies when the sample size was 10 times the number of genes. These results
indicate the importance of sample size in accurate network estimation for all of the methods
and network topologies considered.
    Finally, the performance of each of the methods was evaluated by varying the number
of genes. All methods had high AUPRC ratios across network topologies when the number
of genes was small. While the AUPRC ratios of all the methods declined with an increase in
the number of genes, scSGL performed significantly better than most of the benchmarking
methods. This dip in performance could be attributed to the fact that all methods learn very
dense networks. With an increase in the number of nodes, there is an increasing number
of false edges detected by every algorithm. The performance of scSGL could further be
improved with a more biologically informed framework for hyperparameter selection.
Computational Complexity: Methods are compared in terms of their scalability to
datasets with a large number of genes. For this purpose, the synthetic data generation process
used in parameter sensitivity analysis is employed to create three datasets with 500, 1000
and 2000 genes. Each dataset is generated from Barabási–Albert model, includes 1000 cells,
and has a dropout ratio of 0.26. Average run time and AUPRC ratios over 10 replicates
are reported in Figure 4.4. We reported results only for the correlation kernel, as other ker-
nels have similar performances and run times. It is observed that scSGL runs significantly
faster than GENIE3, GRNBOOST2 and PIDC while having superior performance in terms
of AUPRC ratio. Although PPCOR runs faster than scSGL, it shows poor performance.
Further discussion on the computational and storage complexity of scSGL is provided in [2].

4.4.2    Real Datasets
For real datasets, we consider scRNAseq expressions of human embryonic stem cells (hESC)
and mouse embryonic stem cells (mESC) which include 758 and 451 cells, respectively. We
inferred GRNs between 500 highly varying genes along with highly varying TFs [3]. Inferred
GRNs are compared to three different databases of gene regulations: STRING [137], cell-
type specific [138] and nonspecific [139, 140, 141]. AUPRC ratios are reported in Figure
4.5. All methods have performance values close to those of the random estimator. Except for
PPCOR, which has random performance in both datasets and for all databases, the methods
have comparable performances, with scSGL performing slightly better in hESC and the
benchmarking methods performing slightly better in mESC.
    To add biological meaning to the estimated networks we compared them to the reference
networks in the STRING database. The STRING database is a compendium of protein-
protein interactions created by gathering information from varying sources like experimental
studies, text mining etc. The edges in the STRING network are classified as high confidence
(minimum score of 0.700), medium confidence (minimum score of 0.400) and low
confidence (minimum score of 0.150). In the hESC dataset, scSGL-ρ identified the maximum
number of high confidence associations present in the STRING reference network. scSGL-ρ,
scSGL-r and scSGL-τzi identified 60, 56 and 24 high confidence STRING interactions,
respectively, with an edge confidence greater than 0.5. The interactions identified by scSGL-r
form a network of 56 unique genes including Nanog, Sox2, Sox4, Pou5f1, Ctnnb1,
Gata2, Gata3 and many others. Lineage-specific marker genes Cdk6, Col5a1, Vim, and
Itg5, which are known to have regulatory roles in cell differentiation, were also detected by
scSGL-r but with edge confidence less than 0.5 (0.1-0.3) [142, 143]. scSGL-ρ and scSGL-r
identified 20 common genes including Sox2, Sox4, Gata6, Ctnnb1 and Bmp4. scSGL-τzi
identified the least number of genes but successfully retrieved lineage markers Nanog,
Sox2, Sox4, Pou5f1, Ctnnb1, Gata2 and Gata3. All three kernel methods identified genes Sox4,
Ctnnb1, Bmp4 and Gata6. According to the STRING database, the 56 genes identified by
scSGL-r are associated with 839 significantly enriched biological process gene ontology (GO)
terms that include cell differentiation, chromosome separation, specification of animal organ
position, mitotic nuclear division and organ formation. Genes identified by scSGL-ρ and
scSGL-τzi had similar functional enrichments for biological processes. To demonstrate some
of the learned associations in hESC, we plotted the subnetwork of 24 lineage specific marker
genes using scSGL [143]. Figure 4.6A shows the presence of activating relationships between
key definitive endoderm (DE) markers like Gata6, Gata4, and Eomes and joint inhibition
of pluripotency markers Pou5f1, Nanog, and Sox2. Gata4 and Gata6 have been reported as
necessary for the development and function of a number of endoderm-derived tissues and
cells [144, 145] and onset of Gata4 and Gata6 expression has been reported to be coincident
with the beginning of endoderm gene expression [146]. In addition, inhibition of pluripotency
markers by the key DE markers indicates progression of the cells towards a DE state.
We also learned day specific scSGL graphs for the 24 marker genes by first clustering the
dataset over days (0, 12, 24, 36, 72 and 96 hrs). Day specific graphs showed that scSGL can
effectively recover gene network changes from data clustered over time points (Section 7;
Supplementary Data).

                      hESC AUPRC Ratio                      mESC AUPRC Ratio
Method       Specific  Nonspecific  STRING         Specific  Nonspecific  STRING
GENIE3       0.976     0.959        1.752          0.999     1.164        1.590
GRNBOOST2    1.017     0.992        1.573          1.017     1.161        1.653
PIDC         1.029     1.056        1.987          1.029     1.205        1.800
PPCOR        1.000     1.000        1.000          1.000     1.000        1.000
scSGL-r      1.117     1.164        1.723          1.036     1.133        1.463
scSGL-ρ      1.030     1.128        1.894          1.006     1.090        1.285
scSGL-τzi    1.061     1.150        1.721          1.018     1.123        1.423
Figure 4.5 Performance of methods for two real-world scRNAseq datasets. Inferred graphs
are compared to three different gene regulatory databases.
   In the mESC dataset, scSGL-ρ, scSGL-r and scSGL-τzi identified 67, 103 and 55 high
confidence STRING interactions, respectively, with an edge confidence greater than 0.5. The
three estimated networks capture interactions regulated by known transcription factors Sox2,
Nanog, Klf4, Myc and Sall4 [147]. scSGL-r identified known relationships between Sox2 and
Nanog; Esrrb with Sox2 and Rybp among many others. scSGL-ρ identified known relationships
between Esrrb and Etv5 and indirect interactions between Sall4 and Rybp regulated
by TF Oct4. scSGL-τzi identified most of the important relationships identified by scSGL-r
along with additional relationships between Sox2, Nanog, and Rif1. According to the
STRING database, the 103 genes identified by scSGL-r are associated with 908 significantly
enriched biological process GO terms that include cell fate determination, specification and
commitment, mitotic DNA replication and regulation of nodal signalling pathway. Similar
to hESC analysis, scSGL, irrespective of the chosen kernel, identified genes with similar
functional enrichments for biological processes. To demonstrate some of the learned associations
in mESC, we plotted the subnetwork of 19 well known marker genes and TFs in mESC
differentiation, estimated using scSGL. As can be seen in Figure 4.6B, Nanog, Gata4, Sox2,
Sox17, Zfp42 and Lefty1 emerge as some of the hub nodes with high degrees of associations.
The learned network also captures vital signed associations between Sox2, Nanog, Sox17,
Zfp42 and Gata4. It is well known that Sox2 and Nanog form the core of a transcription
factor network that promotes embryonic stem cell pluripotency and self-renewal. Zfp42 is
also known to be a direct target of Nanog, which is augmented by Sox2 [148]. In addition,
Sox17 together with Gata4 expression reinforce a transcriptional network that antagonizes
Nanog expression to initiate differentiation [149].

[Network diagrams. Panel A: hESC Lineage Marker Genes. Panel B: mESC Genes.]
Figure 4.6 The subnetworks of 24 lineage specific genes in hESC (A) and 19 well known
marker genes in mESC (B). We report results of scSGL-r as it has the highest AUPRC
ratio in Figure 4.5. For clarity, only those edges whose absolute edge weight fall into the
top 1 percentile are shown. Node sizes are proportional to their degrees.
    Finally, to analyze the relation between edges identified by scSGL and benchmarking
methods, the intersection between the top 1000 edges is reported as an UpSet plot [150] in
Figure 4.7. In both datasets, PPCOR does not have any intersection with other methods
probably because of its poor performance reported in Figure 4.5. The remaining 6 methods
have an intersection set with cardinality around 40 edges. The same number of common
edges is found in the intersection of PIDC, GENIE3, GRNBOOST2, scSGL-τzi , scSGL-r
and in the intersection of PIDC, GENIE3, scSGL-τzi , scSGL-r, scSGL-ρ. These observa-
tions hold for both datasets, indicating the reproducibility of the proposed approach across
different datasets. Edges identified by τzi and r have more intersecting edges with bench-
marking methods and with each other than those identified by ρ, which indicates that the
benchmarking methods have more common edges with correlation based association metrics
than with proportionality measures. scSGL methods have more common edges with PIDC
than with GENIE3 and GRNBOOST2, possibly because PIDC, like scSGL, learns a
co-expression GRN, while GENIE3 and GRNBOOST2 learn directed interactions between
genes.
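The comparison of top-ranked edges described above can be sketched in a few lines of Python. This is a minimal illustration rather than the code used to produce Figure 4.7: the function names `top_k_edges` and `exclusive_intersections` are hypothetical, each method is assumed to return a symmetric weight matrix (directed outputs would need to be symmetrized first), and intersections are assumed to be exclusive, meaning an edge is counted only in the group of methods that reports it.

```python
import numpy as np


def top_k_edges(weights, k):
    """Return the k undirected edges (i, j), i < j, with the largest |weight|."""
    n = weights.shape[0]
    rows, cols = np.triu_indices(n, k=1)          # upper triangle, no diagonal
    scores = np.abs(weights[rows, cols])
    order = np.argsort(-scores, kind="stable")[:k]
    return {(int(rows[t]), int(cols[t])) for t in order}


def exclusive_intersections(edge_sets):
    """Map each group of methods to the edges reported by exactly that group."""
    names = sorted(edge_sets)
    all_edges = set().union(*edge_sets.values())
    groups = {}
    for edge in all_edges:
        members = tuple(name for name in names if edge in edge_sets[name])
        groups.setdefault(members, set()).add(edge)
    return groups
```

The sizes of the returned groups correspond to the bar heights of an UpSet plot, so `len(groups[("PIDC", "scSGL-r")])` would, under these assumptions, give the number of edges shared by those two methods and by no other.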
4.5     Discussion
In this chapter, we have introduced a novel network inference algorithm based on GSP. Our
proposed algorithm, scSGL, identifies functional relationships between genes by learning the
signed adjacency matrix from the gene expression data under the assumption that graph
signals are similar over positive edges and dissimilar over negative edges. This novel technique
also takes into account the nonlinearity of the gene interactions by employing kernel
[Figure 4.7, panels "Intersection between top 1000 edges of methods for hESC dataset" and
"Intersection between top 1000 edges of methods for mESC dataset". Vertical axis: intersection
size. Rows: PPCOR, PIDC, GRNBOOST2, GENIE3, scSGL-τzi, scSGL-ρ, scSGL-r.]
Figure 4.7 UpSet plot that shows intersection between the top 1000 edges by scSGL with 3
kernels and benchmarking methods in hESC and mESC datasets.
mappings. We applied scSGL to four curated datasets derived from "published Boolean
models of GRN" and to two real experimental scRNAseq datasets collected during
differentiation. To conduct an in-depth analysis of gene co-expression network reconstruction
from scRNAseq datasets, we generated simulations from zero inflated negative binomial
distributions. These simulations, generated using different parameter combinations, were used
to investigate the robustness of our proposed methods to changing cell numbers, gene numbers
and dropout rates.
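To make the zero inflated negative binomial setting concrete, the following minimal sketch draws a count matrix with a chosen number of cells, number of genes and dropout rate. It is an illustration under simplifying assumptions and not the simulation design used in this chapter: genes are drawn independently (no network structure is imposed), all genes share one mean and one dispersion, and the function name `simulate_zinb` and its arguments are hypothetical.

```python
import numpy as np


def simulate_zinb(n_cells, n_genes, mean, dispersion, dropout, seed=0):
    """Draw a cells x genes count matrix from a zero inflated negative binomial.

    mean, dispersion: NB mean and size (variance = mean + mean**2 / dispersion).
    dropout: probability that an entry is replaced by a structural zero.
    """
    rng = np.random.default_rng(seed)
    # NumPy parameterizes the NB by (size, success probability);
    # this choice of p gives the requested mean.
    p = dispersion / (dispersion + mean)
    counts = rng.negative_binomial(dispersion, p, size=(n_cells, n_genes))
    # Zero inflation: overwrite a random subset of entries with zeros.
    structural_zero = rng.random((n_cells, n_genes)) < dropout
    counts[structural_zero] = 0
    return counts
```

Varying `n_cells`, `n_genes` and `dropout` in such a generator mirrors the three axes of the robustness study described above.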
   For the curated datasets, scSGL consistently obtained higher AUPRC ratios in compari-
son to the benchmarking methods, despite each dataset having a different number of stable
cell states. Parameter sensitivity analysis reflected the superior performance of scSGL in
estimating networks under varying network topologies. The performance remained consis-
tent even when the gene numbers increased, the dropout rates were high and the sample
sizes were low. This indicated the robustness of scSGL in modelling networks under varying
characteristics of scRNA-seq datasets.
   The networks estimated from real data using scSGL identified important functional
relationships between target genes and transcription factors and exhibited enrichment for
appropriate functional processes. In addition, day specific analysis of the hESC dataset showed
how scSGL can be used to infer gene networks for clustered datasets (Supplementary
Material). This may be particularly important in single cell network inference if cell-type
specific networks are of interest. We also demonstrated that scSGL attained performance
comparable to state-of-the-art methods in real data experiments, with the performance of all
the GRN reconstruction methods being close to random. The accuracy of the predicted
networks for the real datasets was evaluated using the cell-type specific, non-specific and
functional networks described in [3]. However, most of the information in these ground truth
datasets has been accumulated from tissue level data, and hence it is not completely
appropriate to calculate precision and recall rates from these databases.
    Although scRNAseq techniques provide significant advantages over bulk data, such as
increased sample size with higher depth coverage and the presence of highly distinct cell
clusters, they also come laced with multiple sources of technical and biological noise.
Moreover, the inability to differentiate between technical and biological noise and the
absence of adequate noise modelling techniques further exacerbate the problem
[151, 8]. scSGL aims to capture the node similarities and dissimilarities based on distances
between graph signals. These graph signals exhibit smoothness, which implies that within a
given node cluster, genes tend to be homogeneous, while varying across clusters. This leads
to densely connected graphs where the heterogeneity induced by distinct cell sub-populations
can be simultaneously curbed. Using single cell data with cell cluster labels, easily obtained
from single cell clustering algorithms [152], in conjunction with scSGL can aid in identifying
functional modules that are associated with a cell type [153]. Integrating pseudotemporal
ordering with scSGL can further help in identifying the functional modules associated with
differential pathways [154].
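The notion of smoothness over a signed graph can be illustrated with the standard Laplacian quadratic form, which sums squared signal differences over weighted edges. The sketch below is an assumption-laden illustration and not the scSGL objective itself (which additionally uses kernel mappings): the helper names `laplacian` and `signed_smoothness` are hypothetical, and the signed adjacency matrix is simply split into its positive and negative parts.

```python
import numpy as np


def laplacian(adjacency):
    """Combinatorial Laplacian L = D - A of a non-negative symmetric adjacency."""
    return np.diag(adjacency.sum(axis=1)) - adjacency


def signed_smoothness(signed_adjacency, signals):
    """Return (variation over positive edges, variation over negative edges).

    signals has shape (n_genes, n_cells); each column is one graph signal.
    Each value equals the sum over edges of weight * squared signal difference.
    """
    positive = np.clip(signed_adjacency, 0, None)    # keep positive weights
    negative = np.clip(-signed_adjacency, 0, None)   # magnitudes of negative weights
    var_pos = np.trace(signals.T @ laplacian(positive) @ signals)
    var_neg = np.trace(signals.T @ laplacian(negative) @ signals)
    return float(var_pos), float(var_neg)
```

Under the assumption stated in this chapter, a well-fitting signed graph would yield a small first value (signals similar over positive edges) and a large second value (signals dissimilar over negative edges).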
    Despite the availability of a large number of computational methods, accurate GRN
reconstruction still remains an open problem. Most reconstruction methods are based on
the assumption that presence of an edge implies regulatory relationships. They also have
the tendency to establish links between genes regulated by the same regulator. These issues
can generate many false positives. Additional sources of data, such as ChIP-seq measurements
that help in identifying direct interactions between TFs and target genes, can therefore
provide a way to filter out the spurious interactions [155]. Finally, gene regulation has
multiple layers beyond direct TF-target interaction, but functional relationships can only be
established if these relationships induce persistent changes in transcriptional state. As single
cell data sources over multiple modalities continue to become available, it will be interesting
to see how integration of these data types aids GRN reconstruction using scSGL [156].
4.6     Acknowledgements
This chapter is based on my published paper [2]. I would like to thank the joint first author
Abdullah Karaslaanli, and co-authors Dr Selin Aviyente and Dr Tapabrata Maiti for their
support and advice. This work was supported by the National Science Foundation grant:
CCF 2006800.


APPENDICES


                                   APPENDIX A
BAYESIAN SINGLE CELL RNASEQ DIFFERENTIAL GENE EXPRESSION
                      TEST FOR DOSE RESPONSE STUDY
                                      DESIGNS
Figure A.1 Principal components analysis of cell types identified in a real hepatic
dose-response snRNAseq dataset. Each point represents a distinct cell and colors reflect the
dose group.


Figure A.2 Comparison of fold-change distribution in simulated and real dose-response
snRNAseq data where the log-normal mean (facLoc) and standard deviation (facScale)
were varied as well as the percentage of differentially expressed genes and proportion of
downregulated DE genes. A total of 5000 genes were simulated or sampled from real data
and the fold-change for the highest dose group was calculated. The Kullback-Leibler
Divergence (KLD) intrinsic discrepancy (ID) was used to evaluate the similarity in
distributions.
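A minimal sketch of how two fold-change distributions might be compared with the Kullback-Leibler divergence is given below. It assumes the intrinsic discrepancy is the smaller of the two directed divergences, and it estimates both distributions with histograms on a shared grid. The bin count, the pseudocount and the function names are illustrative assumptions and are not taken from the analysis behind Figure A.2.

```python
import numpy as np


def kl_divergence(p, q):
    """Discrete Kullback-Leibler divergence KL(p || q) for strictly positive p, q."""
    return float(np.sum(p * np.log(p / q)))


def intrinsic_discrepancy(x, y, bins=30, pseudocount=1e-6):
    """min{KL(p || q), KL(q || p)} for histogram estimates p, q on a shared grid."""
    edges = np.histogram_bin_edges(np.concatenate([x, y]), bins=bins)
    # The pseudocount keeps every bin positive so that both logarithms are finite.
    p = np.histogram(x, bins=edges)[0] + pseudocount
    q = np.histogram(y, bins=edges)[0] + pseudocount
    p, q = p / p.sum(), q / q.sum()
    return min(kl_divergence(p, q), kl_divergence(q, p))
```

Here `x` and `y` would hold, for example, the simulated and the real log fold-changes of the highest dose group; a value near zero indicates closely matching distributions.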


Figure A.3 Benchmarking scores for data simulated using the Splatter [4] wrapper Splattdr
with default initial parameters. A total of 4500 cells and 5000 genes were simulated across
9 dose groups with a 10% probability of a gene being differentially expressed, 50% of which
were downregulated. (A) Ground truth was used to estimate the False Positive Rate
(FPR), True Positive Rate (TPR), False Negative Rate (FNR), True Negative Rate
(TNR), precision, balanced accuracy, and F1 score. Boxplots and whiskers represent values
for 10 replicate simulations. (B) The area under the concordance curve (AUCC) was
calculated as previously described (ref) for the 100 most significant genes (K = 100).
Heatmap represents the pairwise AUCC for each DE analysis grouped by similarity.
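The scores listed in panel (A) all derive from the confusion matrix between the simulated ground truth and the genes called differentially expressed. The sketch below uses the textbook definitions; the function name `benchmark_scores` is hypothetical, and the code assumes that both classes are present and that at least one true positive exists, so that no denominator is zero.

```python
import numpy as np


def benchmark_scores(truth, called):
    """Confusion-matrix scores for boolean vectors (True = differentially expressed)."""
    truth = np.asarray(truth, dtype=bool)
    called = np.asarray(called, dtype=bool)
    tp = int(np.sum(truth & called))      # true positives
    fp = int(np.sum(~truth & called))     # false positives
    fn = int(np.sum(truth & ~called))     # false negatives
    tn = int(np.sum(~truth & ~called))    # true negatives
    tpr = tp / (tp + fn)
    tnr = tn / (tn + fp)
    precision = tp / (tp + fp)
    return {
        "TPR": tpr,
        "FNR": 1 - tpr,
        "TNR": tnr,
        "FPR": 1 - tnr,
        "precision": precision,
        "balanced_accuracy": (tpr + tnr) / 2,
        "F1": 2 * precision * tpr / (precision + tpr),
    }
```

Applying such a function to each of the 10 replicate simulations would give the values summarized by one boxplot per method and score.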


                                                   APPENDIX B
                 SEMIPARAMETRIC DOSE RESPONSE CURVE
             ESTIMATION FOR SINGLE CELL DOSE RESPONSE
                                                 EXPERIMENTS
B.1     Proof of Theorem 1
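Define $\Theta^{(0)} = (\phi^{(0)}, \beta^{(0)}, \gamma_{1,1}^{(0)}, \ldots, \gamma_{1,\delta+K+1}^{(0)}, \ldots, \gamma_{J,1}^{(0)}, \ldots, \gamma_{J,\delta+K+1}^{(0)}, \psi_0^{(0)}, \psi_1^{(0)})^{\top}$. Then,
\[
\begin{aligned}
\ell(\Theta) ={}& \ell(\Theta^{(0)}) + \ell(\Theta) - \ell(\Theta^{(0)}) \\
={}& \ell(\Theta^{(0)}) + \sum_{i=1}^{I}\sum_{j=1}^{n_i}\Bigg\{ I(Y_{i,j}>0)\Bigg[ Y_{i,j}\log\left(\frac{\mu_{i,j}}{\mu_{i,j}^{(0)}}\right) - Y_{i,j}\log\left\{\frac{s_i\mu_{i,j}+\exp(\phi)}{s_i\mu_{i,j}^{(0)}+\exp(\phi^{(0)})}\right\} \\
& - \exp(\phi)\log\{s_i\mu_{i,j}+\exp(\phi)\} + \exp(\phi^{(0)})\log\{s_i\mu_{i,j}^{(0)}+\exp(\phi^{(0)})\} \\
& + \phi\exp(\phi) - \phi^{(0)}\exp(\phi^{(0)}) + \log[\Gamma\{Y_{i,j}+\exp(\phi)\}] - \log[\Gamma\{Y_{i,j}+\exp(\phi^{(0)})\}] \\
& - \log[\Gamma\{\exp(\phi)\}] + \log[\Gamma\{\exp(\phi^{(0)})\}] + \log(\omega_{i,j}) - \log(\omega_{i,j}^{(0)})\Bigg] \\
& + I(Y_{i,j}=0)\log\left[(1-\omega_{i,j}) + \omega_{i,j}\left\{\frac{\exp(\phi)}{s_i\mu_{i,j}+\exp(\phi)}\right\}^{e^{\phi}}\right] \\
& - I(Y_{i,j}=0)\log\left[(1-\omega_{i,j}^{(0)}) + \omega_{i,j}^{(0)}\left\{\frac{\exp(\phi^{(0)})}{s_j\mu_{i,j}^{(0)}+\exp(\phi^{(0)})}\right\}^{e^{\phi^{(0)}}}\right]\Bigg\}.
\end{aligned}
\]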
For the sake of brevity, define $G(\phi, \mu_{i,j}) = [\exp(\phi)/\{s_j\mu_{i,j} + \exp(\phi)\}]^{e^{\phi}}$. Then,
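\[
\begin{aligned}
\ell(\Theta) ={}& \ell(\Theta^{(0)}) + \sum_{i=1}^{I}\sum_{j=1}^{n_i}\Bigg\{ I(Y_{i,j}>0)\Bigg[ Y_{i,j}\log\left(\frac{\mu_{i,j}}{\mu_{i,j}^{(0)}}\right) - Y_{i,j}\log\left\{\frac{s_i\mu_{i,j}+\exp(\phi)}{s_i\mu_{i,j}^{(0)}+\exp(\phi^{(0)})}\right\} \\
& - \exp(\phi)\log\{s_i\mu_{i,j}+\exp(\phi)\} + \exp(\phi^{(0)})\log\{s_i\mu_{i,j}^{(0)}+\exp(\phi^{(0)})\} \\
& + \phi\exp(\phi) - \phi^{(0)}\exp(\phi^{(0)}) + \log[\Gamma\{Y_{i,j}+\exp(\phi)\}] - \log[\Gamma\{Y_{i,j}+\exp(\phi^{(0)})\}] \\
& - \log[\Gamma\{\exp(\phi)\}] + \log[\Gamma\{\exp(\phi^{(0)})\}] + \log(\omega_{i,j}) - \log(\omega_{i,j}^{(0)})\Bigg] \\
& + I(Y_{i,j}=0)\log\left\{\frac{(1-\omega_{i,j})+\omega_{i,j}G(\phi,\mu_{i,j})}{(1-\omega_{i,j}^{(0)})+\omega_{i,j}^{(0)}G(\phi^{(0)},\mu_{i,j}^{(0)})}\right\}\Bigg\} \\
={}& \ell(\Theta^{(0)}) + \sum_{i=1}^{I}\sum_{j=1}^{n_i}\Bigg\{ I(Y_{i,j}>0)\Bigg[ Y_{i,j}\log\left(\frac{\mu_{i,j}}{\mu_{i,j}^{(0)}}\right) - Y_{i,j}\log\left\{\frac{s_i\mu_{i,j}+\exp(\phi)}{s_i\mu_{i,j}^{(0)}+\exp(\phi^{(0)})}\right\} \\
& - \exp(\phi)\log\left\{\frac{s_i\mu_{i,j}+\exp(\phi)}{s_i\mu_{i,j}^{(0)}+\exp(\phi^{(0)})}\right\} \\
& - \exp(\phi)\log\{s_i\mu_{i,j}^{(0)}+\exp(\phi^{(0)})\} + \exp(\phi^{(0)})\log\{s_i\mu_{i,j}^{(0)}+\exp(\phi^{(0)})\} \\
& + \phi\exp(\phi) - \phi^{(0)}\exp(\phi^{(0)}) + \log[\Gamma\{Y_{i,j}+\exp(\phi)\}] \\
& - \log[\Gamma\{Y_{i,j}+\exp(\phi^{(0)})\}] - \log[\Gamma\{\exp(\phi)\}] + \log[\Gamma\{\exp(\phi^{(0)})\}] \\
& + \log(\omega_{i,j}) - \log(\omega_{i,j}^{(0)})\Bigg] \\
& + I(Y_{i,j}=0) \times \Bigg[\log\left((1-\omega_{i,j}^{(0)})\left\{\frac{1-\omega_{i,j}}{1-\omega_{i,j}^{(0)}}\right\}\right) \\
& + \log\left(1 + \frac{\omega_{i,j}^{(0)}G(\phi^{(0)},\mu_{i,j}^{(0)})\left\{\omega_{i,j}G(\phi,\mu_{i,j})/\omega_{i,j}^{(0)}G(\phi^{(0)},\mu_{i,j}^{(0)})\right\}}{(1-\omega_{i,j}^{(0)})\left\{(1-\omega_{i,j})/(1-\omega_{i,j}^{(0)})\right\}}\right) \\
& - \log\left[(1-\omega_{i,j}^{(0)})+\omega_{i,j}^{(0)}G(\phi^{(0)},\mu_{i,j}^{(0)})\right]\Bigg]\Bigg\} \\
\ge{}& \ell(\Theta^{(0)}) + \sum_{i=1}^{I}\sum_{j=1}^{n_i}\Bigg\{ I(Y_{i,j}>0)\Bigg[ Y_{i,j}\log\left(\frac{\mu_{i,j}}{\mu_{i,j}^{(0)}}\right) - Y_{i,j}\log\left\{\frac{s_i\mu_{i,j}+\exp(\phi)}{s_i\mu_{i,j}^{(0)}+\exp(\phi^{(0)})}\right\} \\
& - \exp(\phi)\log\left\{\frac{s_i\mu_{i,j}+\exp(\phi)}{s_i\mu_{i,j}^{(0)}+\exp(\phi^{(0)})}\right\} \\
& - \exp(\phi)\log\{s_i\mu_{i,j}^{(0)}+\exp(\phi^{(0)})\} + \exp(\phi^{(0)})\log\{s_i\mu_{i,j}^{(0)}+\exp(\phi^{(0)})\} \\
& + \phi\exp(\phi) - \phi^{(0)}\exp(\phi^{(0)}) + \log[\Gamma\{Y_{i,j}+\exp(\phi)\}] \\
& - \log[\Gamma\{Y_{i,j}+\exp(\phi^{(0)})\}] - \log[\Gamma\{\exp(\phi)\}] + \log[\Gamma\{\exp(\phi^{(0)})\}] \\
& + \log(\omega_{i,j}) - \log(\omega_{i,j}^{(0)})\Bigg] \\
& + I(Y_{i,j}=0) \times \Bigg[\frac{(1-\omega_{i,j}^{(0)})}{(1-\omega_{i,j}^{(0)})+\omega_{i,j}^{(0)}G(\phi^{(0)},\mu_{i,j}^{(0)})}\log\left\{\frac{(1-\omega_{i,j})}{(1-\omega_{i,j}^{(0)})}\right\} \\
& + \frac{\omega_{i,j}^{(0)}G(\phi^{(0)},\mu_{i,j}^{(0)})}{(1-\omega_{i,j}^{(0)})+\omega_{i,j}^{(0)}G(\phi^{(0)},\mu_{i,j}^{(0)})}\log\left\{\frac{\omega_{i,j}G(\phi,\mu_{i,j})}{\omega_{i,j}^{(0)}G(\phi^{(0)},\mu_{i,j}^{(0)})}\right\}\Bigg]\Bigg\}.
\end{aligned}
\tag{B.1}
\]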
The last inequality follows by applying Jensen's inequality to the term involving
$I(Y_{i,j} = 0)$. Now, applying the inequality $-\log(x) \ge 1 - x$ for any generic $x > 0$, we obtain
\[
\begin{aligned}
-\log\left\{\frac{s_i\mu_{i,j}+\exp(\phi)}{s_j\mu_{i,j}^{(0)}+\exp(\phi^{(0)})}\right\} &\ge 1 - \frac{s_i\mu_{i,j}+\exp(\phi)}{s_j\mu_{i,j}^{(0)}+\exp(\phi^{(0)})} \\
&= \frac{s_j(\mu_{i,j}^{(0)}-\mu_{i,j})+\exp(\phi^{(0)})-\exp(\phi)}{s_j\mu_{i,j}^{(0)}+\exp(\phi^{(0)})}.
\end{aligned}
\tag{B.2}
\]
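For reference, the Jensen step behind (B.1) can be spelled out generically. The shorthand $a$, $b$, $a_0$, $b_0$ and $p_0$ below is introduced here purely for illustration and does not appear in the derivation itself.

```latex
\[
a = 1-\omega_{i,j},\quad
b = \omega_{i,j}\,G(\phi,\mu_{i,j}),\quad
a_0 = 1-\omega_{i,j}^{(0)},\quad
b_0 = \omega_{i,j}^{(0)}\,G(\phi^{(0)},\mu_{i,j}^{(0)}),\quad
p_0 = \frac{a_0}{a_0+b_0}.
\]
\[
\log\left(\frac{a+b}{a_0+b_0}\right)
= \log\left\{p_0\,\frac{a}{a_0} + (1-p_0)\,\frac{b}{b_0}\right\}
\ge p_0\log\left(\frac{a}{a_0}\right) + (1-p_0)\log\left(\frac{b}{b_0}\right).
\]
```

The inequality is the concavity of the logarithm applied to a two-point average with weights $p_0$ and $1-p_0$, and it holds with equality at $\Theta = \Theta^{(0)}$, where both ratios equal one.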
Let us now consider the very last term of (B.1).
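\[
\begin{aligned}
\log\left\{\frac{\omega_{i,j}G(\phi,\mu_{i,j})}{\omega_{i,j}^{(0)}G(\phi^{(0)},\mu_{i,j}^{(0)})}\right\}
={}& \log\left(\frac{\omega_{i,j}}{\omega_{i,j}^{(0)}}\right) + \log\left\{\frac{G(\phi,\mu_{i,j})}{G(\phi^{(0)},\mu_{i,j}^{(0)})}\right\} \\
={}& \log\left(\frac{\omega_{i,j}}{\omega_{i,j}^{(0)}}\right) - \log\{G(\phi^{(0)},\mu_{i,j}^{(0)})\} + \log\{G(\phi,\mu_{i,j})\} \\
={}& \log\left(\frac{\omega_{i,j}}{\omega_{i,j}^{(0)}}\right) - \log\{G(\phi^{(0)},\mu_{i,j}^{(0)})\} + \exp(\phi)\phi \\
& - \exp(\phi)\log\left\{\frac{s_i\mu_{i,j}+\exp(\phi)}{s_j\mu_{i,j}^{(0)}+\exp(\phi^{(0)})}\right\} - \exp(\phi)\log\left\{s_j\mu_{i,j}^{(0)}+\exp(\phi^{(0)})\right\} \\
\ge{}& \log\left(\frac{\omega_{i,j}}{\omega_{i,j}^{(0)}}\right) - \log\{G(\phi^{(0)},\mu_{i,j}^{(0)})\} + \exp(\phi)\phi \\
& + \exp(\phi)\left\{\frac{s_j(\mu_{i,j}^{(0)}-\mu_{i,j})+\exp(\phi^{(0)})-\exp(\phi)}{s_j\mu_{i,j}^{(0)}+\exp(\phi^{(0)})}\right\} - \exp(\phi)\log\left\{s_j\mu_{i,j}^{(0)}+\exp(\phi^{(0)})\right\} \\
={}& \log\left(\frac{\omega_{i,j}}{\omega_{i,j}^{(0)}}\right) - \log\{G(\phi^{(0)},\mu_{i,j}^{(0)})\} + \exp(\phi)\phi - \exp(\phi)\log\left\{s_j\mu_{i,j}^{(0)}+\exp(\phi^{(0)})\right\} \\
& + \exp(\phi)\left\{\frac{\exp(\phi^{(0)})-\exp(\phi)}{s_j\mu_{i,j}^{(0)}+\exp(\phi^{(0)})}\right\} + \frac{s_j\exp(\phi)(\mu_{i,j}^{(0)}-\mu_{i,j})}{s_j\mu_{i,j}^{(0)}+\exp(\phi^{(0)})} \\
={}& \log\left(\frac{\omega_{i,j}}{\omega_{i,j}^{(0)}}\right) - \log\{G(\phi^{(0)},\mu_{i,j}^{(0)})\} + \exp(\phi)\phi - \exp(\phi)\log\left\{s_j\mu_{i,j}^{(0)}+\exp(\phi^{(0)})\right\} \\
& + \exp(\phi)\left\{\frac{\exp(\phi^{(0)})-\exp(\phi)}{s_j\mu_{i,j}^{(0)}+\exp(\phi^{(0)})}\right\} + \frac{s_j\{\exp(\phi)-\exp(\phi^{(0)})\}(\mu_{i,j}^{(0)}-\mu_{i,j})}{s_j\mu_{i,j}^{(0)}+\exp(\phi^{(0)})} \\
& + \frac{s_j\exp(\phi^{(0)})(\mu_{i,j}^{(0)}-\mu_{i,j})}{s_j\mu_{i,j}^{(0)}+\exp(\phi^{(0)})} \\
\ge{}& \log\left(\frac{\omega_{i,j}}{\omega_{i,j}^{(0)}}\right) - \log\{G(\phi^{(0)},\mu_{i,j}^{(0)})\} + \exp(\phi)\phi - \exp(\phi)\log\left\{s_j\mu_{i,j}^{(0)}+\exp(\phi^{(0)})\right\} \\
& + \exp(\phi)\left\{\frac{\exp(\phi^{(0)})-\exp(\phi)}{s_j\mu_{i,j}^{(0)}+\exp(\phi^{(0)})}\right\} \\
& - \frac{0.5\,s_j}{s_j\mu_{i,j}^{(0)}+\exp(\phi^{(0)})} \times \left[\{\exp(\phi)-\exp(\phi^{(0)})\}^2 + (\mu_{i,j}^{(0)}-\mu_{i,j})^2\right] \\
& + \frac{s_j\exp(\phi^{(0)})(\mu_{i,j}^{(0)}-\mu_{i,j})}{s_j\mu_{i,j}^{(0)}+\exp(\phi^{(0)})}.
\end{aligned}
\tag{B.3}
\]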
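The first inequality in the above derivation is obtained by using (B.2). The second inequality
is obtained by applying the inequality $ab \ge -0.5(a^2 + b^2)$. Once again using (B.2), we obtain
\[
\begin{aligned}
-\exp(\phi)\log\left\{\frac{s_i\mu_{i,j}+\exp(\phi)}{s_i\mu_{i,j}^{(0)}+\exp(\phi^{(0)})}\right\}
\ge{}& \exp(\phi)\,\frac{s_j(\mu_{i,j}^{(0)}-\mu_{i,j})+\exp(\phi^{(0)})-\exp(\phi)}{s_i\mu_{i,j}^{(0)}+\exp(\phi^{(0)})} \\
={}& \frac{s_j\exp(\phi)(\mu_{i,j}^{(0)}-\mu_{i,j})}{s_i\mu_{i,j}^{(0)}+\exp(\phi^{(0)})} + \frac{\exp(\phi)\{\exp(\phi^{(0)})-\exp(\phi)\}}{s_i\mu_{i,j}^{(0)}+\exp(\phi^{(0)})} \\
={}& \frac{s_j\{\exp(\phi)-\exp(\phi^{(0)})\}(\mu_{i,j}^{(0)}-\mu_{i,j})}{s_i\mu_{i,j}^{(0)}+\exp(\phi^{(0)})} + \frac{s_j\exp(\phi^{(0)})(\mu_{i,j}^{(0)}-\mu_{i,j})}{s_i\mu_{i,j}^{(0)}+\exp(\phi^{(0)})} \\
& + \frac{\exp(\phi)\{\exp(\phi^{(0)})-\exp(\phi)\}}{s_i\mu_{i,j}^{(0)}+\exp(\phi^{(0)})} \\
\ge{}& -\frac{0.5\,s_j}{s_i\mu_{i,j}^{(0)}+\exp(\phi^{(0)})} \times \left[\{\exp(\phi)-\exp(\phi^{(0)})\}^2 + (\mu_{i,j}^{(0)}-\mu_{i,j})^2\right] \\
& + \frac{s_j\exp(\phi^{(0)})(\mu_{i,j}^{(0)}-\mu_{i,j})}{s_i\mu_{i,j}^{(0)}+\exp(\phi^{(0)})} + \frac{\exp(\phi)\{\exp(\phi^{(0)})-\exp(\phi)\}}{s_i\mu_{i,j}^{(0)}+\exp(\phi^{(0)})}.
\end{aligned}
\tag{B.4}
\]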
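As a sanity check, a bound of the form (B.4) can be verified numerically. The sketch below is an illustration under a simplifying assumption: a single positive size factor `s` is used in every term, whereas the derivation writes both $s_i$ and $s_j$. The function name `b4_sides` is hypothetical.

```python
import numpy as np


def b4_sides(phi, mu, phi0, mu0, s):
    """Left- and right-hand sides of the bound (B.4), using one size factor s."""
    denom = s * mu0 + np.exp(phi0)
    lhs = -np.exp(phi) * np.log((s * mu + np.exp(phi)) / denom)
    rhs = (
        -0.5 * s * ((np.exp(phi) - np.exp(phi0)) ** 2 + (mu0 - mu) ** 2)
        + s * np.exp(phi0) * (mu0 - mu)
        + np.exp(phi) * (np.exp(phi0) - np.exp(phi))
    ) / denom
    return lhs, rhs
```

Evaluating both sides on random positive means and arbitrary values of $\phi$ should always give `lhs >= rhs`, with equality at the expansion point $(\phi, \mu) = (\phi^{(0)}, \mu^{(0)})$, which is the tangency property a minorizing surrogate needs.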
Now, we apply inequalities (B.2), (B.3), and (B.4) to (B.1) and obtain,
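\[
\begin{aligned}
\ell(\Theta) \ge{}& \ell(\Theta^{(0)}) + \sum_{i=1}^{I}\sum_{j=1}^{n_i}\Bigg\{ I(Y_{i,j}>0)\Bigg[ Y_{i,j}\log\left(\frac{\mu_{i,j}}{\mu_{i,j}^{(0)}}\right) \\
& + Y_{i,j}\left\{\frac{s_j(\mu_{i,j}^{(0)}-\mu_{i,j})+\exp(\phi^{(0)})-\exp(\phi)}{s_i\mu_{i,j}^{(0)}+\exp(\phi^{(0)})}\right\} \\
& - \frac{0.5\,s_j}{s_i\mu_{i,j}^{(0)}+\exp(\phi^{(0)})}\left[\{\exp(\phi)-\exp(\phi^{(0)})\}^2 + (\mu_{i,j}^{(0)}-\mu_{i,j})^2\right] \\
& + \frac{s_j\exp(\phi^{(0)})(\mu_{i,j}^{(0)}-\mu_{i,j})}{s_i\mu_{i,j}^{(0)}+\exp(\phi^{(0)})} + \frac{\exp(\phi)\{\exp(\phi^{(0)})-\exp(\phi)\}}{s_i\mu_{i,j}^{(0)}+\exp(\phi^{(0)})} \\
& - \exp(\phi)\log\{s_i\mu_{i,j}^{(0)}+\exp(\phi^{(0)})\} + \exp(\phi^{(0)})\log\{s_i\mu_{i,j}^{(0)}+\exp(\phi^{(0)})\} \\
& + \phi\exp(\phi) - \phi^{(0)}\exp(\phi^{(0)}) + \log[\Gamma\{Y_{i,j}+\exp(\phi)\}] \\
& - \log[\Gamma\{Y_{i,j}+\exp(\phi^{(0)})\}] - \log[\Gamma\{\exp(\phi)\}] \\
& + \log[\Gamma\{\exp(\phi^{(0)})\}] + \log(\omega_{i,j}) - \log(\omega_{i,j}^{(0)})\Bigg] \\
& + I(Y_{i,j}=0) \times \Bigg[\frac{(1-\omega_{i,j}^{(0)})}{(1-\omega_{i,j}^{(0)})+\omega_{i,j}^{(0)}G(\phi^{(0)},\mu_{i,j}^{(0)})}\log\left\{\frac{(1-\omega_{i,j})}{(1-\omega_{i,j}^{(0)})}\right\} \\
& + \frac{\omega_{i,j}^{(0)}G(\phi^{(0)},\mu_{i,j}^{(0)})}{(1-\omega_{i,j}^{(0)})+\omega_{i,j}^{(0)}G(\phi^{(0)},\mu_{i,j}^{(0)})}\Bigg[\log\left(\frac{\omega_{i,j}}{\omega_{i,j}^{(0)}}\right) - \log\{G(\phi^{(0)},\mu_{i,j}^{(0)})\} \\
& + \exp(\phi)\phi - \exp(\phi)\log\left\{s_j\mu_{i,j}^{(0)}+\exp(\phi^{(0)})\right\} \\
& + \exp(\phi)\left\{\frac{\exp(\phi^{(0)})-\exp(\phi)}{s_j\mu_{i,j}^{(0)}+\exp(\phi^{(0)})}\right\} \\
& - \frac{0.5\,s_j}{s_j\mu_{i,j}^{(0)}+\exp(\phi^{(0)})}\left[\{\exp(\phi)-\exp(\phi^{(0)})\}^2 + (\mu_{i,j}^{(0)}-\mu_{i,j})^2\right] \\
& + \frac{s_j\exp(\phi^{(0)})(\mu_{i,j}^{(0)}-\mu_{i,j})}{s_j\mu_{i,j}^{(0)}+\exp(\phi^{(0)})}\Bigg]\Bigg]\Bigg\} \\
={}& \ell_1^{\dagger}(\phi\,|\,\Theta^{(0)}) + \ell_2^{*}(\psi_0,\psi_1,\gamma_1,\ldots,\gamma_I,\beta\,|\,\Theta^{(0)}) \\
& + \ell_3^{*}(\gamma_1,\ldots,\gamma_I,\beta\,|\,\Theta^{(0)}) + \ell_4^{*}(\Theta^{(0)})
\end{aligned}
\tag{B.5}
\]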
where
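\[
\begin{aligned}
\ell_1^{\dagger}(\phi\,|\,\Theta^{(0)}) ={}& \sum_{i=1}^{I}\sum_{j=1}^{n_i}\Bigg\{ I(Y_{i,j}>0)\Bigg[ Y_{i,j}\left\{\frac{\exp(\phi^{(0)})-\exp(\phi)}{s_i\mu_{i,j}^{(0)}+\exp(\phi^{(0)})}\right\} \\
& - \frac{0.5\,s_j}{s_i\mu_{i,j}^{(0)}+\exp(\phi^{(0)})}\{\exp(\phi)-\exp(\phi^{(0)})\}^2 \\
& + \frac{\exp(\phi)\{\exp(\phi^{(0)})-\exp(\phi)\}}{s_i\mu_{i,j}^{(0)}+\exp(\phi^{(0)})} \\
& - \exp(\phi)\log\{s_i\mu_{i,j}^{(0)}+\exp(\phi^{(0)})\} \\
& + \phi\exp(\phi) + \log[\Gamma\{Y_{i,j}+\exp(\phi)\}] \\
& - \log[\Gamma\{\exp(\phi)\}]\Bigg] \\
& + \frac{I(Y_{i,j}=0)\,\omega_{i,j}^{(0)}G(\phi^{(0)},\mu_{i,j}^{(0)})}{(1-\omega_{i,j}^{(0)})+\omega_{i,j}^{(0)}G(\phi^{(0)},\mu_{i,j}^{(0)})}\Bigg[\exp(\phi)\phi \\
& - \exp(\phi)\log\left\{s_j\mu_{i,j}^{(0)}+\exp(\phi^{(0)})\right\} \\
& + \exp(\phi)\left\{\frac{\exp(\phi^{(0)})-\exp(\phi)}{s_j\mu_{i,j}^{(0)}+\exp(\phi^{(0)})}\right\} \\
& - \frac{0.5\,s_j}{s_j\mu_{i,j}^{(0)}+\exp(\phi^{(0)})}\{\exp(\phi)-\exp(\phi^{(0)})\}^2\Bigg]\Bigg\},
\end{aligned}
\]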
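\[
\begin{aligned}
\ell_2^{*}(\psi_0, \psi_1, \gamma_1, \ldots, \gamma_I, \beta \mid \Theta^{(0)})
&= \sum_{i=1}^{I}\sum_{j=1}^{n_i} \Bigg[ I(Y_{i,j} > 0)\log(\omega_{i,j}) \\
&\qquad + I(Y_{i,j} = 0) \times \Bigg\{ \frac{(1 - \omega_{i,j}^{(0)})}{(1 - \omega_{i,j}^{(0)}) + \omega_{i,j}^{(0)} G(\phi^{(0)}, \mu_{i,j}^{(0)})} \log(1 - \omega_{i,j}) \\
&\qquad + \frac{\omega_{i,j}^{(0)} G(\phi^{(0)}, \mu_{i,j}^{(0)})}{(1 - \omega_{i,j}^{(0)}) + \omega_{i,j}^{(0)} G(\phi^{(0)}, \mu_{i,j}^{(0)})} \log(\omega_{i,j}) \Bigg\} \Bigg] \\
&= \sum_{i=1}^{I}\sum_{j=1}^{n_i} \Bigg[ I(Y_{i,j} > 0) \Big( \psi_0 + \psi_1 \log(s_i \mu_{i,j}) - \log[1 + \exp\{\psi_0 + \psi_1 \log(s_i \mu_{i,j})\}] \Big) \\
&\qquad + I(Y_{i,j} = 0) \times \Bigg\{ - \frac{(1 - \omega_{i,j}^{(0)})}{(1 - \omega_{i,j}^{(0)}) + \omega_{i,j}^{(0)} G(\phi^{(0)}, \mu_{i,j}^{(0)})} \times \log[1 + \exp\{\psi_0 + \psi_1 \log(s_i \mu_{i,j})\}] \\
&\qquad + \frac{\omega_{i,j}^{(0)} G(\phi^{(0)}, \mu_{i,j}^{(0)})}{(1 - \omega_{i,j}^{(0)}) + \omega_{i,j}^{(0)} G(\phi^{(0)}, \mu_{i,j}^{(0)})} \times \Big( \psi_0 + \psi_1 \log(s_i \mu_{i,j}) - \log[1 + \exp\{\psi_0 + \psi_1 \log(s_i \mu_{i,j})\}] \Big) \Bigg\} \Bigg] \\
&= \sum_{i=1}^{I}\sum_{j=1}^{n_i} \Bigg[ \Bigg\{ I(Y_{i,j} > 0) + \frac{I(Y_{i,j} = 0)\, \omega_{i,j}^{(0)} G(\phi^{(0)}, \mu_{i,j}^{(0)})}{(1 - \omega_{i,j}^{(0)}) + \omega_{i,j}^{(0)} G(\phi^{(0)}, \mu_{i,j}^{(0)})} \Bigg\} \times \{\psi_0 + \psi_1 \log(s_i \mu_{i,j})\} \\
&\qquad - \log[1 + \exp\{\psi_0 + \psi_1 \log(s_i \mu_{i,j})\}] \Bigg] \\
&= \sum_{i=1}^{I}\sum_{j=1}^{n_i} \Bigg[ \Bigg\{ I(Y_{i,j} > 0) + \frac{I(Y_{i,j} = 0)\, \omega_{i,j}^{(0)} G(\phi^{(0)}, \mu_{i,j}^{(0)})}{(1 - \omega_{i,j}^{(0)}) + \omega_{i,j}^{(0)} G(\phi^{(0)}, \mu_{i,j}^{(0)})} \Bigg\} \times \{\psi_0 + \psi_1 \log(s_i \mu_{i,j})\} \\
&\qquad - \log\Bigg[ \frac{1 + \exp\{\psi_0 + \psi_1 \log(s_i \mu_{i,j})\}}{1 + \exp\{\psi_0^{(0)} + \psi_1^{(0)} \log(s_i \mu_{i,j}^{(0)})\}} \Bigg] - \log[1 + \exp\{\psi_0^{(0)} + \psi_1^{(0)} \log(s_i \mu_{i,j}^{(0)})\}] \Bigg] \\
&\geq \sum_{i=1}^{I}\sum_{j=1}^{n_i} \Bigg[ \Bigg\{ I(Y_{i,j} > 0) + \frac{I(Y_{i,j} = 0)\, \omega_{i,j}^{(0)} G(\phi^{(0)}, \mu_{i,j}^{(0)})}{(1 - \omega_{i,j}^{(0)}) + \omega_{i,j}^{(0)} G(\phi^{(0)}, \mu_{i,j}^{(0)})} \Bigg\} \times \{\psi_0 + \psi_1 \log(s_i \mu_{i,j})\} \\
&\qquad + 1 - \frac{1 + \exp\{\psi_0 + \psi_1 \log(s_i \mu_{i,j})\}}{1 + \exp\{\psi_0^{(0)} + \psi_1^{(0)} \log(s_i \mu_{i,j}^{(0)})\}} - \log[1 + \exp\{\psi_0^{(0)} + \psi_1^{(0)} \log(s_i \mu_{i,j}^{(0)})\}] \Bigg],
\end{aligned}
\]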
where the last inequality is obtained using the result log(x) ≤ x − 1, so −log(x) ≥ 1 − x.
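As an informal sanity check on this step, the short sketch below evaluates the bound −log(x) ≥ 1 − x at the ratio x = [1 + exp(η)] / [1 + exp(η⁽⁰⁾)], where η stands for ψ0 + ψ1 log(si µi,j) and η⁽⁰⁾ for its value at the current iterate. The function names and the grid of test values are illustrative assumptions and are not part of the dissertation. The check also shows that the bound holds with equality when η = η⁽⁰⁾, which is what one expects of a minorizing surrogate.

```python
import math


def ratio(eta: float, eta0: float) -> float:
    """x = (1 + exp(eta)) / (1 + exp(eta0)); always strictly positive."""
    return (1.0 + math.exp(eta)) / (1.0 + math.exp(eta0))


def minorization_gap(eta: float, eta0: float) -> float:
    """Return -log(x) - (1 - x); non-negative because log(x) <= x - 1."""
    x = ratio(eta, eta0)
    return -math.log(x) - (1.0 - x)


# Hypothetical grid of linear-predictor values, used only for illustration.
grid = [k / 2.0 for k in range(-6, 7)]
gaps = [minorization_gap(eta, eta0) for eta in grid for eta0 in grid]

# The bound holds everywhere on the grid (small tolerance for rounding).
assert all(g >= -1e-12 for g in gaps)

# At eta == eta0 the ratio is exactly 1, so the bound is tight.
assert all(minorization_gap(e, e) == 0.0 for e in grid)
```

The equality at η = η⁽⁰⁾ is the property that lets the right-hand side serve as a lower bound touching the objective at the current parameter value.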
Now,
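\[
\begin{aligned}
\ell_2^{*}(\psi_0, \psi_1, \gamma_1, \ldots, \gamma_I, \beta \mid \Theta^{(0)})
&\geq \sum_{i=1}^{I}\sum_{j=1}^{n_i} \Bigg[ \Bigg\{ I(Y_{i,j} > 0) + \frac{I(Y_{i,j} = 0)\, \omega_{i,j}^{(0)} G(\phi^{(0)}, \mu_{i,j}^{(0)})}{(1 - \omega_{i,j}^{(0)}) + \omega_{i,j}^{(0)} G(\phi^{(0)}, \mu_{i,j}^{(0)})} \Bigg\} \times \{\psi_0 + \psi_1 \log(s_i \mu_{i,j})\} \\
&\qquad + \frac{\exp\{\psi_0^{(0)} + \psi_1^{(0)} \log(s_i \mu_{i,j}^{(0)})\}}{1 + \exp\{\psi_0^{(0)} + \psi_1^{(0)} \log(s_i \mu_{i,j}^{(0)})\}} - \frac{\exp\{\psi_0 + \psi_1 \log(s_i \mu_{i,j})\}}{1 + \exp\{\psi_0^{(0)} + \psi_1^{(0)} \log(s_i \mu_{i,j}^{(0)})\}} \\
&\qquad - \log[1 + \exp\{\psi_0^{(0)} + \psi_1^{(0)} \log(s_i \mu_{i,j}^{(0)})\}] \Bigg] \\
&= \sum_{i=1}^{I}\sum_{j=1}^{n_i} \Bigg[ \Bigg\{ I(Y_{i,j} > 0) + \frac{I(Y_{i,j} = 0)\, \omega_{i,j}^{(0)} G(\phi^{(0)}, \mu_{i,j}^{(0)})}{(1 - \omega_{i,j}^{(0)}) + \omega_{i,j}^{(0)} G(\phi^{(0)}, \mu_{i,j}^{(0)})} \Bigg\} \\
&\qquad \times \Bigg( \psi_0 + \psi_1 \log(s_i) + \psi_1 \Big\{ \sum_{m=1}^{d+K+1} \gamma_{i,m} B_m(D_{i,j}) + X_{i,j}^{\top} \beta \Big\} \Bigg) \\
&\qquad + \frac{\exp\{\psi_0^{(0)} + \psi_1^{(0)} \log(s_i \mu_{i,j}^{(0)})\}}{1 + \exp\{\psi_0^{(0)} + \psi_1^{(0)} \log(s_i \mu_{i,j}^{(0)})\}} \\
&\qquad \times \Big( 1 - \exp\{\psi_0 - \psi_0^{(0)} + \psi_1 \log(s_i \mu_{i,j}) - \psi_1^{(0)} \log(s_i \mu_{i,j}^{(0)})\} \Big) \\
&\qquad - \log[1 + \exp\{\psi_0^{(0)} + \psi_1^{(0)} \log(s_i \mu_{i,j}^{(0)})\}] \Bigg].
\end{aligned}
\]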
Let us work on two specific terms of the above expression. First,
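\[
\begin{aligned}
\psi_1 \Big\{ \sum_{m=1}^{d+K+1} \gamma_{i,m} B_m(D_{i,j}) + X_{i,j}^{\top} \beta \Big\}
&= (\psi_1 - \psi_1^{(0)}) \Big\{ \sum_{m=1}^{d+K+1} (\gamma_{i,m} - \gamma_{i,m}^{(0)}) B_m(D_{i,j}) + X_{i,j}^{\top} (\beta - \beta^{(0)}) \Big\} \\
&\quad + \psi_1^{(0)} \Big\{ \sum_{m=1}^{d+K+1} (\gamma_{i,m} - \gamma_{i,m}^{(0)}) B_m(D_{i,j}) + X_{i,j}^{\top} (\beta - \beta^{(0)}) \Big\} \\
&\quad + \psi_1 \Big\{ \sum_{m=1}^{d+K+1} \gamma_{i,m}^{(0)} B_m(D_{i,j}) + X_{i,j}^{\top} \beta^{(0)} \Big\} \\
&\geq -0.5 \Big[ (d + K + 2)(\psi_1 - \psi_1^{(0)})^2 + \sum_{m=1}^{d+K+1} (\gamma_{i,m} - \gamma_{i,m}^{(0)})^2 B_m^2(D_{i,j}) + \{X_{i,j}^{\top} (\beta - \beta^{(0)})\}^2 \Big] \\
&\quad + \psi_1^{(0)} \Big\{ \sum_{m=1}^{d+K+1} (\gamma_{i,m} - \gamma_{i,m}^{(0)}) B_m(D_{i,j}) + X_{i,j}^{\top} (\beta - \beta^{(0)}) \Big\} \\
&\quad + \psi_1 \Big\{ \sum_{m=1}^{d+K+1} \gamma_{i,m}^{(0)} B_m(D_{i,j}) + X_{i,j}^{\top} \beta^{(0)} \Big\}.
\end{aligned}
\]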
The above inequality is obtained by repeatedly applying the result ab ≥ −0.5(a² + b²) to
the product terms. The second term,
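\[
\begin{aligned}
& - \exp\{\psi_0 - \psi_0^{(0)} + \psi_1 \log(s_i \mu_{i,j}) - \psi_1^{(0)} \log(s_i \mu_{i,j}^{(0)})\} \\
&= - \exp\Big[ \psi_0 - \psi_0^{(0)} + \psi_1 \log(s_i) + \psi_1 \Big\{ \sum_{m=1}^{d+K+1} \gamma_{i,m} B_m(D_{i,j}) + X_{i,j}^{\top} \beta \Big\} \\
&\qquad - \psi_1^{(0)} \log(s_i) - \psi_1^{(0)} \Big\{ \sum_{m=1}^{d+K+1} \gamma_{i,m}^{(0)} B_m(D_{i,j}) + X_{i,j}^{\top} \beta^{(0)} \Big\} \Big] \\
&= - \exp\Big[ \psi_0 - \psi_0^{(0)} + \log(s_i)(\psi_1 - \psi_1^{(0)}) \\
&\qquad + (\psi_1 - \psi_1^{(0)}) \times \Big\{ \sum_{m=1}^{d+K+1} (\gamma_{i,m} - \gamma_{i,m}^{(0)}) B_m(D_{i,j}) + X_{i,j}^{\top} (\beta - \beta^{(0)}) \Big\} \\
&\qquad + \psi_1^{(0)} \Big\{ \sum_{m=1}^{d+K+1} (\gamma_{i,m} - \gamma_{i,m}^{(0)}) B_m(D_{i,j}) + X_{i,j}^{\top} (\beta - \beta^{(0)}) \Big\} \\
&\qquad + (\psi_1 - \psi_1^{(0)}) \Big\{ \sum_{m=1}^{d+K+1} \gamma_{i,m}^{(0)} B_m(D_{i,j}) + X_{i,j}^{\top} \beta^{(0)} \Big\} \Big] \\
&= - \exp\Big[ \psi_0 - \psi_0^{(0)} + \log(s_i)(\psi_1 - \psi_1^{(0)}) \\
&\qquad + (\psi_1 - \psi_1^{(0)}) \times \Big\{ \sum_{m=1}^{d+K+1} (\gamma_{i,m} - \gamma_{i,m}^{(0)}) B_m(D_{i,j}) + X_{i,j}^{\top} (\beta - \beta^{(0)}) \Big\} \\
&\qquad + \sum_{m=1}^{d+K+1} (\gamma_{i,m} - \gamma_{i,m}^{(0)}) \psi_1^{(0)} B_m(D_{i,j}) + X_{i,j}^{\top} \psi_1^{(0)} (\beta - \beta^{(0)}) \\
&\qquad + (\psi_1 - \psi_1^{(0)}) \Big\{ \sum_{m=1}^{d+K+1} \gamma_{i,m}^{(0)} B_m(D_{i,j}) + X_{i,j}^{\top} \beta^{(0)} \Big\} \Big] \\
&\geq - \frac{1}{2d + 2K + 7} \Bigg[ \exp\{(\psi_0 - \psi_0^{(0)})(2d + 2K + 7)\} + \exp\{\log(s_i)(\psi_1 - \psi_1^{(0)})(2d + 2K + 7)\} \\
&\qquad + \sum_{m=1}^{d+K+1} \exp\{(\psi_1 - \psi_1^{(0)})(\gamma_{i,m} - \gamma_{i,m}^{(0)}) B_m(D_{i,j})(2d + 2K + 7)\} \\
&\qquad + \exp\{(\psi_1 - \psi_1^{(0)}) X_{i,j}^{\top} (\beta - \beta^{(0)})(2d + 2K + 7)\} \\
&\qquad + \sum_{m=1}^{d+K+1} \exp\{(\gamma_{i,m} - \gamma_{i,m}^{(0)}) \psi_1^{(0)} B_m(D_{i,j})(2d + 2K + 7)\} \\
&\qquad + \exp\{X_{i,j}^{\top} \psi_1^{(0)} (\beta - \beta^{(0)})(2d + 2K + 7)\} \\
&\qquad + \exp\Big[ (\psi_1 - \psi_1^{(0)}) \Big\{ \sum_{m=1}^{d+K+1} \gamma_{i,m}^{(0)} B_m(D_{i,j}) + X_{i,j}^{\top} \beta^{(0)} \Big\} (2d + 2K + 7) \Big] \Bigg] \\
&\geq - \frac{1}{2d + 2K + 7} \Bigg[ \exp\{(\psi_0 - \psi_0^{(0)})(2d + 2K + 7)\} + \exp\{\log(s_i)(\psi_1 - \psi_1^{(0)})(2d + 2K + 7)\} \\
&\qquad + \sum_{m=1}^{d+K+1} \exp\Big[ 0.5(\psi_1 - \psi_1^{(0)})^2 + 0.5\{(\gamma_{i,m} - \gamma_{i,m}^{(0)}) B_m(D_{i,j})(2d + 2K + 7)\}^2 \Big] \\
&\qquad + \exp\Big[ 0.5(\psi_1 - \psi_1^{(0)})^2 + 0.5\{X_{i,j}^{\top} (\beta - \beta^{(0)})(2d + 2K + 7)\}^2 \Big] \\
&\qquad + \sum_{m=1}^{d+K+1} \exp\{(\gamma_{i,m} - \gamma_{i,m}^{(0)}) \psi_1^{(0)} B_m(D_{i,j})(2d + 2K + 7)\} \\
&\qquad + \exp\{X_{i,j}^{\top} \psi_1^{(0)} (\beta - \beta^{(0)})(2d + 2K + 7)\} \\
&\qquad + \exp\Big[ (\psi_1 - \psi_1^{(0)}) \Big\{ \sum_{m=1}^{d+K+1} \gamma_{i,m}^{(0)} B_m(D_{i,j}) + X_{i,j}^{\top} \beta^{(0)} \Big\} (2d + 2K + 7) \Big] \Bigg] \\
&\geq - \frac{1}{2d + 2K + 7} \Bigg[ \exp\{(\psi_0 - \psi_0^{(0)})(2d + 2K + 7)\} + \exp\{\log(s_i)(\psi_1 - \psi_1^{(0)})(2d + 2K + 7)\} \\
&\qquad + 0.5(d + K + 1) \exp\{(\psi_1 - \psi_1^{(0)})^2\} \\
&\qquad + 0.5 \sum_{m=1}^{d+K+1} \exp[\{(\gamma_{i,m} - \gamma_{i,m}^{(0)}) B_m(D_{i,j})(2d + 2K + 7)\}^2] \\
&\qquad + 0.5 \exp\{(\psi_1 - \psi_1^{(0)})^2\} + 0.5 \exp[\{X_{i,j}^{\top} (\beta - \beta^{(0)})(2d + 2K + 7)\}^2] \\
&\qquad + \sum_{m=1}^{d+K+1} \exp\{(\gamma_{i,m} - \gamma_{i,m}^{(0)}) \psi_1^{(0)} B_m(D_{i,j})(2d + 2K + 7)\} \\
&\qquad + \exp\{X_{i,j}^{\top} \psi_1^{(0)} (\beta - \beta^{(0)})(2d + 2K + 7)\} \\
&\qquad + \exp\Big[ (\psi_1 - \psi_1^{(0)}) \Big\{ \sum_{m=1}^{d+K+1} \gamma_{i,m}^{(0)} B_m(D_{i,j}) + X_{i,j}^{\top} \beta^{(0)} \Big\} (2d + 2K + 7) \Big] \Bigg].
\end{aligned}
\]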
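The chain of bounds for the second term rests on two elementary facts: the AM-GM inequality in the form exp(a_1 + ... + a_n) ≤ (1/n) Σ exp(n a_k), used here with n = 2d + 2K + 7 summands, and the product bound ab ≤ (a² + b²)/2. The sketch below is a numerical sanity check of both on random inputs. The function names, the random ranges, and the seed are illustrative assumptions and do not come from the dissertation.

```python
import math
import random


def amgm_exp_gap(a):
    """(1/n) * sum_k exp(n * a_k) - exp(sum_k a_k); non-negative by AM-GM."""
    n = len(a)
    return sum(math.exp(n * ak) for ak in a) / n - math.exp(sum(a))


def product_gap(a, b):
    """0.5 * (a^2 + b^2) - a * b, which equals 0.5 * (a - b)^2 >= 0."""
    return 0.5 * (a * a + b * b) - a * b


rng = random.Random(0)  # fixed seed so the check is reproducible
for _ in range(1000):
    n = rng.randint(1, 9)  # stands in for 2d + 2K + 7
    exponents = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    # Negating gives -exp(sum) >= -(1/n) * sum exp(n * a_k), as in the text.
    assert amgm_exp_gap(exponents) >= -1e-9

    x, y = rng.uniform(-5.0, 5.0), rng.uniform(-5.0, 5.0)
    assert product_gap(x, y) >= -1e-12
```

Both gaps vanish when all arguments coincide (all a_k equal, or a = b), so each bound is attained rather than merely approached.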
The first and third inequalities follow from the AM-GM inequality and the second inequality
follows from the result that ab ≤ (a² + b²)/2. Now,
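\[
\begin{aligned}
\ell_2^{*}(\psi_0, \psi_1, \gamma_1, \ldots, \gamma_I, \beta \mid \Theta^{(0)})
&\geq \sum_{i=1}^{I}\sum_{j=1}^{n_i} \Bigg[ \Bigg\{ I(Y_{i,j} > 0) + \frac{I(Y_{i,j} = 0)\, \omega_{i,j}^{(0)} G(\phi^{(0)}, \mu_{i,j}^{(0)})}{(1 - \omega_{i,j}^{(0)}) + \omega_{i,j}^{(0)} G(\phi^{(0)}, \mu_{i,j}^{(0)})} \Bigg\} \\
&\qquad \times \Bigg( \psi_0 + \psi_1 \log(s_i) - 0.5(d + K + 2)(\psi_1 - \psi_1^{(0)})^2 \\
&\qquad\quad - 0.5 \sum_{m=1}^{d+K+1} (\gamma_{i,m} - \gamma_{i,m}^{(0)})^2 B_m^2(D_{i,j}) - 0.5\{X_{i,j}^{\top} (\beta - \beta^{(0)})\}^2 \\
&\qquad\quad + \psi_1^{(0)} \Big\{ \sum_{m=1}^{d+K+1} (\gamma_{i,m} - \gamma_{i,m}^{(0)}) B_m(D_{i,j}) + X_{i,j}^{\top} (\beta - \beta^{(0)}) \Big\} \\
&\qquad\quad + \psi_1 \Big\{ \sum_{m=1}^{d+K+1} \gamma_{i,m}^{(0)} B_m(D_{i,j}) + X_{i,j}^{\top} \beta^{(0)} \Big\} \Bigg) \\
&\qquad + \frac{\exp\{\psi_0^{(0)} + \psi_1^{(0)} \log(s_i \mu_{i,j}^{(0)})\}}{1 + \exp\{\psi_0^{(0)} + \psi_1^{(0)} \log(s_i \mu_{i,j}^{(0)})\}} \times \Bigg( 1 - \frac{1}{2d + 2K + 7} \\
&\qquad\quad \times \Bigg[ \exp\{(\psi_0 - \psi_0^{(0)})(2d + 2K + 7)\} + \exp\{\log(s_i)(\psi_1 - \psi_1^{(0)})(2d + 2K + 7)\} \\
&\qquad\quad + 0.5(d + K + 1) \exp\{(\psi_1 - \psi_1^{(0)})^2\} \\
&\qquad\quad + 0.5 \sum_{m=1}^{d+K+1} \exp[\{(\gamma_{i,m} - \gamma_{i,m}^{(0)}) B_m(D_{i,j})(2d + 2K + 7)\}^2] + 0.5 \exp\{(\psi_1 - \psi_1^{(0)})^2\} \\
&\qquad\quad + 0.5 \exp[\{X_{i,j}^{\top} (\beta - \beta^{(0)})(2d + 2K + 7)\}^2] \\
&\qquad\quad + \sum_{m=1}^{d+K+1} \exp\{(\gamma_{i,m} - \gamma_{i,m}^{(0)}) \psi_1^{(0)} B_m(D_{i,j})(2d + 2K + 7)\} \\
&\qquad\quad + \exp\{X_{i,j}^{\top} \psi_1^{(0)} (\beta - \beta^{(0)})(2d + 2K + 7)\} \\
&\qquad\quad + \exp\Big[ (\psi_1 - \psi_1^{(0)}) \Big\{ \sum_{m=1}^{d+K+1} \gamma_{i,m}^{(0)} B_m(D_{i,j}) + X_{i,j}^{\top} \beta^{(0)} \Big\} (2d + 2K + 7) \Big] \Bigg] \Bigg) \\
&\qquad - \log[1 + \exp\{\psi_0^{(0)} + \psi_1^{(0)} \log(s_i \mu_{i,j}^{(0)})\}] \Bigg] \\
&= g_1(\psi_0 \mid \Theta^{(0)}) + g_2(\psi_1 \mid \Theta^{(0)}) + \sum_{i=1}^{I} g_{3,i}(\gamma_i \mid \Theta^{(0)}) + g_4(\beta \mid \Theta^{(0)}) + g_5(\Theta^{(0)}),
\end{aligned}
\]
where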
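\[
\begin{aligned}
g_1(\psi_0 \mid \Theta^{(0)}) = \sum_{i=1}^{I}\sum_{j=1}^{n_i} \Bigg[
& \Bigg\{ I(Y_{i,j} > 0) + \frac{I(Y_{i,j} = 0)\, \omega_{i,j}^{(0)} G(\phi^{(0)}, \mu_{i,j}^{(0)})}{(1 - \omega_{i,j}^{(0)}) + \omega_{i,j}^{(0)} G(\phi^{(0)}, \mu_{i,j}^{(0)})} \Bigg\} \psi_0 \\
& - \frac{\exp\{\psi_0^{(0)} + \psi_1^{(0)} \log(s_i \mu_{i,j}^{(0)})\}}{1 + \exp\{\psi_0^{(0)} + \psi_1^{(0)} \log(s_i \mu_{i,j}^{(0)})\}} \times \frac{1}{2d + 2K + 7} \exp\{(\psi_0 - \psi_0^{(0)})(2d + 2K + 7)\} \Bigg],
\end{aligned}
\]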
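\[
\begin{aligned}
g_2(\psi_1 \mid \Theta^{(0)}) = \sum_{i=1}^{I}\sum_{j=1}^{n_i} \Bigg[
& \Bigg\{ I(Y_{i,j} > 0) + \frac{I(Y_{i,j} = 0)\, \omega_{i,j}^{(0)} G(\phi^{(0)}, \mu_{i,j}^{(0)})}{(1 - \omega_{i,j}^{(0)}) + \omega_{i,j}^{(0)} G(\phi^{(0)}, \mu_{i,j}^{(0)})} \Bigg\} \\
&\quad \times \Bigg( \psi_1 \log(s_i) - 0.5(d + K + 2)(\psi_1 - \psi_1^{(0)})^2 + \psi_1 \Big\{ \sum_{m=1}^{d+K+1} \gamma_{i,m}^{(0)} B_m(D_{i,j}) + X_{i,j}^{\top} \beta^{(0)} \Big\} \Bigg) \\
& - \frac{\exp\{\psi_0^{(0)} + \psi_1^{(0)} \log(s_i \mu_{i,j}^{(0)})\}}{1 + \exp\{\psi_0^{(0)} + \psi_1^{(0)} \log(s_i \mu_{i,j}^{(0)})\}} \times \frac{1}{2d + 2K + 7} \Bigg( \exp\{\log(s_i)(\psi_1 - \psi_1^{(0)})(2d + 2K + 7)\} \\
&\quad + 0.5(d + K + 1) \exp\{(\psi_1 - \psi_1^{(0)})^2\} + 0.5 \exp\{(\psi_1 - \psi_1^{(0)})^2\} \\
&\quad + \exp\Big[ (\psi_1 - \psi_1^{(0)}) \Big\{ \sum_{m=1}^{d+K+1} \gamma_{i,m}^{(0)} B_m(D_{i,j}) + X_{i,j}^{\top} \beta^{(0)} \Big\} (2d + 2K + 7) \Big] \Bigg) \Bigg],
\end{aligned}
\]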
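\[
\begin{aligned}
g_{3,i}(\gamma_i \mid \Theta^{(0)}) = \sum_{j=1}^{n_i} \Bigg[
& \Bigg\{ I(Y_{i,j} > 0) + \frac{I(Y_{i,j} = 0)\, \omega_{i,j}^{(0)} G(\phi^{(0)}, \mu_{i,j}^{(0)})}{(1 - \omega_{i,j}^{(0)}) + \omega_{i,j}^{(0)} G(\phi^{(0)}, \mu_{i,j}^{(0)})} \Bigg\} \\
&\quad \times \Bigg( -0.5 \sum_{m=1}^{d+K+1} (\gamma_{i,m} - \gamma_{i,m}^{(0)})^2 B_m^2(D_{i,j}) + \psi_1^{(0)} \sum_{m=1}^{d+K+1} (\gamma_{i,m} - \gamma_{i,m}^{(0)}) B_m(D_{i,j}) \Bigg) \\
& - \frac{\exp\{\psi_0^{(0)} + \psi_1^{(0)} \log(s_i \mu_{i,j}^{(0)})\}}{1 + \exp\{\psi_0^{(0)} + \psi_1^{(0)} \log(s_i \mu_{i,j}^{(0)})\}} \times \frac{1}{2d + 2K + 7} \Bigg( 0.5 \sum_{m=1}^{d+K+1} \exp[\{(\gamma_{i,m} - \gamma_{i,m}^{(0)}) B_m(D_{i,j})(2d + 2K + 7)\}^2] \\
&\quad + \sum_{m=1}^{d+K+1} \exp\{(\gamma_{i,m} - \gamma_{i,m}^{(0)}) \psi_1^{(0)} B_m(D_{i,j})(2d + 2K + 7)\} \Bigg) \Bigg],
\end{aligned}
\]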
and
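\[
\begin{aligned}
g_4(\beta \mid \Theta^{(0)}) = \sum_{i=1}^{I}\sum_{j=1}^{n_i} \Bigg[
& \Bigg\{ I(Y_{i,j} > 0) + \frac{I(Y_{i,j} = 0)\, \omega_{i,j}^{(0)} G(\phi^{(0)}, \mu_{i,j}^{(0)})}{(1 - \omega_{i,j}^{(0)}) + \omega_{i,j}^{(0)} G(\phi^{(0)}, \mu_{i,j}^{(0)})} \Bigg\} \\
&\quad \times \Big( -0.5\{X_{i,j}^{\top} (\beta - \beta^{(0)})\}^2 + \psi_1^{(0)} X_{i,j}^{\top} (\beta - \beta^{(0)}) \Big) \\
& - \frac{\exp\{\psi_0^{(0)} + \psi_1^{(0)} \log(s_i \mu_{i,j}^{(0)})\}}{1 + \exp\{\psi_0^{(0)} + \psi_1^{(0)} \log(s_i \mu_{i,j}^{(0)})\}} \times \frac{1}{2d + 2K + 7} \Big( 0.5 \exp[\{X_{i,j}^{\top} (\beta - \beta^{(0)})(2d + 2K + 7)\}^2] \\
&\quad + \exp\{X_{i,j}^{\top} \psi_1^{(0)} (\beta - \beta^{(0)})(2d + 2K + 7)\} \Big) \Bigg].
\end{aligned}
\]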
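\begin{align*}
\ell_3^{*}(\gamma_1, \ldots, \gamma_I, \beta \mid \Theta^{(0)})
={}& \sum_{i=1}^{I} \sum_{j=1}^{n_i} \Bigg[ I(Y_{i,j} > 0) \Bigg\{ Y_{i,j} \log(\mu_{i,j}) - \frac{Y_{i,j} s_j \mu_{i,j}}{s_i \mu_{i,j}^{(0)} + \exp(\phi^{(0)})} \\
& - \frac{0.5 s_j}{s_i \mu_{i,j}^{(0)} + \exp(\phi^{(0)})} (\mu_{i,j} - \mu_{i,j}^{(0)})^2 - \frac{s_j \exp(\phi^{(0)}) \mu_{i,j}}{s_i \mu_{i,j}^{(0)} + \exp(\phi^{(0)})} \Bigg\} \\
& - \frac{I(Y_{i,j} = 0)\, \omega_{i,j}^{(0)} G(\phi^{(0)}, \mu_{i,j}^{(0)})}{(1 - \omega_{i,j}^{(0)}) + \omega_{i,j}^{(0)} G(\phi^{(0)}, \mu_{i,j}^{(0)})} \\
& \times \Bigg\{ \frac{0.5 s_j}{s_j \mu_{i,j}^{(0)} + \exp(\phi^{(0)})} (\mu_{i,j} - \mu_{i,j}^{(0)})^2 + \frac{s_j \exp(\phi^{(0)}) \mu_{i,j}}{s_j \mu_{i,j}^{(0)} + \exp(\phi^{(0)})} \Bigg\} \Bigg] \\
={}& \sum_{i=1}^{I} \sum_{j=1}^{n_i} \Bigg[ I(Y_{i,j} > 0) Y_{i,j} \log(\mu_{i,j}) \\
& - \Bigg\{ \frac{I(Y_{i,j} > 0) Y_{i,j} s_j}{s_i \mu_{i,j}^{(0)} + \exp(\phi^{(0)})} + \frac{I(Y_{i,j} > 0) s_j \exp(\phi^{(0)})}{s_i \mu_{i,j}^{(0)} + \exp(\phi^{(0)})} \\
& + \frac{I(Y_{i,j} = 0)\, \omega_{i,j}^{(0)} G(\phi^{(0)}, \mu_{i,j}^{(0)})}{(1 - \omega_{i,j}^{(0)}) + \omega_{i,j}^{(0)} G(\phi^{(0)}, \mu_{i,j}^{(0)})} \times \frac{s_j \exp(\phi^{(0)})}{s_j \mu_{i,j}^{(0)} + \exp(\phi^{(0)})} \Bigg\} \mu_{i,j} \\
& - \Bigg\{ \frac{0.5 s_j I(Y_{i,j} > 0)}{s_i \mu_{i,j}^{(0)} + \exp(\phi^{(0)})} + \frac{I(Y_{i,j} = 0)\, \omega_{i,j}^{(0)} G(\phi^{(0)}, \mu_{i,j}^{(0)})}{(1 - \omega_{i,j}^{(0)}) + \omega_{i,j}^{(0)} G(\phi^{(0)}, \mu_{i,j}^{(0)})} \\
& \times \frac{0.5 s_j}{s_j \mu_{i,j}^{(0)} + \exp(\phi^{(0)})} \Bigg\} \times (\mu_{i,j} - \mu_{i,j}^{(0)})^2 \Bigg] \\
={}& \sum_{i=1}^{I} \sum_{j=1}^{n_i} \Bigg[ I(Y_{i,j} > 0) Y_{i,j} \log(\mu_{i,j}) \\
& - \frac{s_j}{s_i \mu_{i,j}^{(0)} + \exp(\phi^{(0)})} \Bigg\{ I(Y_{i,j} > 0) Y_{i,j} + I(Y_{i,j} > 0) \exp(\phi^{(0)}) \\
& + \frac{I(Y_{i,j} = 0)\, \omega_{i,j}^{(0)} G(\phi^{(0)}, \mu_{i,j}^{(0)})}{(1 - \omega_{i,j}^{(0)}) + \omega_{i,j}^{(0)} G(\phi^{(0)}, \mu_{i,j}^{(0)})} \times \exp(\phi^{(0)}) \Bigg\} \mu_{i,j} \\
& - \frac{0.5 s_j}{s_i \mu_{i,j}^{(0)} + \exp(\phi^{(0)})} \Bigg\{ I(Y_{i,j} > 0) \\
& + \frac{I(Y_{i,j} = 0)\, \omega_{i,j}^{(0)} G(\phi^{(0)}, \mu_{i,j}^{(0)})}{(1 - \omega_{i,j}^{(0)}) + \omega_{i,j}^{(0)} G(\phi^{(0)}, \mu_{i,j}^{(0)})} \Bigg\} (\mu_{i,j} - \mu_{i,j}^{(0)})^2 \Bigg].
\end{align*}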
We now derive lower bounds for $-\mu_{i,j}$ and $-(\mu_{i,j} - \mu_{i,j}^{(0)})^2$ and apply them to the
expression above. First,
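\begin{align*}
-\mu_{i,j} ={}& -\exp\Big\{ \sum_{m=1}^{d+K+1} \gamma_{i,m} B_m(D_{i,j}) + X_{i,j}^{\top} \beta \Big\} \\
={}& -\mu_{i,j}^{(0)} \exp\Big\{ \sum_{m=1}^{d+K+1} (\gamma_{i,m} - \gamma_{i,m}^{(0)}) B_m(D_{i,j}) + X_{i,j}^{\top} (\beta - \beta^{(0)}) \Big\} \\
\geq{}& -\frac{\mu_{i,j}^{(0)}}{d+K+2} \Bigg[ \sum_{m=1}^{d+K+1} \exp\{ (\gamma_{i,m} - \gamma_{i,m}^{(0)}) B_m(D_{i,j}) (d+K+2) \} \\
& + \exp\{ X_{i,j}^{\top} (\beta - \beta^{(0)}) (d+K+2) \} \Bigg],
\end{align*}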
where the inequality is obtained by using the AM-GM inequality. Second,
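\begin{align*}
-(\mu_{i,j} - \mu_{i,j}^{(0)})^2 ={}& -(\mu_{i,j}^{(0)})^2 + 2 \mu_{i,j} \mu_{i,j}^{(0)} - (\mu_{i,j})^2 \\
={}& -(\mu_{i,j}^{(0)})^2 + 2 (\mu_{i,j}^{(0)})^2 \left( \frac{\mu_{i,j}}{\mu_{i,j}^{(0)}} \right) - (\mu_{i,j}^{(0)})^2 \left( \frac{\mu_{i,j}}{\mu_{i,j}^{(0)}} \right)^2 \\
={}& -(\mu_{i,j}^{(0)})^2 \\
& + 2 (\mu_{i,j}^{(0)})^2 \exp\Big\{ \sum_{m=1}^{d+K+1} (\gamma_{i,m} - \gamma_{i,m}^{(0)}) B_m(D_{i,j}) + X_{i,j}^{\top} (\beta - \beta^{(0)}) \Big\} \\
& - (\mu_{i,j}^{(0)})^2 \exp\Big\{ 2 \sum_{m=1}^{d+K+1} (\gamma_{i,m} - \gamma_{i,m}^{(0)}) B_m(D_{i,j}) + 2 X_{i,j}^{\top} (\beta - \beta^{(0)}) \Big\} \\
\geq{}& -(\mu_{i,j}^{(0)})^2 \\
& + 2 (\mu_{i,j}^{(0)})^2 \Big[ 1 + \sum_{m=1}^{d+K+1} (\gamma_{i,m} - \gamma_{i,m}^{(0)}) B_m(D_{i,j}) + X_{i,j}^{\top} (\beta - \beta^{(0)}) \Big] \\
& - \frac{(\mu_{i,j}^{(0)})^2}{d+K+2} \Bigg[ \sum_{m=1}^{d+K+1} \exp\{ 2(d+K+2) (\gamma_{i,m} - \gamma_{i,m}^{(0)}) B_m(D_{i,j}) \} \\
& + \exp\{ 2(d+K+2) X_{i,j}^{\top} (\beta - \beta^{(0)}) \} \Bigg],
\end{align*}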
where the inequality follows from applying $\exp(x) \geq 1 + x$, which holds for every real $x$, to the
middle term of the above expression and the AM-GM inequality to the third term. Applying
these two inequalities, we obtain
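\begin{align*}
\ell_3^{*}(\gamma_1, \ldots, \gamma_I, \beta \mid \Theta^{(0)})
\geq{}& \sum_{i=1}^{I} \sum_{j=1}^{n_i} \Bigg[ I(Y_{i,j} > 0) Y_{i,j} \Big\{ \sum_{m=1}^{d+K+1} \gamma_{i,m} B_m(D_{i,j}) + X_{i,j}^{\top} \beta \Big\} \\
& - \frac{s_j}{s_i \mu_{i,j}^{(0)} + \exp(\phi^{(0)})} \Bigg\{ I(Y_{i,j} > 0) Y_{i,j} + I(Y_{i,j} > 0) \exp(\phi^{(0)}) \\
& + \frac{I(Y_{i,j} = 0)\, \omega_{i,j}^{(0)} G(\phi^{(0)}, \mu_{i,j}^{(0)})}{(1 - \omega_{i,j}^{(0)}) + \omega_{i,j}^{(0)} G(\phi^{(0)}, \mu_{i,j}^{(0)})} \times \exp(\phi^{(0)}) \Bigg\} \\
& \times \frac{\mu_{i,j}^{(0)}}{d+K+2} \Bigg( \sum_{m=1}^{d+K+1} \exp\{ (\gamma_{i,m} - \gamma_{i,m}^{(0)}) B_m(D_{i,j}) (d+K+2) \} \\
& + \exp\{ X_{i,j}^{\top} (\beta - \beta^{(0)}) (d+K+2) \} \Bigg) \\
& - \frac{0.5 s_j}{s_i \mu_{i,j}^{(0)} + \exp(\phi^{(0)})} \Bigg\{ I(Y_{i,j} > 0) \\
& + \frac{I(Y_{i,j} = 0)\, \omega_{i,j}^{(0)} G(\phi^{(0)}, \mu_{i,j}^{(0)})}{(1 - \omega_{i,j}^{(0)}) + \omega_{i,j}^{(0)} G(\phi^{(0)}, \mu_{i,j}^{(0)})} \Bigg\} \\
& \times \Bigg( (\mu_{i,j}^{(0)})^2 - 2 (\mu_{i,j}^{(0)})^2 \Big[ 1 + \sum_{m=1}^{d+K+1} (\gamma_{i,m} - \gamma_{i,m}^{(0)}) B_m(D_{i,j}) + X_{i,j}^{\top} (\beta - \beta^{(0)}) \Big] \\
& + \frac{(\mu_{i,j}^{(0)})^2}{d+K+2} \Big[ \sum_{m=1}^{d+K+1} \exp\{ 2(d+K+2) (\gamma_{i,m} - \gamma_{i,m}^{(0)}) B_m(D_{i,j}) \} \\
& + \exp\{ 2(d+K+2) X_{i,j}^{\top} (\beta - \beta^{(0)}) \} \Big] \Bigg) \Bigg] \\
={}& \sum_{i=1}^{I} \ell_{3,i}^{\ddagger}(\gamma_i \mid \Theta^{(0)}) + \ell_4^{\ddagger}(\beta \mid \Theta^{(0)}),
\end{align*}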
where
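\begin{align*}
\ell_{3,i}^{\ddagger}(\gamma_i \mid \Theta^{(0)}) ={}& \sum_{j=1}^{n_i} \Bigg[ I(Y_{i,j} > 0) Y_{i,j} \sum_{m=1}^{d+K+1} \gamma_{i,m} B_m(D_{i,j}) \\
& - \frac{s_j}{s_i \mu_{i,j}^{(0)} + \exp(\phi^{(0)})} \Bigg\{ I(Y_{i,j} > 0) Y_{i,j} + I(Y_{i,j} > 0) \exp(\phi^{(0)}) \\
& + \frac{I(Y_{i,j} = 0)\, \omega_{i,j}^{(0)} G(\phi^{(0)}, \mu_{i,j}^{(0)})}{(1 - \omega_{i,j}^{(0)}) + \omega_{i,j}^{(0)} G(\phi^{(0)}, \mu_{i,j}^{(0)})} \times \exp(\phi^{(0)}) \Bigg\} \\
& \times \frac{\mu_{i,j}^{(0)}}{d+K+2} \sum_{m=1}^{d+K+1} \exp\{ (\gamma_{i,m} - \gamma_{i,m}^{(0)}) B_m(D_{i,j}) (d+K+2) \} \\
& - \frac{0.5 s_j}{s_i \mu_{i,j}^{(0)} + \exp(\phi^{(0)})} \Bigg\{ I(Y_{i,j} > 0) + \frac{I(Y_{i,j} = 0)\, \omega_{i,j}^{(0)} G(\phi^{(0)}, \mu_{i,j}^{(0)})}{(1 - \omega_{i,j}^{(0)}) + \omega_{i,j}^{(0)} G(\phi^{(0)}, \mu_{i,j}^{(0)})} \Bigg\} \\
& \times \Bigg( -2 (\mu_{i,j}^{(0)})^2 \sum_{m=1}^{d+K+1} (\gamma_{i,m} - \gamma_{i,m}^{(0)}) B_m(D_{i,j}) \\
& + \frac{(\mu_{i,j}^{(0)})^2}{d+K+2} \sum_{m=1}^{d+K+1} \exp\{ 2(d+K+2) (\gamma_{i,m} - \gamma_{i,m}^{(0)}) B_m(D_{i,j}) \} \Bigg) \Bigg],
\end{align*}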
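\begin{align*}
\ell_4^{\ddagger}(\beta \mid \Theta^{(0)}) ={}& \sum_{i=1}^{I} \sum_{j=1}^{n_i} \Bigg[ I(Y_{i,j} > 0) Y_{i,j} X_{i,j}^{\top} \beta \\
& - \frac{s_j}{s_i \mu_{i,j}^{(0)} + \exp(\phi^{(0)})} \Bigg\{ I(Y_{i,j} > 0) Y_{i,j} + I(Y_{i,j} > 0) \exp(\phi^{(0)}) \\
& + \frac{I(Y_{i,j} = 0)\, \omega_{i,j}^{(0)} G(\phi^{(0)}, \mu_{i,j}^{(0)})}{(1 - \omega_{i,j}^{(0)}) + \omega_{i,j}^{(0)} G(\phi^{(0)}, \mu_{i,j}^{(0)})} \times \exp(\phi^{(0)}) \Bigg\} \\
& \times \frac{\mu_{i,j}^{(0)}}{d+K+2} \exp\{ X_{i,j}^{\top} (\beta - \beta^{(0)}) (d+K+2) \} \\
& - \frac{0.5 s_j}{s_i \mu_{i,j}^{(0)} + \exp(\phi^{(0)})} \Bigg\{ I(Y_{i,j} > 0) + \frac{I(Y_{i,j} = 0)\, \omega_{i,j}^{(0)} G(\phi^{(0)}, \mu_{i,j}^{(0)})}{(1 - \omega_{i,j}^{(0)}) + \omega_{i,j}^{(0)} G(\phi^{(0)}, \mu_{i,j}^{(0)})} \Bigg\} \\
& \times \Bigg( -2 (\mu_{i,j}^{(0)})^2 X_{i,j}^{\top} (\beta - \beta^{(0)}) + \frac{(\mu_{i,j}^{(0)})^2}{d+K+2} \exp\{ 2(d+K+2) X_{i,j}^{\top} (\beta - \beta^{(0)}) \} \Bigg) \Bigg].
\end{align*}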
Combining all results we can write
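\begin{align*}
\ell(\Theta) \geq{}& \ell_1^{\dagger}(\phi \mid \Theta^{(0)}) + g_1(\psi_0 \mid \Theta^{(0)}) + g_2(\psi_1 \mid \Theta^{(0)}) + \sum_{i=1}^{I} \ell_{3,i}^{\dagger}(\gamma_i \mid \Theta^{(0)}) \\
& + \ell_4^{\dagger}(\beta \mid \Theta^{(0)}) + \ell_5^{\dagger}(\Theta^{(0)}),
\end{align*}
where
\begin{align*}
\ell_{3,i}^{\dagger}(\gamma_i \mid \Theta^{(0)}) &= g_{3,i}(\gamma_i \mid \Theta^{(0)}) + \ell_{3,i}^{\ddagger}(\gamma_i \mid \Theta^{(0)}), \\
\ell_4^{\dagger}(\beta \mid \Theta^{(0)}) &= g_4(\beta \mid \Theta^{(0)}) + \ell_4^{\ddagger}(\beta \mid \Theta^{(0)}).
\end{align*}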
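The two elementary bounds used in the minorization above can be checked numerically. The following is a minimal Python sketch and not part of the original derivation: the function names `amgm_upper_bound`, `check_bounds` and `random_check` are hypothetical, and `terms` stands for the $d+K+2$ summands $(\gamma_{i,m} - \gamma_{i,m}^{(0)}) B_m(D_{i,j})$, $m = 1, \ldots, d+K+1$, together with $X_{i,j}^{\top}(\beta - \beta^{(0)})$. Under these assumptions it evaluates both sides of the bounds on $-\mu_{i,j}$ and $-(\mu_{i,j} - \mu_{i,j}^{(0)})^2$ for arbitrary inputs.

```python
import math
import random


def amgm_upper_bound(terms):
    """Mean of exp(n * t) over the n terms; an upper bound on exp(sum(terms))."""
    n = len(terms)
    return sum(math.exp(n * t) for t in terms) / n


def check_bounds(mu0, terms):
    """Return (lhs1, rhs1, lhs2, rhs2) with mu = mu0 * exp(sum(terms)).

    lhs1 >= rhs1 is the bound on -mu.
    lhs2 >= rhs2 is the bound on -(mu - mu0)^2.
    """
    total = sum(terms)
    mu = mu0 * math.exp(total)
    lhs1 = -mu
    rhs1 = -mu0 * amgm_upper_bound(terms)
    lhs2 = -(mu - mu0) ** 2
    rhs2 = (
        -mu0 ** 2
        + 2.0 * mu0 ** 2 * (1.0 + total)  # exp(x) >= 1 + x on the middle term
        - mu0 ** 2 * amgm_upper_bound([2.0 * t for t in terms])  # AM-GM on the third term
    )
    return lhs1, rhs1, lhs2, rhs2


def random_check(n_trials=200, n_terms=5, seed=0, tol=1e-9):
    """Return True if both bounds hold on every random draw."""
    rng = random.Random(seed)
    for _ in range(n_trials):
        terms = [rng.uniform(-0.5, 0.5) for _ in range(n_terms)]
        mu0 = rng.uniform(0.1, 5.0)
        lhs1, rhs1, lhs2, rhs2 = check_bounds(mu0, terms)
        if lhs1 < rhs1 - tol or lhs2 < rhs2 - tol:
            return False
    return True
```

Calling `check_bounds(mu0, [0.0] * n)` gives equal left and right sides for both bounds, which reflects the fact that the surrogate touches the objective at $\Theta = \Theta^{(0)}$, as an MM minorizer should.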
Let us now consider the derivatives.
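\begin{align*}
\frac{\partial}{\partial \phi} \ell_1^{\dagger}(\phi \mid \Theta^{(0)})
={}& \frac{\partial}{\partial \phi} \sum_{i=1}^{I} \sum_{j=1}^{n_i} \Bigg[ I(Y_{i,j} > 0) \Bigg\{ Y_{i,j} \left( \frac{\exp(\phi^{(0)}) - \exp(\phi)}{s_i \mu_{i,j}^{(0)} + \exp(\phi^{(0)})} \right) \\
& - \frac{0.5 s_j}{s_i \mu_{i,j}^{(0)} + \exp(\phi^{(0)})} \{ \exp(\phi) - \exp(\phi^{(0)}) \}^2 \\
& + \frac{\exp(\phi) \{ \exp(\phi^{(0)}) - \exp(\phi) \}}{s_i \mu_{i,j}^{(0)} + \exp(\phi^{(0)})} \\
& - \exp(\phi) \log\{ s_i \mu_{i,j}^{(0)} + \exp(\phi^{(0)}) \} \\
& + \phi \exp(\phi) + \log[\Gamma\{ Y_{i,j} + \exp(\phi) \}] - \log[\Gamma\{ \exp(\phi) \}] \Bigg\} \\
& + \frac{I(Y_{i,j} = 0)\, \omega_{i,j}^{(0)} G(\phi^{(0)}, \mu_{i,j}^{(0)})}{(1 - \omega_{i,j}^{(0)}) + \omega_{i,j}^{(0)} G(\phi^{(0)}, \mu_{i,j}^{(0)})} \Bigg\{ \exp(\phi) \phi \\
& - \exp(\phi) \log\{ s_j \mu_{i,j}^{(0)} + \exp(\phi^{(0)}) \} \\
& + \exp(\phi) \left( \frac{\exp(\phi^{(0)}) - \exp(\phi)}{s_j \mu_{i,j}^{(0)} + \exp(\phi^{(0)})} \right) \\
& - \frac{0.5 s_j}{s_j \mu_{i,j}^{(0)} + \exp(\phi^{(0)})} \{ \exp(\phi) - \exp(\phi^{(0)}) \}^2 \Bigg\} \Bigg], \\
={}& \sum_{i=1}^{I} \sum_{j=1}^{n_i} \Bigg[ I(Y_{i,j} > 0) \Bigg\{ -Y_{i,j} \left( \frac{\exp(\phi)}{s_i \mu_{i,j}^{(0)} + \exp(\phi^{(0)})} \right) \\
& - \frac{s_j}{s_i \mu_{i,j}^{(0)} + \exp(\phi^{(0)})} \{ \exp(\phi) - \exp(\phi^{(0)}) \} \exp(\phi) \\
& + \frac{\exp(\phi) \{ \exp(\phi^{(0)}) - 2 \exp(\phi) \}}{s_i \mu_{i,j}^{(0)} + \exp(\phi^{(0)})} \\
& - \exp(\phi) \log\{ s_i \mu_{i,j}^{(0)} + \exp(\phi^{(0)}) \} \\
& + \exp(\phi) + \phi \exp(\phi) + \frac{\partial}{\partial \phi} \log[\Gamma\{ Y_{i,j} + \exp(\phi) \}] - \frac{\partial}{\partial \phi} \log[\Gamma\{ \exp(\phi) \}] \Bigg\} \\
& + \frac{I(Y_{i,j} = 0)\, \omega_{i,j}^{(0)} G(\phi^{(0)}, \mu_{i,j}^{(0)})}{(1 - \omega_{i,j}^{(0)}) + \omega_{i,j}^{(0)} G(\phi^{(0)}, \mu_{i,j}^{(0)})} \Bigg\{ \exp(\phi) + \exp(\phi) \phi \\
& - \exp(\phi) \log\{ s_j \mu_{i,j}^{(0)} + \exp(\phi^{(0)}) \} \\
& + \frac{\exp(\phi) \{ \exp(\phi^{(0)}) - 2 \exp(\phi) \}}{s_j \mu_{i,j}^{(0)} + \exp(\phi^{(0)})} \\
& - \frac{s_j}{s_j \mu_{i,j}^{(0)} + \exp(\phi^{(0)})} \{ \exp(\phi) - \exp(\phi^{(0)}) \} \exp(\phi) \Bigg\} \Bigg], \\
={}& \sum_{i=1}^{I} \sum_{j=1}^{n_i} \Bigg[ I(Y_{i,j} > 0) \Bigg\{ -\frac{Y_{i,j} \exp(\phi)}{s_i \mu_{i,j}^{(0)} + \exp(\phi^{(0)})} \\
& + \frac{\partial}{\partial \phi} \log[\Gamma\{ Y_{i,j} + \exp(\phi) \}] - \frac{\partial}{\partial \phi} \log[\Gamma\{ \exp(\phi) \}] \Bigg\} \\
& + \Bigg\{ I(Y_{i,j} > 0) + \frac{I(Y_{i,j} = 0)\, \omega_{i,j}^{(0)} G(\phi^{(0)}, \mu_{i,j}^{(0)})}{(1 - \omega_{i,j}^{(0)}) + \omega_{i,j}^{(0)} G(\phi^{(0)}, \mu_{i,j}^{(0)})} \Bigg\} \Bigg\{ \exp(\phi) + \exp(\phi) \phi \\
& - \exp(\phi) \log\{ s_j \mu_{i,j}^{(0)} + \exp(\phi^{(0)}) \} \\
& + \frac{\exp(\phi) \{ \exp(\phi^{(0)}) - 2 \exp(\phi) \}}{s_j \mu_{i,j}^{(0)} + \exp(\phi^{(0)})} \\
& - \frac{s_j}{s_j \mu_{i,j}^{(0)} + \exp(\phi^{(0)})} \{ \exp(\phi) - \exp(\phi^{(0)}) \} \exp(\phi) \Bigg\} \Bigg],
\end{align*}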
Next,
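\begin{align*}
\frac{\partial^2}{\partial \phi^2} \ell_1^{\dagger}(\phi \mid \Theta^{(0)})
={}& \sum_{i=1}^{I} \sum_{j=1}^{n_i} \Bigg[ I(Y_{i,j} > 0) \Bigg\{ -\frac{Y_{i,j} \exp(\phi)}{s_i \mu_{i,j}^{(0)} + \exp(\phi^{(0)})} \\
& + \frac{\partial^2}{\partial \phi^2} \log[\Gamma\{ Y_{i,j} + \exp(\phi) \}] - \frac{\partial^2}{\partial \phi^2} \log[\Gamma\{ \exp(\phi) \}] \Bigg\} \\
& + \Bigg\{ I(Y_{i,j} > 0) + \frac{I(Y_{i,j} = 0)\, \omega_{i,j}^{(0)} G(\phi^{(0)}, \mu_{i,j}^{(0)})}{(1 - \omega_{i,j}^{(0)}) + \omega_{i,j}^{(0)} G(\phi^{(0)}, \mu_{i,j}^{(0)})} \Bigg\} \Bigg\{ 2 \exp(\phi) \\
& + \exp(\phi) \phi - \exp(\phi) \log\{ s_j \mu_{i,j}^{(0)} + \exp(\phi^{(0)}) \} \\
& + \frac{\exp(\phi) \{ \exp(\phi^{(0)}) - 4 \exp(\phi) \}}{s_j \mu_{i,j}^{(0)} + \exp(\phi^{(0)})} \\
& - \frac{s_j}{s_j \mu_{i,j}^{(0)} + \exp(\phi^{(0)})} \{ 2 \exp(\phi) - \exp(\phi^{(0)}) \} \exp(\phi) \Bigg\} \Bigg].
\end{align*}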
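The derivatives of the log-gamma terms are left unevaluated in the expressions above. By the chain rule, $\frac{\partial}{\partial \phi} \log[\Gamma\{Y_{i,j} + \exp(\phi)\}] = \exp(\phi)\, \Psi\{Y_{i,j} + \exp(\phi)\}$ and $\frac{\partial^2}{\partial \phi^2} \log[\Gamma\{Y_{i,j} + \exp(\phi)\}] = \exp(\phi)\, \Psi\{Y_{i,j} + \exp(\phi)\} + \exp(2\phi)\, \Psi'\{Y_{i,j} + \exp(\phi)\}$, where $\Psi$ and $\Psi'$ denote the digamma and trigamma functions (notation introduced here for illustration, distinct from the parameters $\psi_0$ and $\psi_1$). The sketch below, in plain Python, shows one way these terms might be evaluated; the function names are hypothetical, and the digamma and trigamma routines use the standard recurrence plus a truncated asymptotic series rather than any method stated in the source.

```python
import math


def digamma(x):
    """Digamma function for x > 0: shift x upward by recurrence, then use the asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x  # Psi(x) = Psi(x + 1) - 1/x
        x += 1.0
    inv2 = 1.0 / (x * x)
    # log(x) - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6)
    return result + math.log(x) - 0.5 / x - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))


def trigamma(x):
    """Trigamma function for x > 0: shift x upward by recurrence, then use the asymptotic series."""
    result = 0.0
    while x < 6.0:
        result += 1.0 / (x * x)  # Psi'(x) = Psi'(x + 1) + 1/x^2
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    # 1/x + 1/(2x^2) + 1/(6x^3) - 1/(30x^5) + 1/(42x^7)
    return result + inv + 0.5 * inv2 + inv * inv2 * (1.0 / 6 - inv2 * (1.0 / 30 - inv2 / 42))


def dlgamma_dphi(y, phi):
    """First derivative in phi of log Gamma(y + exp(phi))."""
    r = math.exp(phi)
    return r * digamma(y + r)


def d2lgamma_dphi2(y, phi):
    """Second derivative in phi of log Gamma(y + exp(phi))."""
    r = math.exp(phi)
    return r * digamma(y + r) + r * r * trigamma(y + r)
```

Setting `y = 0` gives the corresponding derivatives of $\log[\Gamma\{\exp(\phi)\}]$, so the differences appearing in the first and second derivatives of $\ell_1^{\dagger}$ can be formed from two calls each.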
Now,
                                   I X  ni 
                                                                        Yi,j exp(ϕ(0) )
                                                              
      ∂ †      (0)
                                 X
        ℓ1 (ϕ|Θ ) |Θ=Θ(0) =                     I(Yi,j > 0) − (0)
     ∂ϕ                          i=1 j=1                             si µi,j + exp(ϕ(0) )
                                     ∂
                                 +      log[Γ{Yi,j + exp(ϕ)}] |ϕ=ϕ(0)
                                    ∂ϕ
                                                                          
                                     ∂
                                 − log[Γ{exp(ϕ)}] |ϕ=ϕ(0)
                                    ∂ϕ
                                                                             (0)          (0) 
                                                            I(Yi,j = 0)ωi,j G(ϕ(0) , µi,j )
                                    
                                 + I(Yi, > 0) +                     (0)        (0)         (0)
                                                          (1 − ωi,j ) + ωi,j G(ϕ(0) , µi,j )
                                 
                                   exp(ϕ(0) ) + exp(ϕ(0) )ϕ(0)
                                                       n                            o
                                              (0)             (0)              (0)
                                 − exp(ϕ )log sj µi,j + exp(ϕ )
                                    exp(ϕ(0) ){exp(ϕ(0) ) − 2 exp(ϕ(0) )}
                                 +                   (0)
                                                 sj µi,j + exp(ϕ(0) )
                                                 sj
                                 −       (0)
                                                              {exp(ϕ(0) ) − exp(ϕ(0) )}
                                    sj µi,j + exp(ϕ )     (0)
                                                
                                          (0)
                                 exp(ϕ ) ,
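\[
\begin{aligned}
{} = \sum_{i=1}^{I} \sum_{j=1}^{n_i} \Bigg[ & I(Y_{i,j} > 0) \Bigg\{ -\frac{Y_{i,j} \exp(\phi^{(0)})}{s_i \mu_{i,j}^{(0)} + \exp(\phi^{(0)})}
+ \frac{\partial}{\partial \phi} \log[\Gamma\{Y_{i,j} + \exp(\phi)\}] \Big|_{\phi = \phi^{(0)}} \\
& \qquad - \frac{\partial}{\partial \phi} \log[\Gamma\{\exp(\phi)\}] \Big|_{\phi = \phi^{(0)}} \Bigg\} \\
& + \Bigg\{ I(Y_{i,j} > 0) + \frac{I(Y_{i,j} = 0)\, \omega_{i,j}^{(0)} G(\phi^{(0)}, \mu_{i,j}^{(0)})}{(1 - \omega_{i,j}^{(0)}) + \omega_{i,j}^{(0)} G(\phi^{(0)}, \mu_{i,j}^{(0)})} \Bigg\} \\
& \times \Bigg\{ \exp(\phi^{(0)}) + \exp(\phi^{(0)}) \phi^{(0)}
- \exp(\phi^{(0)}) \log\big\{ s_j \mu_{i,j}^{(0)} + \exp(\phi^{(0)}) \big\}
- \frac{\exp(2\phi^{(0)})}{s_j \mu_{i,j}^{(0)} + \exp(\phi^{(0)})} \Bigg\} \Bigg]
\end{aligned}
\]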
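The terms in the derivative with respect to ϕ above have the form one obtains by differentiating a negative binomial log-probability with mean s µ and size exp(ϕ). The following is a minimal numerical sketch of that check for a single observation, under that assumed parameterization. The function names, the stdlib-only digamma approximation and the input values are illustrative assumptions and are not taken from the dissertation.

```python
import math

def digamma(x):
    """Digamma function for x > 0: upward recurrence, then an asymptotic series."""
    shift = 0.0
    while x < 6.0:
        shift -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = math.log(x) - 0.5 / x - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return shift + series

def nb_loglik(phi, y, s, mu):
    """Negative binomial log-probability with mean s*mu and size exp(phi) (assumed form)."""
    r = math.exp(phi)
    return (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1.0)
            + r * phi + y * math.log(s * mu)
            - (y + r) * math.log(s * mu + r))

def nb_score_phi(phi, y, s, mu):
    """Closed-form derivative of nb_loglik with respect to phi."""
    r = math.exp(phi)
    denom = s * mu + r
    gamma_terms = r * (digamma(y + r) - digamma(r))  # derivative of the log-gamma terms
    return (-y * r / denom + gamma_terms
            + r + r * phi - r * math.log(denom) - r * r / denom)

def central_difference(f, x, h=1e-5):
    """Two-sided finite-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

# Illustrative values only.
phi0, y, s, mu = 0.4, 3.0, 1.2, 2.5
numeric = central_difference(lambda p: nb_loglik(p, y, s, mu), phi0)
analytic = nb_score_phi(phi0, y, s, mu)
```

For a zero count the two log-gamma terms cancel, so only the last four terms of `nb_score_phi` remain, which is the part that the zero-inflation weight multiplies in the display.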
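\[
\begin{aligned}
\frac{\partial^2}{\partial \phi^2} \ell_1^{\dagger}(\phi \mid \Theta^{(0)}) \Big|_{\Theta = \Theta^{(0)}}
= \sum_{i=1}^{I} \sum_{j=1}^{n_i} \Bigg[ & I(Y_{i,j} > 0) \Bigg\{ -\frac{Y_{i,j} \exp(\phi^{(0)})}{s_i \mu_{i,j}^{(0)} + \exp(\phi^{(0)})}
+ \frac{\partial^2}{\partial \phi^2} \log[\Gamma\{Y_{i,j} + \exp(\phi)\}] \Big|_{\Theta = \Theta^{(0)}} \\
& \qquad - \frac{\partial^2}{\partial \phi^2} \log[\Gamma\{\exp(\phi)\}] \Big|_{\Theta = \Theta^{(0)}} \Bigg\} \\
& + \Bigg\{ I(Y_{i,j} > 0) + \frac{I(Y_{i,j} = 0)\, \omega_{i,j}^{(0)} G(\phi^{(0)}, \mu_{i,j}^{(0)})}{(1 - \omega_{i,j}^{(0)}) + \omega_{i,j}^{(0)} G(\phi^{(0)}, \mu_{i,j}^{(0)})} \Bigg\} \\
& \times \Bigg\{ 2\exp(\phi^{(0)}) + \exp(\phi^{(0)}) \phi^{(0)}
- \exp(\phi^{(0)}) \log\big\{ s_j \mu_{i,j}^{(0)} + \exp(\phi^{(0)}) \big\} \\
& \qquad - \frac{3 \exp(2\phi^{(0)})}{s_j \mu_{i,j}^{(0)} + \exp(\phi^{(0)})}
- \frac{s_j \exp(2\phi^{(0)})}{s_j \mu_{i,j}^{(0)} + \exp(\phi^{(0)})} \Bigg\} \Bigg].
\end{aligned}
\]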
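\[
\begin{aligned}
\frac{\partial}{\partial \psi_0} g_1(\psi_0 \mid \Theta^{(0)})
= \sum_{i=1}^{I} \sum_{j=1}^{n_i} \Bigg[ & I(Y_{i,j} > 0) + \frac{I(Y_{i,j} = 0)\, \omega_{i,j}^{(0)} G(\phi^{(0)}, \mu_{i,j}^{(0)})}{(1 - \omega_{i,j}^{(0)}) + \omega_{i,j}^{(0)} G(\phi^{(0)}, \mu_{i,j}^{(0)})} \\
& - \frac{\exp\{\psi_0^{(0)} + \psi_1^{(0)} \log(s_i \mu_{i,j}^{(0)})\}}{1 + \exp\{\psi_0^{(0)} + \psi_1^{(0)} \log(s_i \mu_{i,j}^{(0)})\}}
\times \exp\{(\psi_0 - \psi_0^{(0)})(2d + 2K + 7)\} \Bigg],
\end{aligned}
\]
\[
\begin{aligned}
\frac{\partial}{\partial \psi_0} g_1(\psi_0 \mid \Theta^{(0)}) \Big|_{\Theta = \Theta^{(0)}}
= \sum_{i=1}^{I} \sum_{j=1}^{n_i} \Bigg[ & I(Y_{i,j} > 0) + \frac{I(Y_{i,j} = 0)\, \omega_{i,j}^{(0)} G(\phi^{(0)}, \mu_{i,j}^{(0)})}{(1 - \omega_{i,j}^{(0)}) + \omega_{i,j}^{(0)} G(\phi^{(0)}, \mu_{i,j}^{(0)})} \\
& - \frac{\exp\{\psi_0^{(0)} + \psi_1^{(0)} \log(s_i \mu_{i,j}^{(0)})\}}{1 + \exp\{\psi_0^{(0)} + \psi_1^{(0)} \log(s_i \mu_{i,j}^{(0)})\}} \Bigg],
\end{aligned}
\]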
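\[
\begin{aligned}
\frac{\partial^2}{\partial \psi_0^2} g_1(\psi_0 \mid \Theta^{(0)})
= \sum_{i=1}^{I} \sum_{j=1}^{n_i} \Bigg[ & - \frac{\exp\{\psi_0^{(0)} + \psi_1^{(0)} \log(s_i \mu_{i,j}^{(0)})\}}{1 + \exp\{\psi_0^{(0)} + \psi_1^{(0)} \log(s_i \mu_{i,j}^{(0)})\}} \\
& \times (2d + 2K + 7) \exp\{(\psi_0 - \psi_0^{(0)})(2d + 2K + 7)\} \Bigg],
\end{aligned}
\]
\[
\frac{\partial^2}{\partial \psi_0^2} g_1(\psi_0 \mid \Theta^{(0)}) \Big|_{\Theta = \Theta^{(0)}}
= \sum_{i=1}^{I} \sum_{j=1}^{n_i} \Bigg[ - \frac{(2d + 2K + 7) \exp\{\psi_0^{(0)} + \psi_1^{(0)} \log(s_i \mu_{i,j}^{(0)})\}}{1 + \exp\{\psi_0^{(0)} + \psi_1^{(0)} \log(s_i \mu_{i,j}^{(0)})\}} \Bigg],
\]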
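As a small consistency check on the four displays for g1, the sketch below codes the first and second derivatives in ψ0 as they appear above, confirms that they reduce to the stated values at the current iterate, and compares the second derivative with a finite difference of the first. The function names and all numerical inputs (d, K, the weights and the products s µ) are illustrative assumptions only.

```python
import math

def expit(x):
    """Logistic function 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def g1_first(psi0, psi0_cur, weights, probs, c):
    """First derivative of g1: sum of W - p * exp{(psi0 - psi0_cur) * c}."""
    growth = math.exp((psi0 - psi0_cur) * c)
    return sum(w - p * growth for w, p in zip(weights, probs))

def g1_second(psi0, psi0_cur, probs, c):
    """Second derivative of g1: sum of -p * c * exp{(psi0 - psi0_cur) * c}."""
    growth = math.exp((psi0 - psi0_cur) * c)
    return sum(-p * c * growth for p in probs)

# Illustrative inputs only (not taken from the dissertation).
d, K = 3, 2
c = 2 * d + 2 * K + 7                   # the constant (2d + 2K + 7)
psi0_cur, psi1_cur = -0.5, 0.8          # current iterates psi_0^(0), psi_1^(0)
s_mu = [1.5, 0.7, 3.2]                  # products s * mu^(0), one per cell
weights = [1.0, 0.35, 1.0]              # bracketed weight for each cell
probs = [expit(psi0_cur + psi1_cur * math.log(v)) for v in s_mu]

first_at_cur = g1_first(psi0_cur, psi0_cur, weights, probs, c)   # sum of (W - p)
second_at_cur = g1_second(psi0_cur, psi0_cur, probs, c)          # -c * sum of p
```

Because every logistic fraction is positive, the second derivative is negative for all ψ0, so g1 is concave in ψ0.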
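\[
\begin{aligned}
\frac{\partial}{\partial \psi_1} g_2(\psi_1 \mid \Theta^{(0)})
= \frac{\partial}{\partial \psi_1} \sum_{i=1}^{I} \sum_{j=1}^{n_i} \Bigg[ & \Bigg\{ I(Y_{i,j} > 0) + \frac{I(Y_{i,j} = 0)\, \omega_{i,j}^{(0)} G(\phi^{(0)}, \mu_{i,j}^{(0)})}{(1 - \omega_{i,j}^{(0)}) + \omega_{i,j}^{(0)} G(\phi^{(0)}, \mu_{i,j}^{(0)})} \Bigg\} \\
& \times \Bigg\{ \psi_1 \log(s_i) - 0.5 (d + K + 2)(\psi_1 - \psi_1^{(0)})^2
+ \psi_1 \Big\{ \sum_{m=1}^{d+K+1} \gamma_{i,m}^{(0)} B_m(D_{i,j}) + X_{i,j}^{\top} \beta^{(0)} \Big\} \Bigg\} \\
& - \frac{\exp\{\psi_0^{(0)} + \psi_1^{(0)} \log(s_i \mu_{i,j}^{(0)})\}}{1 + \exp\{\psi_0^{(0)} + \psi_1^{(0)} \log(s_i \mu_{i,j}^{(0)})\}} \times \frac{1}{2d + 2K + 7} \\
& \times \Bigg\{ \exp\{\log(s_i)(\psi_1 - \psi_1^{(0)})(2d + 2K + 7)\}
+ 0.5 (d + K + 1) \exp\{(\psi_1 - \psi_1^{(0)})^2\}
+ 0.5 \exp\{(\psi_1 - \psi_1^{(0)})^2\} \\
& \qquad + \exp\Big[ (\psi_1 - \psi_1^{(0)}) \Big\{ \sum_{m=1}^{d+K+1} \gamma_{i,m}^{(0)} B_m(D_{i,j}) + X_{i,j}^{\top} \beta^{(0)} \Big\} (2d + 2K + 7) \Big] \Bigg\} \Bigg] \\
= \sum_{i=1}^{I} \sum_{j=1}^{n_i} \Bigg[ & \Bigg\{ I(Y_{i,j} > 0) + \frac{I(Y_{i,j} = 0)\, \omega_{i,j}^{(0)} G(\phi^{(0)}, \mu_{i,j}^{(0)})}{(1 - \omega_{i,j}^{(0)}) + \omega_{i,j}^{(0)} G(\phi^{(0)}, \mu_{i,j}^{(0)})} \Bigg\} \\
& \times \Bigg\{ \log(s_i) - (d + K + 2)(\psi_1 - \psi_1^{(0)})
+ \Big\{ \sum_{m=1}^{d+K+1} \gamma_{i,m}^{(0)} B_m(D_{i,j}) + X_{i,j}^{\top} \beta^{(0)} \Big\} \Bigg\} \\
& - \frac{\exp\{\psi_0^{(0)} + \psi_1^{(0)} \log(s_i \mu_{i,j}^{(0)})\}}{1 + \exp\{\psi_0^{(0)} + \psi_1^{(0)} \log(s_i \mu_{i,j}^{(0)})\}} \times \frac{1}{2d + 2K + 7} \\
& \times \Bigg\{ (2d + 2K + 7) \times \log(s_i) \times \exp\{\log(s_i)(\psi_1 - \psi_1^{(0)})(2d + 2K + 7)\} \\
& \qquad + (d + K + 1) \exp\{(\psi_1 - \psi_1^{(0)})^2\} \times (\psi_1 - \psi_1^{(0)})
+ \exp\{(\psi_1 - \psi_1^{(0)})^2\} (\psi_1 - \psi_1^{(0)}) \\
& \qquad + \exp\Big[ (\psi_1 - \psi_1^{(0)}) \Big\{ \sum_{m=1}^{d+K+1} \gamma_{i,m}^{(0)} B_m(D_{i,j}) + X_{i,j}^{\top} \beta^{(0)} \Big\} (2d + 2K + 7) \Big] \\
& \qquad \times \Big\{ \sum_{m=1}^{d+K+1} \gamma_{i,m}^{(0)} B_m(D_{i,j}) + X_{i,j}^{\top} \beta^{(0)} \Big\} (2d + 2K + 7) \Bigg\} \Bigg],
\end{aligned}
\]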
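\[
\begin{aligned}
\frac{\partial}{\partial \psi_1} g_2(\psi_1 \mid \Theta^{(0)}) \Big|_{\Theta = \Theta^{(0)}}
= \sum_{i=1}^{I} \sum_{j=1}^{n_i} \Bigg[ & \Bigg\{ I(Y_{i,j} > 0) + \frac{I(Y_{i,j} = 0)\, \omega_{i,j}^{(0)} G(\phi^{(0)}, \mu_{i,j}^{(0)})}{(1 - \omega_{i,j}^{(0)}) + \omega_{i,j}^{(0)} G(\phi^{(0)}, \mu_{i,j}^{(0)})} \Bigg\} \\
& \times \Bigg\{ \log(s_i) + \sum_{m=1}^{d+K+1} \gamma_{i,m}^{(0)} B_m(D_{i,j}) + X_{i,j}^{\top} \beta^{(0)} \Bigg\} \\
& - \frac{\exp\{\psi_0^{(0)} + \psi_1^{(0)} \log(s_i \mu_{i,j}^{(0)})\}}{1 + \exp\{\psi_0^{(0)} + \psi_1^{(0)} \log(s_i \mu_{i,j}^{(0)})\}} \times \frac{1}{2d + 2K + 7} \\
& \times \Bigg\{ (2d + 2K + 7) \times \log(s_i)
+ \Big\{ \sum_{m=1}^{d+K+1} \gamma_{i,m}^{(0)} B_m(D_{i,j}) + X_{i,j}^{\top} \beta^{(0)} \Big\} (2d + 2K + 7) \Bigg\} \Bigg] \\
= \sum_{i=1}^{I} \sum_{j=1}^{n_i} \Bigg[ & \Bigg\{ I(Y_{i,j} > 0) + \frac{I(Y_{i,j} = 0)\, \omega_{i,j}^{(0)} G(\phi^{(0)}, \mu_{i,j}^{(0)})}{(1 - \omega_{i,j}^{(0)}) + \omega_{i,j}^{(0)} G(\phi^{(0)}, \mu_{i,j}^{(0)})} \Bigg\} \\
& \times \Bigg\{ \log(s_i) + \sum_{m=1}^{d+K+1} \gamma_{i,m}^{(0)} B_m(D_{i,j}) + X_{i,j}^{\top} \beta^{(0)} \Bigg\} \\
& - \frac{\exp\{\psi_0^{(0)} + \psi_1^{(0)} \log(s_i \mu_{i,j}^{(0)})\}}{1 + \exp\{\psi_0^{(0)} + \psi_1^{(0)} \log(s_i \mu_{i,j}^{(0)})\}} \\
& \times \Bigg\{ \log(s_i) + \sum_{m=1}^{d+K+1} \gamma_{i,m}^{(0)} B_m(D_{i,j}) + X_{i,j}^{\top} \beta^{(0)} \Bigg\} \Bigg] \\
= \sum_{i=1}^{I} \sum_{j=1}^{n_i} & \Bigg\{ \log(s_i) + \sum_{m=1}^{d+K+1} \gamma_{i,m}^{(0)} B_m(D_{i,j}) + X_{i,j}^{\top} \beta^{(0)} \Bigg\} \\
& \times \Bigg[ I(Y_{i,j} > 0) + \frac{I(Y_{i,j} = 0)\, \omega_{i,j}^{(0)} G(\phi^{(0)}, \mu_{i,j}^{(0)})}{(1 - \omega_{i,j}^{(0)}) + \omega_{i,j}^{(0)} G(\phi^{(0)}, \mu_{i,j}^{(0)})}
- \frac{\exp\{\psi_0^{(0)} + \psi_1^{(0)} \log(s_i \mu_{i,j}^{(0)})\}}{1 + \exp\{\psi_0^{(0)} + \psi_1^{(0)} \log(s_i \mu_{i,j}^{(0)})\}} \Bigg]
\end{aligned}
\]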
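\[
\begin{aligned}
\frac{\partial^2}{\partial \psi_1^2} g_2(\psi_1 \mid \Theta^{(0)})
= \sum_{i=1}^{I} \sum_{j=1}^{n_i} \Bigg[ & -(d + K + 2) \Bigg\{ I(Y_{i,j} > 0) + \frac{I(Y_{i,j} = 0)\, \omega_{i,j}^{(0)} G(\phi^{(0)}, \mu_{i,j}^{(0)})}{(1 - \omega_{i,j}^{(0)}) + \omega_{i,j}^{(0)} G(\phi^{(0)}, \mu_{i,j}^{(0)})} \Bigg\} \\
& - \frac{\exp\{\psi_0^{(0)} + \psi_1^{(0)} \log(s_i \mu_{i,j}^{(0)})\}}{1 + \exp\{\psi_0^{(0)} + \psi_1^{(0)} \log(s_i \mu_{i,j}^{(0)})\}} \times \frac{1}{2d + 2K + 7} \\
& \times \Bigg\{ (2d + 2K + 7)^2 \times \{\log(s_i)\}^2 \times \exp\{\log(s_i)(\psi_1 - \psi_1^{(0)})(2d + 2K + 7)\} \\
& \qquad + 2 (d + K + 1) \exp\{(\psi_1 - \psi_1^{(0)})^2\} \times (\psi_1 - \psi_1^{(0)})^2
+ (d + K + 1) \exp\{(\psi_1 - \psi_1^{(0)})^2\} \\
& \qquad + 2 \exp\{(\psi_1 - \psi_1^{(0)})^2\} (\psi_1 - \psi_1^{(0)})^2
+ \exp\{(\psi_1 - \psi_1^{(0)})^2\} \\
& \qquad + \exp\Big[ (\psi_1 - \psi_1^{(0)}) \Big\{ \sum_{m=1}^{d+K+1} \gamma_{i,m}^{(0)} B_m(D_{i,j}) + X_{i,j}^{\top} \beta^{(0)} \Big\} (2d + 2K + 7) \Big] \\
& \qquad \times \Big\{ \sum_{m=1}^{d+K+1} \gamma_{i,m}^{(0)} B_m(D_{i,j}) + X_{i,j}^{\top} \beta^{(0)} \Big\}^2 (2d + 2K + 7)^2 \Bigg\} \Bigg],
\end{aligned}
\]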
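\[
\begin{aligned}
\frac{\partial^2}{\partial \psi_1^2} g_2(\psi_1 \mid \Theta^{(0)}) \Big|_{\Theta = \Theta^{(0)}}
= \sum_{i=1}^{I} \sum_{j=1}^{n_i} \Bigg[ & -(d + K + 2) \Bigg\{ I(Y_{i,j} > 0) + \frac{I(Y_{i,j} = 0)\, \omega_{i,j}^{(0)} G(\phi^{(0)}, \mu_{i,j}^{(0)})}{(1 - \omega_{i,j}^{(0)}) + \omega_{i,j}^{(0)} G(\phi^{(0)}, \mu_{i,j}^{(0)})} \Bigg\} \\
& - \frac{\exp\{\psi_0^{(0)} + \psi_1^{(0)} \log(s_i \mu_{i,j}^{(0)})\}}{1 + \exp\{\psi_0^{(0)} + \psi_1^{(0)} \log(s_i \mu_{i,j}^{(0)})\}} \times \frac{1}{2d + 2K + 7} \\
& \times \Bigg\{ (2d + 2K + 7)^2 \times \{\log(s_i)\}^2 + d + K + 2 \\
& \qquad + \Big\{ \sum_{m=1}^{d+K+1} \gamma_{i,m}^{(0)} B_m(D_{i,j}) + X_{i,j}^{\top} \beta^{(0)} \Big\}^2 (2d + 2K + 7)^2 \Bigg\} \Bigg]
\end{aligned}
\]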
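\[
\begin{aligned}
& \frac{\partial}{\partial \gamma_{i,m}} g_{3,i}(\gamma_i \mid \Theta^{(0)}) \\
= {} & \frac{\partial}{\partial \gamma_{i,m}} \sum_{j=1}^{n_i} \Bigg[ \Bigg\{ I(Y_{i,j} > 0) + \frac{I(Y_{i,j} = 0)\, \omega_{i,j}^{(0)} G(\phi^{(0)}, \mu_{i,j}^{(0)})}{(1 - \omega_{i,j}^{(0)}) + \omega_{i,j}^{(0)} G(\phi^{(0)}, \mu_{i,j}^{(0)})} \Bigg\} \\
& \times \Bigg\{ -0.5 \sum_{m=1}^{d+K+1} (\gamma_{i,m} - \gamma_{i,m}^{(0)})^2 B_m^2(D_{i,j})
+ \psi_1^{(0)} \sum_{m=1}^{d+K+1} (\gamma_{i,m} - \gamma_{i,m}^{(0)}) B_m(D_{i,j}) \Bigg\} \\
& - \frac{\exp\{\psi_0^{(0)} + \psi_1^{(0)} \log(s_i \mu_{i,j}^{(0)})\}}{1 + \exp\{\psi_0^{(0)} + \psi_1^{(0)} \log(s_i \mu_{i,j}^{(0)})\}} \\
& \times \frac{1}{2d + 2K + 7} \Bigg\{ 0.5 \sum_{m=1}^{d+K+1} \exp\big[ \{(\gamma_{i,m} - \gamma_{i,m}^{(0)}) B_m(D_{i,j})(2d + 2K + 7)\}^2 \big] \\
& \qquad + \sum_{m=1}^{d+K+1} \exp\{(\gamma_{i,m} - \gamma_{i,m}^{(0)}) \psi_1^{(0)} B_m(D_{i,j})(2d + 2K + 7)\} \Bigg\} \Bigg] \\
= {} & \sum_{j=1}^{n_i} \Bigg[ \Bigg\{ I(Y_{i,j} > 0) + \frac{I(Y_{i,j} = 0)\, \omega_{i,j}^{(0)} G(\phi^{(0)}, \mu_{i,j}^{(0)})}{(1 - \omega_{i,j}^{(0)}) + \omega_{i,j}^{(0)} G(\phi^{(0)}, \mu_{i,j}^{(0)})} \Bigg\} \\
& \times \Big\{ -(\gamma_{i,m} - \gamma_{i,m}^{(0)}) B_m^2(D_{i,j}) + \psi_1^{(0)} B_m(D_{i,j}) \Big\} \\
& - \frac{\exp\{\psi_0^{(0)} + \psi_1^{(0)} \log(s_i \mu_{i,j}^{(0)})\}}{1 + \exp\{\psi_0^{(0)} + \psi_1^{(0)} \log(s_i \mu_{i,j}^{(0)})\}} \\
& \times \frac{1}{2d + 2K + 7} \Bigg\{ \exp\big[ \{(\gamma_{i,m} - \gamma_{i,m}^{(0)}) B_m(D_{i,j})(2d + 2K + 7)\}^2 \big] \\
& \qquad \times (\gamma_{i,m} - \gamma_{i,m}^{(0)}) B_m(D_{i,j}) \times B_m(D_{i,j}) \times (2d + 2K + 7)^2 \\
& \qquad + \exp\{(\gamma_{i,m} - \gamma_{i,m}^{(0)}) \psi_1^{(0)} B_m(D_{i,j})(2d + 2K + 7)\}
\times \psi_1^{(0)} B_m(D_{i,j})(2d + 2K + 7) \Bigg\} \Bigg], \\
= {} & \sum_{j=1}^{n_i} B_m(D_{i,j}) \Bigg[ \Bigg\{ I(Y_{i,j} > 0) + \frac{I(Y_{i,j} = 0)\, \omega_{i,j}^{(0)} G(\phi^{(0)}, \mu_{i,j}^{(0)})}{(1 - \omega_{i,j}^{(0)}) + \omega_{i,j}^{(0)} G(\phi^{(0)}, \mu_{i,j}^{(0)})} \Bigg\} \\
& \times \Big\{ -(\gamma_{i,m} - \gamma_{i,m}^{(0)}) B_m(D_{i,j}) + \psi_1^{(0)} \Big\} \\
& - \frac{\exp\{\psi_0^{(0)} + \psi_1^{(0)} \log(s_i \mu_{i,j}^{(0)})\}}{1 + \exp\{\psi_0^{(0)} + \psi_1^{(0)} \log(s_i \mu_{i,j}^{(0)})\}} \\
& \times \Bigg\{ \exp\big[ \{(\gamma_{i,m} - \gamma_{i,m}^{(0)}) B_m(D_{i,j})(2d + 2K + 7)\}^2 \big]
\times (\gamma_{i,m} - \gamma_{i,m}^{(0)}) B_m(D_{i,j}) \times (2d + 2K + 7) \\
& \qquad + \exp\{(\gamma_{i,m} - \gamma_{i,m}^{(0)}) \psi_1^{(0)} B_m(D_{i,j})(2d + 2K + 7)\} \times \psi_1^{(0)} \Bigg\} \Bigg],
\end{aligned}
\]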
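\[
\begin{aligned}
& \frac{\partial}{\partial \gamma_{i,m}} g_{3,i}(\gamma_i \mid \Theta^{(0)}) \Big|_{\Theta = \Theta^{(0)}} \\
= {} & \psi_1^{(0)} \sum_{j=1}^{n_i} B_m(D_{i,j}) \Bigg[ I(Y_{i,j} > 0) + \frac{I(Y_{i,j} = 0)\, \omega_{i,j}^{(0)} G(\phi^{(0)}, \mu_{i,j}^{(0)})}{(1 - \omega_{i,j}^{(0)}) + \omega_{i,j}^{(0)} G(\phi^{(0)}, \mu_{i,j}^{(0)})}
- \frac{\exp\{\psi_0^{(0)} + \psi_1^{(0)} \log(s_i \mu_{i,j}^{(0)})\}}{1 + \exp\{\psi_0^{(0)} + \psi_1^{(0)} \log(s_i \mu_{i,j}^{(0)})\}} \Bigg]
\end{aligned}
\]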
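\[
\begin{aligned}
& \frac{\partial^2}{\partial \gamma_{i,m}^2} g_{3,i}(\gamma_i \mid \Theta^{(0)}) \\
= {} & \frac{\partial}{\partial \gamma_{i,m}} \sum_{j=1}^{n_i} B_m(D_{i,j}) \Bigg[ \Bigg\{ I(Y_{i,j} > 0) + \frac{I(Y_{i,j} = 0)\, \omega_{i,j}^{(0)} G(\phi^{(0)}, \mu_{i,j}^{(0)})}{(1 - \omega_{i,j}^{(0)}) + \omega_{i,j}^{(0)} G(\phi^{(0)}, \mu_{i,j}^{(0)})} \Bigg\} \\
& \times \Big\{ -(\gamma_{i,m} - \gamma_{i,m}^{(0)}) B_m(D_{i,j}) + \psi_1^{(0)} \Big\} \\
& - \frac{\exp\{\psi_0^{(0)} + \psi_1^{(0)} \log(s_i \mu_{i,j}^{(0)})\}}{1 + \exp\{\psi_0^{(0)} + \psi_1^{(0)} \log(s_i \mu_{i,j}^{(0)})\}} \\
& \times \Bigg\{ \exp\big[ \{(\gamma_{i,m} - \gamma_{i,m}^{(0)}) B_m(D_{i,j})(2d + 2K + 7)\}^2 \big]
\times (\gamma_{i,m} - \gamma_{i,m}^{(0)}) B_m(D_{i,j}) \times (2d + 2K + 7) \\
& \qquad + \exp\{(\gamma_{i,m} - \gamma_{i,m}^{(0)}) \psi_1^{(0)} B_m(D_{i,j})(2d + 2K + 7)\} \times \psi_1^{(0)} \Bigg\} \Bigg] \\
= {} & \sum_{j=1}^{n_i} B_m(D_{i,j}) \Bigg[ \Bigg\{ I(Y_{i,j} > 0) + \frac{I(Y_{i,j} = 0)\, \omega_{i,j}^{(0)} G(\phi^{(0)}, \mu_{i,j}^{(0)})}{(1 - \omega_{i,j}^{(0)}) + \omega_{i,j}^{(0)} G(\phi^{(0)}, \mu_{i,j}^{(0)})} \Bigg\}
\times \big\{ -B_m(D_{i,j}) \big\} \\
& - \frac{\exp\{\psi_0^{(0)} + \psi_1^{(0)} \log(s_i \mu_{i,j}^{(0)})\}}{1 + \exp\{\psi_0^{(0)} + \psi_1^{(0)} \log(s_i \mu_{i,j}^{(0)})\}} \\
& \times \Bigg\{ \exp\big[ \{(\gamma_{i,m} - \gamma_{i,m}^{(0)}) B_m(D_{i,j})(2d + 2K + 7)\}^2 \big]
\times 2 (\gamma_{i,m} - \gamma_{i,m}^{(0)})^2 B_m^3(D_{i,j})(2d + 2K + 7)^3 \\
& \qquad + \exp\big[ \{(\gamma_{i,m} - \gamma_{i,m}^{(0)}) B_m(D_{i,j})(2d + 2K + 7)\}^2 \big]
\times B_m(D_{i,j}) \times (2d + 2K + 7) \\
& \qquad + \exp\{(\gamma_{i,m} - \gamma_{i,m}^{(0)}) \psi_1^{(0)} B_m(D_{i,j})(2d + 2K + 7)\}
\times (\psi_1^{(0)})^2 B_m(D_{i,j})(2d + 2K + 7) \Bigg\} \Bigg].
\end{aligned}
\]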
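\[
\begin{aligned}
& \frac{\partial^2}{\partial \gamma_{i,m}^2} g_{3,i}(\gamma_i \mid \Theta^{(0)}) \Big|_{\Theta = \Theta^{(0)}} \\
= {} & \sum_{j=1}^{n_i} B_m(D_{i,j}) \Bigg[ \Bigg\{ I(Y_{i,j} > 0) + \frac{I(Y_{i,j} = 0)\, \omega_{i,j}^{(0)} G(\phi^{(0)}, \mu_{i,j}^{(0)})}{(1 - \omega_{i,j}^{(0)}) + \omega_{i,j}^{(0)} G(\phi^{(0)}, \mu_{i,j}^{(0)})} \Bigg\}
\times \big\{ -B_m(D_{i,j}) \big\} \\
& - \frac{\exp\{\psi_0^{(0)} + \psi_1^{(0)} \log(s_i \mu_{i,j}^{(0)})\}}{1 + \exp\{\psi_0^{(0)} + \psi_1^{(0)} \log(s_i \mu_{i,j}^{(0)})\}}
\times \Big\{ B_m(D_{i,j}) \times (2d + 2K + 7) + (\psi_1^{(0)})^2 B_m(D_{i,j})(2d + 2K + 7) \Big\} \Bigg] \\
= {} & - \sum_{j=1}^{n_i} B_m^2(D_{i,j}) \Bigg[ I(Y_{i,j} > 0) + \frac{I(Y_{i,j} = 0)\, \omega_{i,j}^{(0)} G(\phi^{(0)}, \mu_{i,j}^{(0)})}{(1 - \omega_{i,j}^{(0)}) + \omega_{i,j}^{(0)} G(\phi^{(0)}, \mu_{i,j}^{(0)})} \\
& + (2d + 2K + 7) \{1 + (\psi_1^{(0)})^2\}
\frac{\exp\{\psi_0^{(0)} + \psi_1^{(0)} \log(s_i \mu_{i,j}^{(0)})\}}{1 + \exp\{\psi_0^{(0)} + \psi_1^{(0)} \log(s_i \mu_{i,j}^{(0)})\}} \Bigg].
\end{aligned}
\]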
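Also,
\[
\frac{\partial^2}{\partial \gamma_{i,m}\, \partial \gamma_{i,r}} g_{3,i}(\gamma_i \mid \Theta^{(0)}) \Big|_{\Theta = \Theta^{(0)}} = 0, \qquad m \neq r.
\]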
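Since the mixed second derivatives of g3,i vanish, its Hessian at the current iterate is diagonal, so any Newton-type step separates across the coefficients γi,m. The sketch below illustrates this with the gradient and diagonal second derivative displayed above. Whether the derivatives are used in exactly this kind of step is an assumption made here for illustration, and the function names and input values are hypothetical.

```python
def g3_derivatives_at_current(B, weights, probs, psi1_cur, c):
    """Return (gradient, diagonal Hessian) of g_{3,i} at the current iterate.

    B[j][m] holds the basis value B_m(D_ij); weights[j] is the bracketed
    weight and probs[j] the logistic fraction for cell j; c = 2d + 2K + 7.
    """
    n_basis = len(B[0])
    grad, hess = [], []
    for m in range(n_basis):
        grad.append(psi1_cur * sum(row[m] * (w - p)
                                   for row, w, p in zip(B, weights, probs)))
        hess.append(-sum(row[m] ** 2 * (w + c * (1.0 + psi1_cur ** 2) * p)
                         for row, w, p in zip(B, weights, probs)))
    return grad, hess

def coordinatewise_newton(gamma_cur, grad, hess):
    """Newton-type step taken coordinate by coordinate (valid for a diagonal Hessian)."""
    return [g - gr / h for g, gr, h in zip(gamma_cur, grad, hess)]

# Illustrative inputs only (not taken from the dissertation).
B = [[0.2, 0.5, 0.3],
     [0.6, 0.3, 0.1],
     [0.1, 0.4, 0.5]]
weights = [1.0, 0.35, 1.0]
probs = [0.4, 0.3, 0.6]
psi1_cur, c = 0.8, 17
gamma_cur = [0.1, -0.2, 0.3]

grad, hess = g3_derivatives_at_current(B, weights, probs, psi1_cur, c)
gamma_new = coordinatewise_newton(gamma_cur, grad, hess)
```

Every diagonal entry is negative whenever some basis value is nonzero, so each coordinate moves in the direction of its own gradient component.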
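\[
\begin{aligned}
& \frac{\partial}{\partial \gamma_{i,r}} \ell_{3,i}^{\ddagger}(\gamma_i \mid \Theta^{(0)}) \\
= {} & \frac{\partial}{\partial \gamma_{i,r}} \sum_{j=1}^{n_i} \Bigg[ I(Y_{i,j} > 0) Y_{i,j} \sum_{m=1}^{d+K+1} \gamma_{i,m} B_m(D_{i,j}) \\
& - \frac{s_j}{s_i \mu_{i,j}^{(0)} + \exp(\phi^{(0)})} \Bigg\{ I(Y_{i,j} > 0) Y_{i,j} + I(Y_{i,j} > 0) \exp(\phi^{(0)})
+ \frac{I(Y_{i,j} = 0)\, \omega_{i,j}^{(0)} G(\phi^{(0)}, \mu_{i,j}^{(0)})}{(1 - \omega_{i,j}^{(0)}) + \omega_{i,j}^{(0)} G(\phi^{(0)}, \mu_{i,j}^{(0)})} \times \exp(\phi^{(0)}) \Bigg\} \\
& \times \frac{\mu_{i,j}^{(0)}}{d + K + 2} \sum_{m=1}^{d+K+1} \exp\{(\gamma_{i,m} - \gamma_{i,m}^{(0)}) B_m(D_{i,j})(d + K + 2)\} \\
& - \frac{0.5 s_j}{s_i \mu_{i,j}^{(0)} + \exp(\phi^{(0)})} \Bigg\{ I(Y_{i,j} > 0) + \frac{I(Y_{i,j} = 0)\, \omega_{i,j}^{(0)} G(\phi^{(0)}, \mu_{i,j}^{(0)})}{(1 - \omega_{i,j}^{(0)}) + \omega_{i,j}^{(0)} G(\phi^{(0)}, \mu_{i,j}^{(0)})} \Bigg\} \\
& \times \Bigg\{ -2 (\mu_{i,j}^{(0)})^2 \sum_{m=1}^{d+K+1} (\gamma_{i,m} - \gamma_{i,m}^{(0)}) B_m(D_{i,j})
+ \frac{(\mu_{i,j}^{(0)})^2}{d + K + 2} \sum_{m=1}^{d+K+1} \exp\{2 (d + K + 2)(\gamma_{i,m} - \gamma_{i,m}^{(0)}) B_m(D_{i,j})\} \Bigg\} \Bigg], \\
= {} & \sum_{j=1}^{n_i} \Bigg[ I(Y_{i,j} > 0) Y_{i,j} B_r(D_{i,j}) \\
& - \frac{s_j}{s_i \mu_{i,j}^{(0)} + \exp(\phi^{(0)})} \Bigg\{ I(Y_{i,j} > 0) Y_{i,j} + I(Y_{i,j} > 0) \exp(\phi^{(0)})
+ \frac{I(Y_{i,j} = 0)\, \omega_{i,j}^{(0)} G(\phi^{(0)}, \mu_{i,j}^{(0)})}{(1 - \omega_{i,j}^{(0)}) + \omega_{i,j}^{(0)} G(\phi^{(0)}, \mu_{i,j}^{(0)})} \times \exp(\phi^{(0)}) \Bigg\} \\
& \times \frac{\mu_{i,j}^{(0)}}{d + K + 2} \exp\{(\gamma_{i,r} - \gamma_{i,r}^{(0)}) B_r(D_{i,j})(d + K + 2)\} B_r(D_{i,j})(d + K + 2) \\
& - \frac{0.5 s_j}{s_i \mu_{i,j}^{(0)} + \exp(\phi^{(0)})} \Bigg\{ I(Y_{i,j} > 0) + \frac{I(Y_{i,j} = 0)\, \omega_{i,j}^{(0)} G(\phi^{(0)}, \mu_{i,j}^{(0)})}{(1 - \omega_{i,j}^{(0)}) + \omega_{i,j}^{(0)} G(\phi^{(0)}, \mu_{i,j}^{(0)})} \Bigg\} \\
& \times \Bigg\{ -2 (\mu_{i,j}^{(0)})^2 B_r(D_{i,j})
+ \frac{(\mu_{i,j}^{(0)})^2}{d + K + 2} \exp\{2 (d + K + 2)(\gamma_{i,r} - \gamma_{i,r}^{(0)}) B_r(D_{i,j})\}\, 2 (d + K + 2) B_r(D_{i,j}) \Bigg\} \Bigg], \\
= {} & \sum_{j=1}^{n_i} B_r(D_{i,j}) \Bigg[ I(Y_{i,j} > 0) Y_{i,j} \\
& - \frac{s_j}{s_i \mu_{i,j}^{(0)} + \exp(\phi^{(0)})} \Bigg\{ I(Y_{i,j} > 0) Y_{i,j} + I(Y_{i,j} > 0) \exp(\phi^{(0)})
+ \frac{I(Y_{i,j} = 0)\, \omega_{i,j}^{(0)} G(\phi^{(0)}, \mu_{i,j}^{(0)})}{(1 - \omega_{i,j}^{(0)}) + \omega_{i,j}^{(0)} G(\phi^{(0)}, \mu_{i,j}^{(0)})} \times \exp(\phi^{(0)}) \Bigg\} \\
& \times \mu_{i,j}^{(0)} \exp\{(\gamma_{i,r} - \gamma_{i,r}^{(0)}) B_r(D_{i,j})(d + K + 2)\} \\
& - \frac{s_j}{s_i \mu_{i,j}^{(0)} + \exp(\phi^{(0)})} \Bigg\{ I(Y_{i,j} > 0) + \frac{I(Y_{i,j} = 0)\, \omega_{i,j}^{(0)} G(\phi^{(0)}, \mu_{i,j}^{(0)})}{(1 - \omega_{i,j}^{(0)}) + \omega_{i,j}^{(0)} G(\phi^{(0)}, \mu_{i,j}^{(0)})} \Bigg\} \\
& \times \Big\{ -(\mu_{i,j}^{(0)})^2 + (\mu_{i,j}^{(0)})^2 \exp\{2 (d + K + 2)(\gamma_{i,r} - \gamma_{i,r}^{(0)}) B_r(D_{i,j})\} \Big\} \Bigg].
\end{aligned}
\]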
So,
                       ∂ ‡
                           ℓ3,i (γi |Θ(0) ) |Θ=Θ(0)
                     ∂γi,r
                    X ni               
              =           Br (Di,j ) I(Yi,j > 0)Yi,j
                     j=1
                                      (0)
                                sj µi,j
                                                 
                    −      (0)
                                                  I(Yi,j > 0)Yi,j + I(Yi,j > 0) exp(ϕ(0 )
                       si µi,j + exp(ϕ(0) )
                                          (0)          (0)
                         I(Yi,j = 0)ωi,j G(ϕ(0) , µi,j )
                                                                           
                                                                       (0)
                    +            (0)       (0)           (0)
                                                               × exp(ϕ ) .
                       (1 − ωi,j ) + ωi,j G(ϕ(0) , µi,j )
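\begin{align*}
\frac{\partial^2}{\partial\gamma_{i,r}^2} \ell^{\ddagger}_{3,i}(\gamma_i|\Theta^{(0)})
&= \frac{\partial}{\partial\gamma_{i,r}} \sum_{j=1}^{n_i} B_r(D_{i,j}) \Big[ I(Y_{i,j}>0)Y_{i,j} \\
&\quad - \frac{s_j}{s_i\mu_{i,j}^{(0)}+\exp(\phi^{(0)})} \Big\{ I(Y_{i,j}>0)Y_{i,j} + I(Y_{i,j}>0)\exp(\phi^{(0)}) + \frac{I(Y_{i,j}=0)\,\omega_{i,j}^{(0)}G(\phi^{(0)},\mu_{i,j}^{(0)})}{(1-\omega_{i,j}^{(0)})+\omega_{i,j}^{(0)}G(\phi^{(0)},\mu_{i,j}^{(0)})} \times \exp(\phi^{(0)}) \Big\} \\
&\qquad \times \mu_{i,j}^{(0)} \exp\{(\gamma_{i,r}-\gamma_{i,r}^{(0)})B_r(D_{i,j})(d+K+2)\} \\
&\quad - \frac{s_j}{s_i\mu_{i,j}^{(0)}+\exp(\phi^{(0)})} \Big\{ I(Y_{i,j}>0) + \frac{I(Y_{i,j}=0)\,\omega_{i,j}^{(0)}G(\phi^{(0)},\mu_{i,j}^{(0)})}{(1-\omega_{i,j}^{(0)})+\omega_{i,j}^{(0)}G(\phi^{(0)},\mu_{i,j}^{(0)})} \Big\} \\
&\qquad \times \Big( -(\mu_{i,j}^{(0)})^2 + (\mu_{i,j}^{(0)})^2 \exp\{2(d+K+2)(\gamma_{i,r}-\gamma_{i,r}^{(0)})B_r(D_{i,j})\} \Big) \Big] \\
&= \sum_{j=1}^{n_i} B_r(D_{i,j}) \Big[ - \frac{s_j}{s_i\mu_{i,j}^{(0)}+\exp(\phi^{(0)})} \Big\{ I(Y_{i,j}>0)Y_{i,j} + I(Y_{i,j}>0)\exp(\phi^{(0)}) \\
&\qquad + \frac{I(Y_{i,j}=0)\,\omega_{i,j}^{(0)}G(\phi^{(0)},\mu_{i,j}^{(0)})}{(1-\omega_{i,j}^{(0)})+\omega_{i,j}^{(0)}G(\phi^{(0)},\mu_{i,j}^{(0)})} \times \exp(\phi^{(0)}) \Big\} \\
&\qquad \times \mu_{i,j}^{(0)} B_r(D_{i,j})(d+K+2) \exp\{(\gamma_{i,r}-\gamma_{i,r}^{(0)})B_r(D_{i,j})(d+K+2)\} \\
&\quad - \frac{s_j}{s_i\mu_{i,j}^{(0)}+\exp(\phi^{(0)})} \Big\{ I(Y_{i,j}>0) + \frac{I(Y_{i,j}=0)\,\omega_{i,j}^{(0)}G(\phi^{(0)},\mu_{i,j}^{(0)})}{(1-\omega_{i,j}^{(0)})+\omega_{i,j}^{(0)}G(\phi^{(0)},\mu_{i,j}^{(0)})} \Big\} \\
&\qquad \times 2(d+K+2)B_r(D_{i,j})(\mu_{i,j}^{(0)})^2 \exp\{2(d+K+2)(\gamma_{i,r}-\gamma_{i,r}^{(0)})B_r(D_{i,j})\} \Big].
\end{align*}
Now,
\begin{align*}
\frac{\partial^2}{\partial\gamma_{i,r}^2} \ell^{\ddagger}_{3,i}(\gamma_i|\Theta^{(0)}) \Big|_{\Theta=\Theta^{(0)}}
&= -(d+K+2) \sum_{j=1}^{n_i} B_r^2(D_{i,j}) \Big[ \frac{s_j}{s_i\mu_{i,j}^{(0)}+\exp(\phi^{(0)})} \Big\{ I(Y_{i,j}>0)Y_{i,j} + I(Y_{i,j}>0)\exp(\phi^{(0)}) \\
&\qquad + \frac{I(Y_{i,j}=0)\,\omega_{i,j}^{(0)}G(\phi^{(0)},\mu_{i,j}^{(0)})}{(1-\omega_{i,j}^{(0)})+\omega_{i,j}^{(0)}G(\phi^{(0)},\mu_{i,j}^{(0)})} \times \exp(\phi^{(0)}) \Big\} \mu_{i,j}^{(0)} \\
&\quad + \frac{2s_j(\mu_{i,j}^{(0)})^2}{s_i\mu_{i,j}^{(0)}+\exp(\phi^{(0)})} \Big\{ I(Y_{i,j}>0) + \frac{I(Y_{i,j}=0)\,\omega_{i,j}^{(0)}G(\phi^{(0)},\mu_{i,j}^{(0)})}{(1-\omega_{i,j}^{(0)})+\omega_{i,j}^{(0)}G(\phi^{(0)},\mu_{i,j}^{(0)})} \Big\} \Big], \\
\frac{\partial^2}{\partial\gamma_{i,r}\,\partial\gamma_{i,m}} \ell^{\ddagger}_{3,i}(\gamma_i|\Theta^{(0)}) \Big|_{\Theta=\Theta^{(0)}} &= 0.
\end{align*}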
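\begin{align*}
\frac{\partial}{\partial\beta} g_4(\beta|\Theta^{(0)})
&= \frac{\partial}{\partial\beta} \sum_{i=1}^{I}\sum_{j=1}^{n_i} \Big[ \Big\{ I(Y_{i,j}>0) + \frac{I(Y_{i,j}=0)\,\omega_{i,j}^{(0)}G(\phi^{(0)},\mu_{i,j}^{(0)})}{(1-\omega_{i,j}^{(0)})+\omega_{i,j}^{(0)}G(\phi^{(0)},\mu_{i,j}^{(0)})} \Big\} \\
&\qquad \times \Big( -0.5\{X_{i,j}^{\top}(\beta-\beta^{(0)})\}^2 + \psi_1^{(0)} X_{i,j}^{\top}(\beta-\beta^{(0)}) \Big) \\
&\quad - \frac{\exp\{\psi_0^{(0)}+\psi_1^{(0)}\log(s_i\mu_{i,j}^{(0)})\}}{1+\exp\{\psi_0^{(0)}+\psi_1^{(0)}\log(s_i\mu_{i,j}^{(0)})\}} \times \frac{1}{2d+2K+7} \Big( 0.5\exp[\{X_{i,j}^{\top}(\beta-\beta^{(0)})(2d+2K+7)\}^2] \\
&\qquad + \exp\{X_{i,j}^{\top}\psi_1^{(0)}(\beta-\beta^{(0)})(2d+2K+7)\} \Big) \Big] \\
&= \sum_{i=1}^{I}\sum_{j=1}^{n_i} \Big[ \Big\{ I(Y_{i,j}>0) + \frac{I(Y_{i,j}=0)\,\omega_{i,j}^{(0)}G(\phi^{(0)},\mu_{i,j}^{(0)})}{(1-\omega_{i,j}^{(0)})+\omega_{i,j}^{(0)}G(\phi^{(0)},\mu_{i,j}^{(0)})} \Big\} \\
&\qquad \times \Big( -\{X_{i,j}^{\top}(\beta-\beta^{(0)})\}X_{i,j} + \psi_1^{(0)} X_{i,j} \Big) \\
&\quad - \frac{\exp\{\psi_0^{(0)}+\psi_1^{(0)}\log(s_i\mu_{i,j}^{(0)})\}}{1+\exp\{\psi_0^{(0)}+\psi_1^{(0)}\log(s_i\mu_{i,j}^{(0)})\}} \times \frac{1}{2d+2K+7} \Big( \exp[\{X_{i,j}^{\top}(\beta-\beta^{(0)})(2d+2K+7)\}^2] \\
&\qquad \times X_{i,j}^{\top}(\beta-\beta^{(0)})X_{i,j}(2d+2K+7)^2 \\
&\qquad + \exp\{X_{i,j}^{\top}\psi_1^{(0)}(\beta-\beta^{(0)})(2d+2K+7)\} \times \psi_1^{(0)} X_{i,j}(2d+2K+7) \Big) \Big] \\
&= \sum_{i=1}^{I}\sum_{j=1}^{n_i} X_{i,j} \Big[ \Big\{ I(Y_{i,j}>0) + \frac{I(Y_{i,j}=0)\,\omega_{i,j}^{(0)}G(\phi^{(0)},\mu_{i,j}^{(0)})}{(1-\omega_{i,j}^{(0)})+\omega_{i,j}^{(0)}G(\phi^{(0)},\mu_{i,j}^{(0)})} \Big\} \\
&\qquad \times \Big( -X_{i,j}^{\top}(\beta-\beta^{(0)}) + \psi_1^{(0)} \Big) \\
&\quad - \frac{\exp\{\psi_0^{(0)}+\psi_1^{(0)}\log(s_i\mu_{i,j}^{(0)})\}}{1+\exp\{\psi_0^{(0)}+\psi_1^{(0)}\log(s_i\mu_{i,j}^{(0)})\}} \\
&\qquad \times \Big( \exp[\{X_{i,j}^{\top}(\beta-\beta^{(0)})(2d+2K+7)\}^2]\, X_{i,j}^{\top}(\beta-\beta^{(0)})(2d+2K+7) \\
&\qquad + \exp\{X_{i,j}^{\top}\psi_1^{(0)}(\beta-\beta^{(0)})(2d+2K+7)\} \times \psi_1^{(0)} \Big) \Big].
\end{align*}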
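\begin{align*}
\frac{\partial}{\partial\beta} g_4(\beta|\Theta^{(0)}) \Big|_{\Theta=\Theta^{(0)}}
&= \psi_1^{(0)} \sum_{i=1}^{I}\sum_{j=1}^{n_i} X_{i,j} \Big[ I(Y_{i,j}>0) + \frac{I(Y_{i,j}=0)\,\omega_{i,j}^{(0)}G(\phi^{(0)},\mu_{i,j}^{(0)})}{(1-\omega_{i,j}^{(0)})+\omega_{i,j}^{(0)}G(\phi^{(0)},\mu_{i,j}^{(0)})} \\
&\qquad - \frac{\exp\{\psi_0^{(0)}+\psi_1^{(0)}\log(s_i\mu_{i,j}^{(0)})\}}{1+\exp\{\psi_0^{(0)}+\psi_1^{(0)}\log(s_i\mu_{i,j}^{(0)})\}} \Big].
\end{align*}
\begin{align*}
\frac{\partial^2}{\partial\beta\,\partial\beta^{\top}} g_4(\beta|\Theta^{(0)})
&= \sum_{i=1}^{I}\sum_{j=1}^{n_i} X_{i,j} \Big[ \Big\{ I(Y_{i,j}>0) + \frac{I(Y_{i,j}=0)\,\omega_{i,j}^{(0)}G(\phi^{(0)},\mu_{i,j}^{(0)})}{(1-\omega_{i,j}^{(0)})+\omega_{i,j}^{(0)}G(\phi^{(0)},\mu_{i,j}^{(0)})} \Big\} \times \big( -X_{i,j}^{\top} \big) \\
&\quad - \frac{\exp\{\psi_0^{(0)}+\psi_1^{(0)}\log(s_i\mu_{i,j}^{(0)})\}}{1+\exp\{\psi_0^{(0)}+\psi_1^{(0)}\log(s_i\mu_{i,j}^{(0)})\}} \\
&\qquad \times \Big( \exp[\{X_{i,j}^{\top}(\beta-\beta^{(0)})(2d+2K+7)\}^2] \times 2\{X_{i,j}^{\top}(\beta-\beta^{(0)})\}^2(2d+2K+7)^3 X_{i,j}^{\top} \\
&\qquad + \exp[\{X_{i,j}^{\top}(\beta-\beta^{(0)})(2d+2K+7)\}^2]\, X_{i,j}^{\top}(2d+2K+7) \\
&\qquad + \exp\{X_{i,j}^{\top}\psi_1^{(0)}(\beta-\beta^{(0)})(2d+2K+7)\}(\psi_1^{(0)})^2 \times X_{i,j}^{\top}(2d+2K+7) \Big) \Big] \\
&= -\sum_{i=1}^{I}\sum_{j=1}^{n_i} X_{i,j}X_{i,j}^{\top} \Big[ I(Y_{i,j}>0) + \frac{I(Y_{i,j}=0)\,\omega_{i,j}^{(0)}G(\phi^{(0)},\mu_{i,j}^{(0)})}{(1-\omega_{i,j}^{(0)})+\omega_{i,j}^{(0)}G(\phi^{(0)},\mu_{i,j}^{(0)})} \\
&\quad + \frac{(2d+2K+7)\exp\{\psi_0^{(0)}+\psi_1^{(0)}\log(s_i\mu_{i,j}^{(0)})\}}{1+\exp\{\psi_0^{(0)}+\psi_1^{(0)}\log(s_i\mu_{i,j}^{(0)})\}} \\
&\qquad \times \Big( \exp[\{X_{i,j}^{\top}(\beta-\beta^{(0)})(2d+2K+7)\}^2] \times 2\{X_{i,j}^{\top}(\beta-\beta^{(0)})\}^2(2d+2K+7)^2 \\
&\qquad + \exp[\{X_{i,j}^{\top}(\beta-\beta^{(0)})(2d+2K+7)\}^2] \\
&\qquad + \exp\{X_{i,j}^{\top}\psi_1^{(0)}(\beta-\beta^{(0)})(2d+2K+7)\} \times (\psi_1^{(0)})^2 \Big) \Big].
\end{align*}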
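\begin{align*}
\frac{\partial^2}{\partial\beta\,\partial\beta^{\top}} g_4(\beta|\Theta^{(0)}) \Big|_{\Theta=\Theta^{(0)}}
&= -\sum_{i=1}^{I}\sum_{j=1}^{n_i} X_{i,j}X_{i,j}^{\top} \Big[ I(Y_{i,j}>0) + \frac{I(Y_{i,j}=0)\,\omega_{i,j}^{(0)}G(\phi^{(0)},\mu_{i,j}^{(0)})}{(1-\omega_{i,j}^{(0)})+\omega_{i,j}^{(0)}G(\phi^{(0)},\mu_{i,j}^{(0)})} \\
&\qquad + (2d+2K+7)\{1+(\psi_1^{(0)})^2\} \times \frac{\exp\{\psi_0^{(0)}+\psi_1^{(0)}\log(s_i\mu_{i,j}^{(0)})\}}{1+\exp\{\psi_0^{(0)}+\psi_1^{(0)}\log(s_i\mu_{i,j}^{(0)})\}} \Big].
\end{align*}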
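\begin{align*}
\frac{\partial}{\partial\beta} \ell^{\ddagger}_{4}(\beta|\Theta^{(0)})
&= \frac{\partial}{\partial\beta} \sum_{i=1}^{I}\sum_{j=1}^{n_i} \Big[ I(Y_{i,j}>0)Y_{i,j}X_{i,j}^{\top}\beta \\
&\quad - \frac{s_j}{s_i\mu_{i,j}^{(0)}+\exp(\phi^{(0)})} \Big\{ I(Y_{i,j}>0)Y_{i,j} + I(Y_{i,j}>0)\exp(\phi^{(0)}) + \frac{I(Y_{i,j}=0)\,\omega_{i,j}^{(0)}G(\phi^{(0)},\mu_{i,j}^{(0)})}{(1-\omega_{i,j}^{(0)})+\omega_{i,j}^{(0)}G(\phi^{(0)},\mu_{i,j}^{(0)})} \times \exp(\phi^{(0)}) \Big\} \\
&\qquad \times \frac{\mu_{i,j}^{(0)}}{d+K+2} \exp\{X_{i,j}^{\top}(\beta-\beta^{(0)})(d+K+2)\} \\
&\quad - \frac{0.5s_j}{s_i\mu_{i,j}^{(0)}+\exp(\phi^{(0)})} \Big\{ I(Y_{i,j}>0) + \frac{I(Y_{i,j}=0)\,\omega_{i,j}^{(0)}G(\phi^{(0)},\mu_{i,j}^{(0)})}{(1-\omega_{i,j}^{(0)})+\omega_{i,j}^{(0)}G(\phi^{(0)},\mu_{i,j}^{(0)})} \Big\} \\
&\qquad \times \Big( -2(\mu_{i,j}^{(0)})^2 X_{i,j}^{\top}(\beta-\beta^{(0)}) + \frac{(\mu_{i,j}^{(0)})^2}{d+K+2} \exp\{2(d+K+2)X_{i,j}^{\top}(\beta-\beta^{(0)})\} \Big) \Big] \\
&= \sum_{i=1}^{I}\sum_{j=1}^{n_i} \Big[ I(Y_{i,j}>0)Y_{i,j}X_{i,j} \\
&\quad - \frac{s_j}{s_i\mu_{i,j}^{(0)}+\exp(\phi^{(0)})} \Big\{ I(Y_{i,j}>0)Y_{i,j} + I(Y_{i,j}>0)\exp(\phi^{(0)}) + \frac{I(Y_{i,j}=0)\,\omega_{i,j}^{(0)}G(\phi^{(0)},\mu_{i,j}^{(0)})}{(1-\omega_{i,j}^{(0)})+\omega_{i,j}^{(0)}G(\phi^{(0)},\mu_{i,j}^{(0)})} \times \exp(\phi^{(0)}) \Big\} \\
&\qquad \times \frac{\mu_{i,j}^{(0)}}{d+K+2} \exp\{X_{i,j}^{\top}(\beta-\beta^{(0)})(d+K+2)\} \times X_{i,j}(d+K+2) \\
&\quad - \frac{0.5s_j}{s_i\mu_{i,j}^{(0)}+\exp(\phi^{(0)})} \Big\{ I(Y_{i,j}>0) + \frac{I(Y_{i,j}=0)\,\omega_{i,j}^{(0)}G(\phi^{(0)},\mu_{i,j}^{(0)})}{(1-\omega_{i,j}^{(0)})+\omega_{i,j}^{(0)}G(\phi^{(0)},\mu_{i,j}^{(0)})} \Big\} \\
&\qquad \times \Big( -2(\mu_{i,j}^{(0)})^2 X_{i,j} + \frac{(\mu_{i,j}^{(0)})^2}{d+K+2} \exp\{2(d+K+2)X_{i,j}^{\top}(\beta-\beta^{(0)})\} \times 2(d+K+2)X_{i,j} \Big) \Big] \\
&= \sum_{i=1}^{I}\sum_{j=1}^{n_i} X_{i,j} \Big[ I(Y_{i,j}>0)Y_{i,j} \\
&\quad - \frac{s_j}{s_i\mu_{i,j}^{(0)}+\exp(\phi^{(0)})} \Big\{ I(Y_{i,j}>0)Y_{i,j} + I(Y_{i,j}>0)\exp(\phi^{(0)}) + \frac{I(Y_{i,j}=0)\,\omega_{i,j}^{(0)}G(\phi^{(0)},\mu_{i,j}^{(0)})}{(1-\omega_{i,j}^{(0)})+\omega_{i,j}^{(0)}G(\phi^{(0)},\mu_{i,j}^{(0)})} \times \exp(\phi^{(0)}) \Big\} \\
&\qquad \times \mu_{i,j}^{(0)} \exp\{X_{i,j}^{\top}(\beta-\beta^{(0)})(d+K+2)\} \\
&\quad - \frac{s_j(\mu_{i,j}^{(0)})^2}{s_i\mu_{i,j}^{(0)}+\exp(\phi^{(0)})} \Big\{ I(Y_{i,j}>0) + \frac{I(Y_{i,j}=0)\,\omega_{i,j}^{(0)}G(\phi^{(0)},\mu_{i,j}^{(0)})}{(1-\omega_{i,j}^{(0)})+\omega_{i,j}^{(0)}G(\phi^{(0)},\mu_{i,j}^{(0)})} \Big\} \\
&\qquad \times \Big( -1 + \exp\{2(d+K+2)X_{i,j}^{\top}(\beta-\beta^{(0)})\} \Big) \Big].
\end{align*}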
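\begin{align*}
\frac{\partial}{\partial\beta} \ell^{\ddagger}_{4}(\beta|\Theta^{(0)}) \Big|_{\Theta=\Theta^{(0)}}
&= \sum_{i=1}^{I}\sum_{j=1}^{n_i} X_{i,j} \Big[ I(Y_{i,j}>0)Y_{i,j} \\
&\quad - \frac{s_j\mu_{i,j}^{(0)}}{s_i\mu_{i,j}^{(0)}+\exp(\phi^{(0)})} \Big\{ I(Y_{i,j}>0)Y_{i,j} + I(Y_{i,j}>0)\exp(\phi^{(0)}) \\
&\qquad + \frac{I(Y_{i,j}=0)\,\omega_{i,j}^{(0)}G(\phi^{(0)},\mu_{i,j}^{(0)})}{(1-\omega_{i,j}^{(0)})+\omega_{i,j}^{(0)}G(\phi^{(0)},\mu_{i,j}^{(0)})} \times \exp(\phi^{(0)}) \Big\} \Big].
\end{align*}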
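\begin{align*}
\frac{\partial^2}{\partial\beta\,\partial\beta^{\top}} \ell^{\ddagger}_{4}(\beta|\Theta^{(0)})
&= \sum_{i=1}^{I}\sum_{j=1}^{n_i} X_{i,j} \Big[ - \frac{s_j}{s_i\mu_{i,j}^{(0)}+\exp(\phi^{(0)})} \Big\{ I(Y_{i,j}>0)Y_{i,j} + I(Y_{i,j}>0)\exp(\phi^{(0)}) \\
&\qquad + \frac{I(Y_{i,j}=0)\,\omega_{i,j}^{(0)}G(\phi^{(0)},\mu_{i,j}^{(0)})}{(1-\omega_{i,j}^{(0)})+\omega_{i,j}^{(0)}G(\phi^{(0)},\mu_{i,j}^{(0)})} \times \exp(\phi^{(0)}) \Big\} \\
&\qquad \times \mu_{i,j}^{(0)} \exp\{X_{i,j}^{\top}(\beta-\beta^{(0)})(d+K+2)\}\, X_{i,j}^{\top}(d+K+2) \\
&\quad - \frac{s_j(\mu_{i,j}^{(0)})^2}{s_i\mu_{i,j}^{(0)}+\exp(\phi^{(0)})} \Big\{ I(Y_{i,j}>0) + \frac{I(Y_{i,j}=0)\,\omega_{i,j}^{(0)}G(\phi^{(0)},\mu_{i,j}^{(0)})}{(1-\omega_{i,j}^{(0)})+\omega_{i,j}^{(0)}G(\phi^{(0)},\mu_{i,j}^{(0)})} \Big\} \\
&\qquad \times \exp\{2(d+K+2)X_{i,j}^{\top}(\beta-\beta^{(0)})\} \times 2(d+K+2) \times X_{i,j}^{\top} \Big] \\
&= -(d+K+2) \sum_{i=1}^{I}\sum_{j=1}^{n_i} X_{i,j}X_{i,j}^{\top} \Big[ \frac{s_j}{s_i\mu_{i,j}^{(0)}+\exp(\phi^{(0)})} \Big\{ I(Y_{i,j}>0)Y_{i,j} + I(Y_{i,j}>0)\exp(\phi^{(0)}) \\
&\qquad + \frac{I(Y_{i,j}=0)\,\omega_{i,j}^{(0)}G(\phi^{(0)},\mu_{i,j}^{(0)})}{(1-\omega_{i,j}^{(0)})+\omega_{i,j}^{(0)}G(\phi^{(0)},\mu_{i,j}^{(0)})} \times \exp(\phi^{(0)}) \Big\} \\
&\qquad \times \mu_{i,j}^{(0)} \exp\{X_{i,j}^{\top}(\beta-\beta^{(0)})(d+K+2)\} \\
&\quad + \frac{s_j(\mu_{i,j}^{(0)})^2}{s_i\mu_{i,j}^{(0)}+\exp(\phi^{(0)})} \Big\{ I(Y_{i,j}>0) + \frac{I(Y_{i,j}=0)\,\omega_{i,j}^{(0)}G(\phi^{(0)},\mu_{i,j}^{(0)})}{(1-\omega_{i,j}^{(0)})+\omega_{i,j}^{(0)}G(\phi^{(0)},\mu_{i,j}^{(0)})} \Big\} \\
&\qquad \times 2\exp\{2(d+K+2)X_{i,j}^{\top}(\beta-\beta^{(0)})\} \Big].
\end{align*}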
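\begin{align*}
\frac{\partial^2}{\partial\beta\,\partial\beta^{\top}} \ell^{\ddagger}_{4}(\beta|\Theta^{(0)}) \Big|_{\Theta=\Theta^{(0)}}
&= -(d+K+2) \sum_{i=1}^{I}\sum_{j=1}^{n_i} X_{i,j}X_{i,j}^{\top} \Big[ \frac{s_j\mu_{i,j}^{(0)}}{s_i\mu_{i,j}^{(0)}+\exp(\phi^{(0)})} \Big\{ I(Y_{i,j}>0)Y_{i,j} + I(Y_{i,j}>0)\exp(\phi^{(0)}) \\
&\qquad + \frac{I(Y_{i,j}=0)\,\omega_{i,j}^{(0)}G(\phi^{(0)},\mu_{i,j}^{(0)})}{(1-\omega_{i,j}^{(0)})+\omega_{i,j}^{(0)}G(\phi^{(0)},\mu_{i,j}^{(0)})} \times \exp(\phi^{(0)}) \Big\} \\
&\quad + \frac{2s_j(\mu_{i,j}^{(0)})^2}{s_i\mu_{i,j}^{(0)}+\exp(\phi^{(0)})} \Big\{ I(Y_{i,j}>0) + \frac{I(Y_{i,j}=0)\,\omega_{i,j}^{(0)}G(\phi^{(0)},\mu_{i,j}^{(0)})}{(1-\omega_{i,j}^{(0)})+\omega_{i,j}^{(0)}G(\phi^{(0)},\mu_{i,j}^{(0)})} \Big\} \Big].
\end{align*}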
                                            APPENDIX C
                 KERNELIZED SIGNED GRAPH LEARNING FOR
                 SINGLE CELL GENE REGULATORY NETWORK
                                              INFERENCE
C.1      Optimization Algorithm for Signed Graph Learning
In this section, we present an ADMM-based algorithm to solve the optimization problem for
signed graph learning. For convenience, we restate the optimization problem below:
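\[
\begin{aligned}
\min_{L^{+},\,L^{-}\in\mathcal{L}} \quad & \operatorname{trace}(KL^{+}) - \operatorname{trace}(KL^{-}) + \alpha_1\|L^{+}\|_F^2 + \alpha_2\|L^{-}\|_F^2 \\
\text{s.t.} \quad & \operatorname{trace}(L^{+}) = 2n, \quad \operatorname{trace}(L^{-}) = 2n, \\
& L^{+}_{ij} = 0 \text{ if } L^{-}_{ij} \neq 0 \ \text{ and } \ L^{-}_{ij} = 0 \text{ if } L^{+}_{ij} \neq 0 \quad \forall\, i \neq j.
\end{aligned}
\tag{C.1}
\]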
This problem is non-convex due to the last two constraints, which are called complementarity
constraints [157]. In [158], it is shown that the alternating direction method of multipliers
(ADMM) converges for problems with complementarity constraints under some assumptions.
First, we rewrite the problem in vector form. Let upper(·) be an operator that takes an n × n
matrix and returns an n(n − 1)/2-dimensional vector containing the strictly upper triangular
part of the input matrix. For a vector x, diag(x) returns the diagonal matrix whose diagonal
elements are the entries of x; for a matrix X, diag(X) returns the diagonal of X as a vector.
The matrix P ∈ R^{n × n(n−1)/2} is defined such that P upper(A) = A1, where A is a symmetric
matrix whose diagonal entries are equal to zero and 1 is the all-ones vector.
Let k = upper(K), d = diag(K), ℓ+ = upper(L+ ), ℓ− = upper(L− ). Thus, (C.1) can be
rewritten as:
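\[
\begin{aligned}
\min_{\ell^{+}\le 0,\ \ell^{-}\le 0} \quad & \langle 2k - P^{\top}d,\ \ell^{+}\rangle - \langle 2k - P^{\top}d,\ \ell^{-}\rangle + \alpha_1\langle (2I + P^{\top}P)\ell^{+},\ \ell^{+}\rangle \\
& + \alpha_2\langle (2I + P^{\top}P)\ell^{-},\ \ell^{-}\rangle \\
\text{s.t.} \quad & \mathbf{1}^{\top}\ell^{+} = -n, \quad \mathbf{1}^{\top}\ell^{-} = -n, \quad \text{and} \quad \ell^{+}\perp\ell^{-},
\end{aligned}
\tag{C.2}
\]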
where the first two terms correspond to the trace terms in (C.1), the last two terms correspond
to the Frobenius-norm terms of (C.1), and the first two constraints are the same as the first
two constraints of (C.1). The last constraint, together with ℓ+ ≤ 0 and ℓ− ≤ 0, corresponds to
the complementarity constraints. By introducing two slack variables v = ℓ+ and w = ℓ− , the
problem is written in standard ADMM form:
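\[
\begin{aligned}
\min_{v,\,w,\,\ell^{+},\,\ell^{-}} \quad & \imath_S(v, w) + h(\ell^{+}, \ell^{-}) + \imath_H(\ell^{+}) + \imath_H(\ell^{-}) \\
\text{s.t.} \quad & v - \ell^{+} = 0, \quad w - \ell^{-} = 0,
\end{aligned}
\tag{C.3}
\]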
where ıS (·) is the indicator function of the complementarity set S = {(v, w) : v ≤ 0, w ≤
0, v⊥w}, h(ℓ+ , ℓ− ) is the objective function in (C.2), and ıH (·) is the indicator function of
the hyperplane H = {ℓ : 1⊤ ℓ = −n}. The augmented Lagrangian of (C.3) is:
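\[
\begin{aligned}
\mathcal{L}_{\rho}(v, w, \ell^{+}, \ell^{-}, \lambda_1, \lambda_2) ={} & \imath_S(v, w) + h(\ell^{+}, \ell^{-}) + \imath_H(\ell^{+}) + \imath_H(\ell^{-}) \\
& + \lambda_1^{\top}(v - \ell^{+}) + \frac{\rho}{2}\|v - \ell^{+}\|_2^2 + \lambda_2^{\top}(w - \ell^{-}) + \frac{\rho}{2}\|w - \ell^{-}\|_2^2,
\end{aligned}
\tag{C.4}
\]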
where λ1 and λ2 are Lagrange multipliers and ρ > 0 is the Augmented Lagrangian parameter.
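The passage from the matrix form (C.1) to the vector form (C.2) can be checked numerically. The following is a minimal sketch, assuming a row-major ordering of the strictly upper triangular entries; the helper names `upper` and `build_P` and the random test matrices are illustrative choices, not part of the original implementation.

```python
import numpy as np

def upper(A):
    """Strictly upper-triangular entries of A as a vector (row-major order)."""
    return A[np.triu_indices(A.shape[0], k=1)]

def build_P(n):
    """n x n(n-1)/2 matrix with P @ upper(A) = A @ 1 for symmetric, zero-diagonal A."""
    rows, cols = np.triu_indices(n, k=1)
    M = rows.size
    P = np.zeros((n, M))
    P[rows, np.arange(M)] = 1.0   # edge (i, j) contributes to row i ...
    P[cols, np.arange(M)] = 1.0   # ... and to row j
    return P

rng = np.random.default_rng(0)
n = 5
B = rng.standard_normal((n, n))
K = B @ B.T                                   # a symmetric "kernel" matrix (assumed test data)
W = np.triu(rng.random((n, n)), k=1)
W = W + W.T                                   # symmetric weights, zero diagonal
L = np.diag(W.sum(axis=1)) - W                # Laplacian: off-diagonals <= 0, rows sum to 0

P = build_P(n)
k, d, ell = upper(K), np.diag(K), upper(L)
M = ell.size

# Defining property of P
rowsum_gap = np.max(np.abs(P @ upper(W) - W.sum(axis=1)))
# trace(K L) = <2k - P^T d, ell>
trace_gap = abs(np.trace(K @ L) - (2 * k - P.T @ d) @ ell)
# ||L||_F^2 = <(2I + P^T P) ell, ell>
frob_gap = abs(np.linalg.norm(L, "fro") ** 2 - ell @ ((2 * np.eye(M) + P.T @ P) @ ell))
# trace(L) = -2 * 1^T ell, so trace(L) = 2n is equivalent to 1^T ell = -n
trace_L_gap = abs(np.trace(L) + 2 * ell.sum())
```

All four gaps should be at the level of floating-point round-off, which confirms that the trace, Frobenius and normalization terms of (C.1) map onto the corresponding terms of (C.2).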
    (v, w)-step: The (v, w)-step of ADMM can be found as the projection onto the comple-
mentarity set S:
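\[
(v^{k+1}, w^{k+1}) = \operatorname*{argmin}_{v,\,w}\ \imath_S(v, w) + \frac{\rho}{2}\Big\|v - (\ell^{+})^{k} + \frac{\lambda_1^{k}}{\rho}\Big\|_2^2 + \frac{\rho}{2}\Big\|w - (\ell^{-})^{k} + \frac{\lambda_2^{k}}{\rho}\Big\|_2^2 = \Pi_S(y),
\tag{C.5}
\]
where $y = [((\ell^{+})^{k} - \lambda_1^{k}/\rho)^{\top}, ((\ell^{-})^{k} - \lambda_2^{k}/\rho)^{\top}]^{\top}$ and $\Pi_S(\cdot)$ is the projection operator onto the set
S.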
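One way to compute the projection onto S is entry by entry: for each index, at most one of the two coordinates may be non-zero, and both must be non-positive, so the closer of the two candidate points is kept. The sketch below assumes this element-wise rule; the function name `project_complementarity` is hypothetical, and ties (where the projection is not unique) are broken in favour of v.

```python
import numpy as np

def project_complementarity(a, b):
    """Euclidean projection of (a, b) onto S = {(v, w): v <= 0, w <= 0, v_i * w_i = 0}.

    For each entry, the candidates are (min(a_i, 0), 0) and (0, min(b_i, 0));
    the one with the more negative surviving coordinate is closer to (a_i, b_i).
    """
    a_neg = np.minimum(a, 0.0)
    b_neg = np.minimum(b, 0.0)
    keep_v = a_neg <= b_neg               # ties go to v (an arbitrary choice)
    v = np.where(keep_v, a_neg, 0.0)
    w = np.where(keep_v, 0.0, b_neg)
    return v, w

a = np.array([-2.0, 1.0, -1.0, 3.0])
b = np.array([-1.0, -3.0, 2.0, 4.0])
v, w = project_complementarity(a, b)
# v -> [-2, 0, -1, 0], w -> [0, -3, 0, 0]
```

In the (v, w)-step, `a` and `b` would be the two blocks of y, that is, the current ℓ+ and ℓ− iterates shifted by the scaled multipliers.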
    (ℓ+ , ℓ− )-step: Using the fact that optimization can be performed separately for ℓ+ and
ℓ− , ℓ+ -step can be written as:
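\[
\begin{aligned}
(\ell^{+})^{k+1} &= \operatorname*{argmin}_{\ell^{+}}\ z^{\top}\ell^{+} + \alpha_1\langle (2I + P^{\top}P)\ell^{+}, \ell^{+}\rangle + \imath_H(\ell^{+}) + \frac{\rho}{2}\Big\|v^{k+1} - \ell^{+} + \frac{\lambda_1^{k}}{\rho}\Big\|_2^2 \\
&= \Pi_H\big[((4\alpha_1 + \rho)I + 2\alpha_1 P^{\top}P)^{-1}(\rho v^{k+1} + \lambda_1^{k} - z)\big],
\end{aligned}
\tag{C.6}
\]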
where z = 2k − P⊤ d and ΠH (·) is the projection operator on the hyperplane H. Similarly,
ℓ− -step can be written as:
                           k+1
                       ℓ−       = ΠH [((4α2 + ρ)I + 2α2 P⊤ P)−1 (ρwk+1+ λk2 + z)].           (C.7)
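As an illustration, the closed forms (C.6) and (C.7) can be sketched as a linear solve followed by a projection onto H. This is a sketch under stated assumptions, not the scSGL implementation: P is treated as a generic dense n-by-M matrix, the linear system is solved directly rather than with a precomputed inverse, and the names `project_hyperplane` and `l_step` are hypothetical.

```python
import numpy as np

def project_hyperplane(x, n):
    """Euclidean projection of x onto H = {l : 1^T l = -n}."""
    return x - (x.sum() + n) / x.size

def l_step(P, z, slack, lam, alpha, rho, n, sign=1.0):
    """Evaluate Pi_H[((4*alpha + rho) I + 2*alpha P^T P)^{-1} (rho*slack + lam - sign*z)].

    sign=+1.0 with (slack, lam, alpha) = (v, lambda_1, alpha_1) follows (C.6);
    sign=-1.0 with (slack, lam, alpha) = (w, lambda_2, alpha_2) follows (C.7).
    """
    M = P.shape[1]
    A = (4.0 * alpha + rho) * np.eye(M) + 2.0 * alpha * (P.T @ P)
    rhs = rho * slack + lam - sign * z
    return project_hyperplane(np.linalg.solve(A, rhs), n)

# Toy problem with n = 4 genes, hence M = n(n-1)/2 = 6 candidate edges.
rng = np.random.default_rng(0)
n, M = 4, 6
P = rng.standard_normal((n, M))  # placeholder for the operator P
z = rng.standard_normal(M)
v = -np.abs(rng.standard_normal(M))
lam1 = np.zeros(M)
l_plus = l_step(P, z, v, lam1, alpha=0.5, rho=1.0, n=n)
```

By construction the returned vector sums to −n, which is the constraint that defines H.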
    Lagrange multipliers update: The updates of the Lagrange multipliers are:
\[
\lambda_1^{k+1} = \lambda_1^{k} + \rho\,(v^{k+1} - (\ell^{+})^{k+1}),
\tag{C.8}
\]
\[
\lambda_2^{k+1} = \lambda_2^{k} + \rho\,(w^{k+1} - (\ell^{-})^{k+1}).
\tag{C.9}
\]
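The multiplier updates are a single vector operation each. The short sketch below applies the update to one multiplier and also returns the norm of the primal residual; reporting that norm is a common ADMM diagnostic added here for illustration and is not described in the source. The function name `dual_update` is hypothetical.

```python
import numpy as np

def dual_update(lam, slack, ell, rho):
    """Return lam + rho * (slack - ell) and the norm of the residual slack - ell."""
    residual = slack - ell
    return lam + rho * residual, float(np.linalg.norm(residual))

lam1, r1 = dual_update(
    lam=np.zeros(3),
    slack=np.array([-1.0, 0.0, -2.0]),  # plays the role of v^{k+1}
    ell=np.array([-1.0, -1.0, 0.0]),    # plays the role of (l+)^{k+1}
    rho=2.0,
)
```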
C.1.1      Computational and Storage Complexity
The computational complexity of the optimization procedure described above can be found by
determining how many computations are required for each ADMM step. Let M = n(n−1)/2,
where n is the number of genes. The (v, w)-step can be performed in O(M) time, i.e., O(n^2) time.
The (ℓ+ , ℓ− )-step requires the inversion of the matrix (4α1 + ρ)I + 2α1 P⊤ P (and of its α2
counterpart for ℓ− ), which needs to be calculated only once before the optimization iterations.
The inverse has a closed-form solution, which can be found using the Woodbury matrix identity.
It has a decomposition of the form A⊤ A, where A is a sparse matrix with O(n^2) non-zero entries.
Thus, the matrix-vector multiplication in the (ℓ+ , ℓ− )-step can be done in O(n^2) time. Updates
of the Lagrange multipliers can also be performed in O(M), i.e., O(n^2), time. Let I be the number
of iterations required for the convergence of ADMM. The overall time complexity of scSGL is
then O(I n^2). The storage complexity of scSGL is determined by the size of the inverse matrix
required in the (ℓ+ , ℓ− )-step. Since this matrix has a decomposition of the form A⊤ A, we only
need to store A. Thus, the storage complexity of scSGL is O(n^2).
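To make the Woodbury step concrete: writing c = 4α + ρ, the identity gives ((c)I + 2α P⊤P)^{-1} = (1/c)[I − P⊤((c/(2α))I + PP⊤)^{-1}P], so only an n-by-n system has to be solved when P has n rows. The sketch below checks this numerically on a small random matrix. The shape of P (n rows, M columns) and the function name `woodbury_inverse` are assumptions made for illustration; the sparse A⊤A factorisation mentioned in the text is not reproduced here.

```python
import numpy as np

def woodbury_inverse(P, alpha, rho):
    """Inverse of (4*alpha + rho) I_M + 2*alpha P^T P using an n-by-n solve."""
    n, M = P.shape
    c = 4.0 * alpha + rho
    small = (c / (2.0 * alpha)) * np.eye(n) + P @ P.T  # n-by-n system
    return (np.eye(M) - P.T @ np.linalg.solve(small, P)) / c

rng = np.random.default_rng(1)
n, M = 5, 10                     # M = n(n-1)/2 for n = 5
P = rng.standard_normal((n, M))  # placeholder for the operator P
alpha, rho = 0.3, 1.5
direct = np.linalg.inv((4 * alpha + rho) * np.eye(M) + 2 * alpha * (P.T @ P))
max_abs_diff = np.max(np.abs(direct - woodbury_inverse(P, alpha, rho)))
```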
    The computational and storage complexity of ADMM is quadratic in the number of genes
and is not affected by the number of cells. Note that scSGL also requires the construction
of the kernel matrix before running the optimization. Since very efficient tools already exist
to construct kernel matrices [130], we did not include their complexity in the analysis
above. Finally, recent work in the GSP literature scales GL methods to learning
graphs with millions of nodes [159]. These approaches can be employed to scale scSGL,
which we leave for future work.
C.2      AUROC and EPR Results
In the main text, we consider AUPRC based metrics defined above as the main performance
metrics for comparison due to inherent sparsity of GRNs. In this section, AUROC and EPR
values for synthetic data used in parameter sensitivity analysis are reported in Figures C.1
and C.2, respectively. AUROC and EPR ratios for real datasets are also reported in Figure
C.3.
                                                           

[Figure C.3: heatmaps of the values tabulated below; the colour scale runs from Low to High.]

AUROC ratio
                       hESC                                mESC
Method        Specific  Nonspecific  STRING      Specific  Nonspecific  STRING
GENIE3        0.921     0.946        1.236       0.985     1.066        1.156
GRNBOOST2     1.015     0.989        1.160       1.016     1.043        1.125
PIDC          1.012     0.992        1.241       1.011     1.040        1.114
PPCOR         1.000     1.000        1.000       1.000     1.000        1.000
scSGL-r       0.942     1.047        1.208       1.011     1.056        1.121
scSGL-τ       0.947     1.018        1.151       1.020     1.066        1.155
scSGL-τzi     0.958     1.049        1.173       1.009     1.055        1.121

EPR ratio
                       hESC                                mESC
Method        Specific  Nonspecific  STRING      Specific  Nonspecific  STRING
GENIE3        0.959     1.778        3.457       1.054     3.237        3.268
GRNBOOST2     0.941     1.560        3.194       1.049     3.062        3.290
PIDC          0.914     1.904        3.750       1.001     2.527        3.179
PPCOR         1.000     1.000        1.000       1.000     1.000        1.000
scSGL-r       1.006     1.905        3.096       1.005     2.011        2.673
scSGL-τ       0.910     1.868        4.414       1.050     1.253        1.366
scSGL-τzi     0.991     1.850        3.398       0.999     1.944        2.364
Figure C.3 AUROC and EPR ratios of methods for two real-world scRNAseq datasets.
Inferred graphs are compared to three different gene regulatory databases.
realizations of an expression data with 500 cells. Compared to curated datasets analyzed in
the main text, these datasets do not include any dropouts. We calculated AUPRC ratios
for activating and inhibitory edges separately; averages over the 10 realizations are reported in
Figure C.4.
    The figure indicates that scSGL and PPCOR are the best performing methods for
inference of activating edges. For BF, BFC and CY, PPCOR followed by scSGL-r has the
highest AUPRC ratios. On the other hand, scSGL-r, followed by the other kernels and PPCOR,
has the highest performance for LI, LL and TF. For the inference of inhibitory edges,
the best performing method varies across networks. In BF, GRNBOOST2 shows the best
performance; in BFC and TF, scSGL-τ and GRNBOOST2 have the highest AUPRC ratios;
and for the remaining datasets, scSGL-r followed by PPCOR performs better than the others.
In [3], it is observed that methods perform well on linear networks (LI and LL), while
inference in the remaining networks is harder. The AUPRC ratios reported in Figure C.4 are
in line with this observation: AUPRC ratios for linear networks are generally higher
than those for BF, BFC, CY and TF. Overall, scSGL-r and PPCOR are the best
performing methods when the results on activating and inhibitory edges are evaluated together.
Finally, when the performances of the kernels are compared, it can be seen that the correlation kernel
shows the highest performance, followed by zero-inflated Kendall.
[Figure C.4: heatmaps of the values tabulated below; the colour scale runs from Low to High.]

Synthetic Activating
Method        BF      BFC     CY      LI      LL      TF
GENIE3        1.87    2.84    2.76    2.90    6.40    3.88
GRNBOOST2     1.37    3.23    2.93    2.74    7.05    2.97
PIDC          1.92    2.22    3.22    2.29    7.00    3.16
PPCOR         3.58    4.03    5.21    3.25    7.50    5.08
scSGL-r       3.25    3.48    4.92    3.36    8.65    5.08
scSGL-τ       2.75    3.29    4.28    3.30    8.48    4.12
scSGL-τzi     3.39    3.39    4.83    3.02    8.10    5.04

Synthetic Inhibitory
Method        BF      BFC     CY      LI      LL      TF
GENIE3        1.76    2.00    1.70    6.74    78.70   1.45
GRNBOOST2     2.54    2.18    1.90    4.54    51.36   1.59
PIDC          1.81    1.62    1.93    8.42    31.29   1.48
PPCOR         1.86    1.68    3.19    13.43   79.07   1.36
scSGL-r       1.73    2.05    3.17    15.73   93.28   1.47
scSGL-τ       2.10    1.93    2.89    4.40    1.00    1.41
scSGL-τzi     1.91    2.22    3.14    12.45   29.62   1.56
Figure C.4 Performance of scSGL and state-of-the-art methods on curated datasets as
measured by AUPRC ratios for activating and inhibitory edges. Each column corresponds
to a synthetic network. Abbreviations: LI, linear; CY, cycle; LL, linear long; BF,
bifurcating; BFC, bifurcating converging and TF, trifurcating.
C.4      Cell-Type Specific GRN Inference
    scSGL is developed based on the assumption that all cells are related to a single GRN.
However, single cell datasets are generally a combination of cells arising from different cell
types, and may therefore necessitate the inference of cell-type specific GRNs. Cell-type
specific GRNs can be learned in our framework by adding a cell-type clustering step before
applying scSGL. One could group the cells either by using cluster labels provided by
the original authors of the experimental study or, when pre-defined cell labels are absent,
by clustering the dataset using one of the many clustering algorithms proposed for single
cell data [160]. Assuming independence within cell groups, we could estimate cell-type specific
networks using scSGL for each cluster separately.
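The grouping procedure described above can be sketched as follows. This is only an outline under stated assumptions: the expression matrix is taken to be genes by cells, `infer_network` is a hypothetical placeholder for whatever routine returns a gene-by-gene graph (it is not the scSGL API), and a plain correlation matrix stands in for it in the usage lines.

```python
import numpy as np

def infer_per_group(X, labels, infer_network):
    """Apply `infer_network` separately to the cells of each group.

    X: (genes x cells) expression matrix.
    labels: one group label per cell (cell type, cluster id or time point).
    infer_network: hypothetical callable mapping a (genes x cells) matrix
        to a (genes x genes) matrix of edge weights.
    """
    labels = np.asarray(labels)
    networks = {}
    for group in np.unique(labels):
        cells = labels == group  # boolean mask selecting this group's cells
        networks[group] = infer_network(X[:, cells])
    return networks

# Usage with a stand-in estimator: Pearson correlation between genes.
rng = np.random.default_rng(2)
X = rng.standard_normal((3, 8))             # 3 genes, 8 cells
labels = [0, 0, 0, 0, 12, 12, 12, 12]       # e.g. time points in hours
networks = infer_per_group(X, labels, np.corrcoef)
```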
[Figure C.5: six network panels, one per time point (T = 0, 12, 24, 36, 72 and 96), each drawn over the same set of lineage marker genes.]
Figure C.5 Edges detected using scSGL-r between 24 lineage marker genes of hESC at
different time points of the differentiation process. Only edges whose absolute edge weights
fall into the top 10 percent are shown. Edge thicknesses are proportional to their weights, and
node sizes are proportional to their degrees.
    In this section, we demonstrate the process for using scSGL to learn cell-type specific
GRNs and apply this process to the differentiation dataset hESC. We apply scSGL separately
to the hESC dataset grouped by time point (0, 12, 24, 36, 72 and 96 hrs) and learn scSGL
graphs between 24 lineage-specific marker genes [143] at these different time points. Figure
C.5 shows no edges from the Gata-family binding proteins at 0 h. Gata-family
binding proteins have been reported as necessary for the development and function
of a number of endoderm-derived tissues and cells [146, 161]. The onset of Gata4 and Gata6
expression has been reported to coincide with the beginning of endoderm gene expression;
hence, the absence of edges from Gata4 and Gata6 at 0 h is indicative of the undifferentiated
nature of the single cells [146]. Weak interactions start to emerge at 12 h of differentiation,
with inhibition of the pluripotency markers Nanog and Sox2. At 24 h of differentiation, we
notice a stronger inhibition of Nanog and the pluripotency marker Pmaip1 by Gata6, indicating
a transition of the cells towards a primitive streak state. Hand1 has been reported to play
an essential role in both trophoblast giant cell differentiation and cardiac morphogenesis
[162]. Inhibition of the known DE marker Cer1 by Hand1 and Gata-family TFs at 36 and 72
h of differentiation indicates an advanced state of differentiation. The appearance of key
DE markers Gata2, Gata4, Gata6, Cer1 and Eomes as hub-nodes in 96-h time point net-
work indicates that the cells have progressed toward the definitive endoderm (DE) state.
This analysis clearly demonstrates that scSGL identifies gene network changes from data
clustered over time points.
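The edge filtering used for Figure C.5 (showing only edges whose absolute weights fall into the top 10 percent) can be sketched as below. The sketch assumes a symmetric weight matrix and takes the percentage over all gene pairs; whether the original figure used all pairs or only the non-zero learned edges is not stated, and the function name `keep_top_edges` is hypothetical.

```python
import numpy as np

def keep_top_edges(W, fraction=0.10):
    """Keep the top `fraction` of edges of a symmetric matrix W by absolute weight.

    All other entries, including the diagonal, are set to zero. If several
    edges tie at the cutoff, all of them are kept.
    """
    W = np.asarray(W, dtype=float)
    upper = np.triu_indices_from(W, k=1)       # each undirected edge once
    weights = np.abs(W[upper])
    n_keep = max(1, int(round(fraction * weights.size)))
    cutoff = np.sort(weights)[-n_keep]         # n_keep-th largest |weight|
    mask = np.abs(W) >= cutoff
    np.fill_diagonal(mask, False)
    return np.where(mask, W, 0.0)

# Toy signed graph on 24 nodes (276 node pairs).
rng = np.random.default_rng(3)
A = rng.standard_normal((24, 24))
W = A + A.T
W_top = keep_top_edges(W)
n_edges = np.count_nonzero(np.triu(W_top, k=1))
```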
    We acknowledge that analyzing the dataset in this manner does not exploit the similarity
between the true cell-type specific networks, while estimating a single network for the different
cell types ignores the fact that we do not expect the cell-type specific graphs to be identical.
Our optimization framework can be extended to jointly learn Laplacians from
multiple cell groups, but that is beyond the scope of the current work and will be considered in
future research.
BIBLIOGRAPHY
[1]  Rance Nault, Satabdi Saha, Sudin Bhattacharya, Jack Dodson, Samiran Sinha,
     Tapabrata Maiti, and Tim Zacharewski. Benchmarking of a bayesian single cell rnaseq
     differential gene expression test for dose–response study designs. Nucleic acids research,
     50(8):e48–e48, 2022.
[2]  Abdullah Karaaslanli, Satabdi Saha, Selin Aviyente, and Tapabrata Maiti. scsgl: ker-
     nelized signed graph learning for single-cell gene regulatory network inference. Bioin-
     formatics, 38(11):3011–3019, 2022.
[3]  Aditya Pratapa, Amogh P Jalihal, Jeffrey N Law, Aditya Bharadwaj, and TM Mu-
     rali. Benchmarking algorithms for gene regulatory network inference from single-cell
     transcriptomic data. Nature methods, 17(2):147–154, 2020.
[4]  L. Zappia, B. Phipson, and A. Oshlack. Splatter: simulation of single-cell rna sequenc-
     ing data. Genome Biol, 18(1):174, 2017.
[5]  Ehud Shapiro, Tamir Biezuner, and Sten Linnarsson. Single-cell sequencing-based
     technologies will revolutionize whole-organism science. Nature Reviews Genetics,
     14(9):618–630, 2013.
[6]  Cole Trapnell. Defining cell types and states with single-cell genomics. Genome re-
     search, 25(10):1491–1498, 2015.
[7]  Aleksandra A Kolodziejczyk, Jong Kyoung Kim, Valentine Svensson, John C Marioni,
     and Sarah A Teichmann. The technology and biology of single-cell rna sequencing.
     Molecular cell, 58(4):610–620, 2015.
[8]  Oliver Stegle, Sarah A Teichmann, and John C Marioni. Computational and analytical
     challenges in single-cell transcriptomics. Nature Reviews Genetics, 16(3):133–145, 2015.
[9]  Charles Gawad, Winston Koh, and Stephen R Quake. Single-cell genome sequencing:
     current state of the science. Nature Reviews Genetics, 17(3):175–188, 2016.
[10] Peter V Kharchenko, Lev Silberstein, and David T Scadden. Bayesian approach to
     single-cell differential expression analysis. Nature methods, 11(7):740–742, 2014.
[11] Davide Risso, Fanny Perraudeau, Svetlana Gribkova, Sandrine Dudoit, and Jean-
     Philippe Vert. Zinb-wave: A general and flexible method for signal extraction from
     single-cell rna-seq data. BioRxiv, page 125112, 2017.
[12] Kwangbom Choi, Yang Chen, Daniel A Skelly, and Gary A Churchill. Bayesian
     model selection reveals biological origins of zero inflation in single-cell transcriptomics.
     Genome biology, 21(1):1–16, 2020.
[13] Emma Pierson and Christopher Yau. Zifa: Dimensionality reduction for zero-inflated
     single-cell gene expression analysis. Genome biology, 16(1):1–10, 2015.
[14] Andrew McDavid, Greg Finak, Pratip K Chattopadyay, Maria Dominguez, Laurie
     Lamoreaux, Steven S Ma, Mario Roederer, and Raphael Gottardo. Data exploration,
     quality control and testing in single-cell qpcr-based gene expression experiments. Bioin-
     formatics, 29(4):461–467, 2013.
[15] Abhishek Sarkar and Matthew Stephens. Separating measurement and expression mod-
     els clarifies confusion in single-cell rna sequencing analysis. Nature genetics, 53(6):770–
     777, 2021.
[16] Greg Finak, Andrew McDavid, Masanao Yajima, Jingyuan Deng, Vivian Gersuk,
     Alex K Shalek, Chloe K Slichter, Hannah W Miller, M Juliana McElrath, Martin
     Prlic, et al. Mast: a flexible statistical framework for assessing transcriptional changes
     and characterizing heterogeneity in single-cell rna sequencing data. Genome biology,
     16(1):1–13, 2015.
[17] Andrew McDavid, Raphael Gottardo, Noah Simon, and Mathias Drton. Graphical
     models for zero-inflated single cell gene expression. The annals of applied statistics,
     13(2):848, 2019.
[18] Mark D Robinson, Davis J McCarthy, and Gordon K Smyth. edger: a bioconductor
     package for differential expression analysis of digital gene expression data. bioinfor-
     matics, 26(1):139–140, 2010.
[19] Simon Anders and Wolfgang Huber. Differential expression analysis for sequence count
     data. Nature Precedings, pages 1–1, 2010.
[20] Charity W Law, Yunshun Chen, Wei Shi, and Gordon K Smyth. voom: Precision
     weights unlock linear model analysis tools for rna-seq read counts. Genome biology,
     15(2):1–17, 2014.
[21] Valentine Svensson. Droplet scrna-seq is not zero-inflated. Nature Biotechnology,
     38(2):147–150, 2020.
[22] Justin D Silverman, Kimberly Roche, Sayan Mukherjee, and Lawrence A David.
     Naught all zeros in sequence count data are the same. Computational and structural
     biotechnology journal, 18:2789, 2020.
[23] KS Crump, DG Hoel, CH Langley, and R Peto. Fundamental carcinogenic
     processes and their implications for low dose risk assessment. Cancer research,
     36(9_Part_1):2973–2979, 1976.
[24] Food and Drug Administration. Guidance for industry: exposure-response
     relationships-study design, data analysis, and regulatory applications.
     http://www.fda.gov/cber/gdlns/exposure.pdf, 2003.
[25] L Martin. Benchmark dose software (bmds) version 2.1 user’s manual version 2.0.
     Washington, DC: United States Environmental Protection Agency, Office of Environ-
     mental Information, 2009.
[26] National Toxicology Program et al. Ntp research report on national toxicology pro-
     gram approach to genomic dose-response modeling: Research report 5 [internet]. NTP
     Research Report on National Toxicology Program, 2018.
[27] Qiang Zhang, W Michael Caudle, Jingbo Pi, Sudin Bhattacharya, Melvin E Andersen,
     Norbert E Kaminski, and Rory B Conolly. Embracing systems toxicology at single-cell
     resolution. Current opinion in toxicology, 16:49–57, 2019.
[28] J Allen Davis, Jeffrey S Gift, and Q Jay Zhao. Introduction to benchmark dose methods
     and us epa’s benchmark dose software (bmds) version 2.1. 1. Toxicology and applied
     pharmacology, 254(2):181–191, 2011.
[29] Beate Vieth, Swati Parekh, Christoph Ziegenhain, Wolfgang Enard, and Ines Hell-
     mann. A systematic evaluation of single cell rna-seq analysis pipelines. Nature com-
     munications, 10(1):1–11, 2019.
[30] Zhun Miao, Ke Deng, Xiaowo Wang, and Xuegong Zhang. Desingle for detecting three
     types of differential expression in single-cell rna-seq data. Bioinformatics, 34(18):3223–
     3224, 2018.
[31] Keegan D Korthauer, Li-Fang Chu, Michael A Newton, Yuan Li, James Thomson, Ron
     Stewart, and Christina Kendziorski. A statistical approach for identifying differential
     distributions in single-cell rna-seq experiments. Genome biology, 17(1):1–15, 2016.
[32] Charlotte Soneson and Mark D Robinson. Bias, robustness and scalability in single-cell
     differential expression analysis. Nature methods, 15(4):255–261, 2018.
[33] Tian Mou, Wenjiang Deng, Fengyun Gu, Yudi Pawitan, and Trung Nghia Vu. Re-
     producibility of methods to detect differentially expressed genes from single-cell rna
     sequencing. Frontiers in genetics, 10:1331, 2020.
[34] Maria K Jaakkola, Fatemeh Seyednasrollah, Arfa Mehmood, and Laura L Elo. Compar-
     ison of methods to detect differentially expressed genes between single-cell populations.
     Briefings in bioinformatics, 18(5):735–743, 2017.
[35] Tae Kyun Kim. Understanding one-way anova using conceptual figures. Korean journal
     of anesthesiology, 70(1):22–26, 2017.
[36] DA Williams. A test for differences between treatment means when several dose levels
     are compared with a zero dose control. Biometrics, pages 103–117, 1971.
[37] Jan De Leeuw, Kurt Hornik, and Patrick Mair. Isotone optimization in r: pool-
     adjacent-violators algorithm (pava) and active set methods. Journal of statistical soft-
     ware, 32:1–24, 2010.
[38] Tim Holland-Letz and Annette Kopp-Schneider. Optimal experimental designs for
     dose–response studies with continuous endpoints. Archives of toxicology, 89(11):2059–
     2068, 2015.
[39] Marc Aerts, Matthew W Wheeler, and José Cortiñas Abrahantes. An extended and
     unified modeling framework for benchmark dose estimation for both continuous and
     binary data. Environmetrics, 31(7):e2630, 2020.
[40] Matthew W Wheeler, Jose Cortiñas Abrahantes, Marc Aerts, Jeffery S Gift, and Jerry
     Allen Davis. Continuous model averaging for benchmark dose analysis: Averaging over
     distributional forms. Environmetrics, page e2728, 2022.
[41] Richard L Schmoyer. Sigmoidally constrained maximum likelihood estimation in quan-
     tal bioassay. Journal of the American Statistical Association, 79(386):448–453, 1984.
[42] Colleen Kelly and John Rice. Monotone smoothing with application to dose-response
     curves and the assessment of synergism. Biometrics, pages 1071–1085, 1990.
[43] Michel Delecroix, Michel Simioni, and Christine Thomas-Agnan. Functional estimation
     under shape constraints. Journal of Nonparametric Statistics, 6(1):69–89, 1996.
[44] Brian Neelon and David B Dunson. Bayesian isotonic regression and trend analysis.
     Biometrics, 60(2):398–406, 2004.
[45] Björn Bornkamp and Katja Ickstadt. Bayesian nonparametric estimation of contin-
     uous monotone functions with applications to dose–response analysis. Biometrics,
     65(1):198–205, 2009.
[46] Lizhen Lin and David B Dunson. Bayesian monotone regression using gaussian process
     projection. Biometrika, 101(2):303–317, 2014.
[47] Daniel Marbach, James C Costello, Robert Küffner, Nicole M Vega, Robert J Prill,
     Diogo M Camacho, Kyle R Allison, Manolis Kellis, James J Collins, and Gustavo
     Stolovitzky. Wisdom of crowds for robust gene network inference. Nature methods,
     9(8):796–804, 2012.
[48] Lian En Chai, Swee Kuan Loh, Swee Thing Low, Mohd Saberi Mohamad, Safaai Deris,
     and Zalmiyah Zakaria. A review on the computational approaches for gene regulatory
     network construction. Computers in biology and medicine, 48:55–65, 2014.
[49] Peter Langfelder and Steve Horvath. Wgcna: an r package for weighted correlation
     network analysis. BMC bioinformatics, 9(1):1–13, 2008.
[50] Seongho Kim. ppcor: an r package for a fast calculation to semi-partial correlation
     coefficients. Communications for statistical applications and methods, 22(6):665, 2015.
[51] Nir Friedman, Michal Linial, Iftach Nachman, and Dana Pe’er. Using bayesian net-
     works to analyze expression data. Journal of computational biology, 7(3-4):601–620,
     2000.
[52] Vân Anh Huynh-Thu, Alexandre Irrthum, Louis Wehenkel, and Pierre Geurts. Infer-
     ring regulatory networks from expression data using tree-based methods. PloS one,
     5(9):1–10, 2010.
[53] Thomas Moerman, Sara Aibar Santos, Carmen Bravo González-Blas, Jaak Simm, Yves
     Moreau, Jan Aerts, and Stein Aerts. Grnboost2 and arboreto: efficient and scalable
     inference of gene regulatory networks. Bioinformatics, 35(12):2159–2161, 2019.
[54] Adam A Margolin, Ilya Nemenman, Katia Basso, Chris Wiggins, Gustavo Stolovitzky,
     Riccardo Dalla Favera, and Andrea Califano. Aracne: an algorithm for the reconstruc-
     tion of gene regulatory networks in a mammalian cellular context. In BMC bioinfor-
     matics, volume 7, pages 1–15. Springer, 2006.
[55] Jeremiah J Faith, Boris Hayete, Joshua T Thaden, Ilaria Mogno, Jamey Wierzbowski,
     Guillaume Cottarel, Simon Kasif, James J Collins, and Timothy S Gardner. Large-
     scale mapping and validation of escherichia coli transcriptional regulation from a com-
     pendium of expression profiles. PLoS biol, 5(1):e8, 2007.
[56] Kevin Murphy, Saira Mian, et al. Modelling gene expression data using dynamic
     bayesian networks. Technical report, Citeseer, 1999.
[57] Jiguo Cao, Xin Qi, and Hongyu Zhao. Modeling gene regulation networks using or-
     dinary differential equations. In Next generation microarray bioinformatics, pages
     185–197. Springer, 2012.
[58] Pierre Geurts et al. dyngenie3: dynamical genie3 for the inference of gene networks
     from time series expression data. Scientific reports, 8(1):1–12, 2018.
[59] Ziv Bar-Joseph, Georg K Gerber, Tong Ihn Lee, Nicola J Rinaldi, Jane Y Yoo, François
     Robert, D Benjamin Gordon, Ernest Fraenkel, Tommi S Jaakkola, Richard A Young,
     et al. Computational discovery of gene modules and regulatory networks. Nature
     biotechnology, 21(11):1337–1342, 2003.
[60] Keren Bahar Halpern, Rom Shenhav, Orit Matcovitch-Natan, Beata Toth, Doron
     Lemze, Matan Golan, Efi E Massasa, Shaked Baydatch, Shanie Landen, Andreas E
     Moor, et al. Single-cell spatial reconstruction reveals global division of labour in the
     mammalian liver. Nature, 542(7641):352–356, 2017.
[61] Tianhao Mu, Liqin Xu, Yu Zhong, Xinyu Liu, Zhikun Zhao, Chaoben Huang, Xiaofeng
     Lan, Chengchen Lufei, Yi Zhou, Yixun Su, et al. Embryonic liver developmental tra-
     jectory revealed by single-cell rna sequencing in the foxa2egfp mouse. Communications
     biology, 3(1):1–12, 2020.
[62] Dongyin Guan, Ying Xiong, Trang Minh Trinh, Yang Xiao, Wenxiang Hu, Chunjie
     Jiang, Pieterjan Dierickx, Cholsoon Jang, Joshua D Rabinowitz, and Mitchell A Lazar.
     The hepatocyte clock and feeding control chronophysiology of multiple liver cell types.
     Science, 369(6509):1388–1394, 2020.
[63] Xuelian Xiong, Henry Kuang, Sahar Ansari, Tongyu Liu, Jianke Gong, Shuai Wang,
     Xu-Yun Zhao, Yewei Ji, Chuan Li, Liang Guo, et al. Landscape of intercellular crosstalk
     in healthy and nash liver revealed by single-cell secretome gene analysis. Molecular
     cell, 75(3):644–660, 2019.
[64] Reza Farmahin, Anne Marie Gannon, Rémi Gagné, Andrea Rowan-Carroll, Byron
     Kuo, Andrew Williams, Ivan Curran, and Carole L Yauk. Hepatic transcriptional dose-
     response analysis of male and female fischer rats exposed to hexabromocyclododecane.
     Food and Chemical Toxicology, 133:110262, 2019.
[65] Ivy Moffat, Nikolai L Chepelev, Sarah Labib, Julie Bourdon-Lacombe, Byron Kuo,
     Julie K Buick, France Lemieux, Andrew Williams, Sabina Halappanavar, Amal I Malik,
     et al. Comparison of toxicogenomics and traditional approaches to inform mode of
     action and points of departure in human health risk assessment of benzo[a]pyrene in
     drinking water. Critical reviews in toxicology, 45(1):1–43, 2015.
[66] A Francina Webster, Nikolai Chepelev, Rémi Gagné, Byron Kuo, Leslie Recio, Andrew
     Williams, and Carole L Yauk. Impact of genomics platform and statistical filtering
     on transcriptional benchmark doses (bmd) and multiple approaches for selection of
     chemical point of departure (pod). PLoS One, 10(8):e0136764, 2015.
[67] Timothy W Gant and Shu-Dong Zhang. In pursuit of effective toxicogenomics. Muta-
     tion Research/Fundamental and Molecular Mechanisms of Mutagenesis, 575(1-2):4–16,
     2005.
[68] Samarendra Das and Shesh N Rai. Swarnseq: An improved statistical approach for
     differential expression analysis of single-cell rna-seq data. Genomics, 113(3):1308–1324,
     2021.
[69] Minjeong Jeon and Paul De Boeck. Decision qualities of bayes factor and p value-based
     hypothesis testing. Psychological Methods, 22(2):340, 2017.
[70] Yong Li, Xiao-Bin Liu, and Jun Yu. A bayesian chi-squared test for hypothesis testing.
     Journal of Econometrics, 189(1):54–69, 2015.
[71] Kelly A Fader, Rance Nault, Mathew P Kirby, Gena Markous, Jason Matthews, and
     Timothy R Zacharewski. Convergence of hepcidin deficiency, systemic iron overloading,
     heme accumulation, and rev-erbα/β activation in aryl hydrocarbon receptor-elicited
     hepatotoxicity. Toxicology and applied pharmacology, 321:1–17, 2017.
[72] Nathalie Percie du Sert, Viki Hurst, Amrita Ahluwalia, Sabina Alam, Marc T Avey,
     Monya Baker, William J Browne, Alejandra Clark, Innes C Cuthill, Ulrich Dirnagl,
     et al. The arrive guidelines 2.0: Updated guidelines for reporting animal research.
     Journal of Cerebral Blood Flow & Metabolism, 40(9):1769–1777, 2020.
[73] Rance Nault, Kelly A Fader, Sudin Bhattacharya, and Tim R Zacharewski. Single-
     nuclei rna sequencing assessment of the hepatic effects of 2,3,7,8-tetrachlorodibenzo-
     p-dioxin. Cellular and Molecular Gastroenterology and Hepatology, 11(1):147–159,
     2021.
[74] Andrew Butler, Paul Hoffman, Peter Smibert, Efthymia Papalexi, and Rahul Satija.
     Integrating single-cell transcriptomic data across different conditions, technologies, and
     species. Nature biotechnology, 36(5):411–420, 2018.
[75] Luke Zappia, Belinda Phipson, and Alicia Oshlack. Splatter: simulation of single-cell
     rna sequencing data. Genome biology, 18(1):1–15, 2017.
[76] Michael A Newton, Amine Noueiry, Deepayan Sarkar, and Paul Ahlquist. Detect-
     ing differential gene expression with a semiparametric hierarchical mixture method.
     Biostatistics, 5(2):155–176, 2004.
[77] Yoav Benjamini and Yosef Hochberg. Controlling the false discovery rate: a practical
     and powerful approach to multiple testing. Journal of the Royal statistical society:
     series B (Methodological), 57(1):289–300, 1995.
[78] Gordon K Smyth. Linear models and empirical bayes methods for assessing differential
     expression in microarray experiments. Statistical applications in genetics and molecular
     biology, 3(1), 2004.
[79] Frank Wilcoxon. Individual comparisons by ranking methods. In Breakthroughs in
     statistics, pages 196–202. Springer, 1992.
[80] Ronald Aylmer Fisher et al. On the "probable error" of a coefficient of correlation
     deduced from a small sample. (1921). Contributions to Mathematical Statistics, 3–32,
     1950.
[81] William H Kruskal and W Allen Wallis. Use of ranks in one-criterion variance analysis.
     Journal of the American statistical Association, 47(260):583–621, 1952.
[82] Rafael A Irizarry, Daniel Warren, Forrest Spencer, Irene F Kim, Shyam Biswal,
     Bryan C Frank, Edward Gabrielson, Joe GN Garcia, Joel Geoghegan, Gregory Ger-
     mino, et al. Multiple-laboratory comparison of microarray platforms. Nature methods,
     2(5):345–350, 2005.
[83] Beate Vieth, Christoph Ziegenhain, Swati Parekh, Wolfgang Enard, and Ines Hell-
     mann. powsimr: power analysis for bulk and single cell rna-seq experiments. Bioin-
     formatics, 33(21):3486–3488, 2017.
[84] Xiuwei Zhang, Chenling Xu, and Nir Yosef. Simulating multiple faceted variability in
     single cell rna sequencing. Nature communications, 10(1):1–16, 2019.
[85] Alemu Takele Assefa, Jo Vandesompele, and Olivier Thas. Spsimseq: semi-parametric
     simulation of bulk and single-cell rna-sequencing data. Bioinformatics, 36(10):3276–
     3278, 2020.
[86] Davide Chicco and Giuseppe Jurman. The advantages of the matthews correlation
     coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC
     genomics, 21(1):1–13, 2020.
[87] Alessandra Dal Molin, Giacomo Baruzzo, and Barbara Di Camillo. Single-cell rna-
      sequencing: assessment of differential expression analysis methods. Frontiers in genet-
      ics, 8:62, 2017.
[88] Jason R Phillips, Daniel L Svoboda, Arpit Tandon, Shyam Patel, Alex Sedykh, Deepak
      Mav, Byron Kuo, Carole L Yauk, Longlong Yang, Russell S Thomas, et al. Bmdex-
      press 2: enhanced transcriptomic dose-response analysis workflow. Bioinformatics,
      35(10):1780–1782, 2019.
[89] Othman Soufan, Jessica Ewald, Charles Viau, Doug Crump, Markus Hecker, Niladri
      Basu, and Jianguo Xia. T1000: a reduced gene set prioritized for toxicogenomic studies.
      PeerJ, 7:e7975, 2019.
[90] David R Hunter and Kenneth Lange. A tutorial on mm algorithms. The American
      Statistician, 58(1):30–37, 2004.
[91] Arthur P Dempster, Nan M Laird, and Donald B Rubin. Maximum likelihood from
      incomplete data via the em algorithm. Journal of the Royal Statistical Society: Series
      B (Methodological), 39(1):1–22, 1977.
[92] Kenneth Lange. MM optimization algorithms. SIAM, 2016.
[93] D. R. Hunter and K. Lange. A tutorial on mm algorithms. The American Statistician,
      58:30–37, 2004.
[94] S Lang and A Brezger. Bayesian P-splines. Journal of Computational and Graphical
      Statistics, 13:183–212, 2004.
[95] Kenneth Lange. The mm algorithm. In Optimization, pages 185–219. Springer, 2013.
[96] Florin Vaida. Parameter convergence for em and mm algorithms. Statistica Sinica,
      pages 831–840, 2005.
[97] Jan de Leeuw and Kenneth Lange. Sharp quadratic majorization in one dimension.
      Computational statistics & data analysis, 53(7):2471–2484, 2009.
[98] Kenneth Lange, Joong-Ho Won, Alfonso Landeros, and Hua Zhou. Nonconvex opti-
      mization via mm algorithms: Convergence theory. arXiv preprint arXiv:2106.02805,
      2021.
[99] Aaron TL Lun, Karsten Bach, and John C Marioni. Pooling across cells to normalize
      single-cell rna sequencing data with many zero counts. Genome biology, 17(1):1–14,
      2016.
[100] Matt P Wand. A comparison of regression spline smoothing procedures. Computational
      Statistics, 15(4):443–462, 2000.
[101] David Ruppert, Matt P Wand, and Raymond J Carroll. Semiparametric regression.
      Cambridge university press, 2003.
[102] T Robertson, FT Wright, and R Dykstra. Order Restricted Statistical Inference. John
      Wiley & Sons, 1988.
[103] Suzanne Winsberg and James O Ramsay. Monotonic transformations to additivity
      using splines. Biometrika, 67(3):669–674, 1980.
[104] Larry Schumaker. Spline functions: basic theory. Cambridge University Press, 2007.
[105] Simon Wood. Package ‘mgcv’. R package version,
      1(29):729, 2015.
[106] S Wotherspoon and P Burch. Package ‘zigam’. R package, GitHub version,
      https://github.com/AustralianAntarcticDataCentre/zigam, 2016.
[107] Naomi Moris, Cristina Pina, and Alfonso Martinez Arias. Transition states and cell
      fate decisions in epigenetic landscapes. Nature Reviews Genetics, 17(11):693–703, 2016.
[108] Mark WEJ Fiers, Liesbeth Minnoye, Sara Aibar, Carmen Bravo González-Blas, Zeynep
      Kalender Atak, and Stein Aerts. Mapping gene regulatory networks from single-cell
      omics data. Briefings in functional genomics, 17(4):246–254, 2018.
[109] Assieh Saadatpour, Guoji Guo, Stuart H Orkin, and Guo-Cheng Yuan. Characterizing
      heterogeneity in leukemic cells using single-cell gene expression analysis. Genome
      biology, 15(12):1–13, 2014.
[110] Victoria Moignard, Steven Woodhouse, Laleh Haghverdi, Andrew J Lilly, Yosuke
      Tanaka, Adam C Wilkinson, Florian Buettner, Iain C Macaulay, Wajid Jawaid, Evan-
      gelia Diamanti, et al. Decoding the regulatory network of early blood development
      from single-cell gene expression measurements. Nature biotechnology, 33(3):269–276,
      2015.
[111] Shuonan Chen and Jessica C Mar. Evaluating methods of inferring gene regulatory
      networks highlights their lack of performance for single cell gene expression data. BMC
      bioinformatics, 19(1):1–21, 2018.
[112] Lucrezia Patruno, Davide Maspero, Francesco Craighero, Fabrizio Angaroni, Marco
      Antoniotti, and Alex Graudenzi. A review of computational strategies for denois-
      ing and imputation of single-cell transcriptomic data. Briefings in Bioinformatics,
      22(4):bbaa222, 2021.
[113] Kyle Akers and TM Murali. Gene regulatory network inference in single-cell biology.
      Current Opinion in Systems Biology, 26:87–97, 2021.
[114] Davide Risso, Fanny Perraudeau, Svetlana Gribkova, Sandrine Dudoit, and Jean-
      Philippe Vert. A general and flexible method for signal extraction from single-cell
      rna-seq data. Nature communications, 9(1):1–17, 2018.
[115] Antonio Ortega, Pascal Frossard, Jelena Kovačević, José MF Moura, and Pierre Van-
      dergheynst. Graph signal processing: Overview, challenges, and applications. Proceed-
      ings of the IEEE, 106(5):808–828, 2018.
[116] Xiaowen Dong, Dorina Thanou, Michael Rabbat, and Pascal Frossard. Learning graphs
      from data: A signal representation perspective. IEEE Signal Processing Magazine,
      36(3):44–63, 2019.
[117] Gonzalo Mateos, Santiago Segarra, Antonio G Marques, and Alejandro Ribeiro. Con-
      necting the dots: Identifying network structure via graph signal processing. IEEE
      Signal Processing Magazine, 36(3):16–43, 2019.
[118] Xiaowen Dong, Dorina Thanou, Pascal Frossard, and Pierre Vandergheynst. Learning
      laplacian matrix in smooth graph signal representations. IEEE Transactions on Signal
      Processing, 64(23):6160–6173, 2016.
[119] Vassilis Kalofolias. How to learn a graph from smooth signals. In Artificial Intelligence
      and Statistics, pages 920–929, 2016.
[120] Junhui Hou, Lap-Pui Chau, Ying He, and Huanqiang Zeng. Robust laplacian matrix
      learning for smooth graph signals. In 2016 IEEE International Conference on Image
      Processing (ICIP), pages 1878–1882. IEEE, 2016.
[121] Peter Berger, Gabor Hannak, and Gerald Matz. Efficient graph learning from noisy
      and incomplete data. IEEE Transactions on Signal and Information Processing over
      Networks, 6:105–119, 2020.
[122] Sai Kiran Kadambari and Sundeep Prabhakar Chepuri. Learning product graphs from
      multidomain signals. In ICASSP 2020-2020 IEEE International Conference on Acous-
      tics, Speech and Signal Processing (ICASSP), pages 5665–5669. IEEE, 2020.
[123] Liu Rui, Hossein Nejati, Seyed Hamid Safavi, and Ngai-Man Cheung. Simultaneous
      low-rank component and graph estimation for high-dimensional graph signals: Appli-
      cation to brain imaging. In 2017 IEEE International Conference on Acoustics, Speech
      and Signal Processing (ICASSP), pages 4134–4138. IEEE, 2017.
[124] Gerald Matz and Thomas Dittrich. Learning signed graphs from data. In ICASSP
      2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing
      (ICASSP), pages 5570–5574. IEEE, 2020.
[125] Jérôme Kunegis, Stephan Schmidt, Andreas Lommatzsch, Jürgen Lerner, Ernesto W
      De Luca, and Sahin Albayrak. Spectral analysis of signed graphs for clustering, pre-
      diction and visualization. In Proceedings of the 2010 SIAM International Conference
      on Data Mining, pages 559–570. SIAM, 2010.
[126] David I Shuman, Sunil K Narang, Pascal Frossard, Antonio Ortega, and Pierre Van-
      dergheynst. The emerging field of signal processing on graphs: Extending high-
      dimensional data analysis to networks and other irregular domains. IEEE signal pro-
      cessing magazine, 30(3):83–98, 2013.
[127] Aliaksei Sandryhaila and Jose MF Moura. Discrete signal processing on graphs: Fre-
      quency analysis. IEEE Transactions on Signal Processing, 62(12):3042–3054, 2014.
[128] Thomas Hofmann, Bernhard Schölkopf, and Alexander J Smola. Kernel methods in
      machine learning. The annals of statistics, pages 1171–1220, 2008.
[129] John Shawe-Taylor, Nello Cristianini, et al. Kernel methods for pattern analysis. Cam-
      bridge university press, 2004.
[130] Michael A Skinnider, Jordan W Squair, and Leonard J Foster. Evaluating measures
      of association for single-cell transcriptomics. Nature methods, 16(5):381–386, 2019.
[131] Thomas P Quinn, Mark F Richardson, David Lovell, and Tamsyn M Crowley. propr:
      an r-package for identifying proportionally abundant features using compositional data
      analysis. Scientific reports, 7(1):1–9, 2017.
[132] Ronald S Pimentel, Magdalena Niewiadomska-Bugaj, and Jung-Chao Wang. Associ-
      ation of zero-inflated continuous variables. Statistics & Probability Letters, 96:61–67,
      2015.
[133] Inbal Yahav and Galit Shmueli. On generating multivariate poisson data in man-
      agement science applications. Applied Stochastic Models in Business and Industry,
      28(1):91–102, 2012.
[134] Thalia E Chan, Michael PH Stumpf, and Ann C Babtie. Gene regulatory network
      inference from single-cell data using multivariate information measures. Cell systems,
      5(3):251–267, 2017.
[135] Thomas Schaffter, Daniel Marbach, and Dario Floreano. Genenetweaver: in silico
      benchmark generation and performance profiling of network inference methods. Bioin-
      formatics, 27(16):2263–2270, 2011.
[136] Daniel Marbach, Thomas Schaffter, Claudio Mattiussi, and Dario Floreano. Generat-
      ing realistic in silico gene networks for performance assessment of reverse engineering
      methods. Journal of computational biology, 16(2):229–239, 2009.
[137] Damian Szklarczyk, Annika L Gable, Katerina C Nastou, David Lyon, Rebecca Kirsch,
      Sampo Pyysalo, Nadezhda T Doncheva, Marc Legeay, Tao Fang, Peer Bork, et al.
      The string database in 2021: customizable protein–protein networks, and functional
      characterization of user-uploaded gene/measurement sets. Nucleic acids research,
      49(D1):D605–D612, 2021.
[138] ENCODE Project Consortium et al. An integrated encyclopedia of dna elements in
      the human genome. Nature, 489(7414):57, 2012.
[139] Zhi-Ping Liu, Canglin Wu, Hongyu Miao, and Hulin Wu. Regnetwork: an integrated
      database of transcriptional and post-transcriptional regulatory networks in human and
      mouse. Database, 2015, 2015.
[140] Luz Garcia-Alonso, Christian H Holland, Mahmoud M Ibrahim, Denes Turei, and Julio
      Saez-Rodriguez. Benchmark and integration of resources for the estimation of human
      transcription factor activities. Genome research, 29(8):1363–1375, 2019.
[141] Heonjong Han, Jae-Won Cho, Sangyoung Lee, Ayoung Yun, Hyojin Kim, Dasom Bae,
      Sunmo Yang, Chan Yeong Kim, Muyoung Lee, Eunbeen Kim, et al. Trrust v2: an ex-
      panded reference database of human and mouse transcriptional regulatory interactions.
      Nucleic acids research, 46(D1):D380–D386, 2018.
[142] DA Brafman, C Phung, N Kumar, and K Willert. Regulation of endodermal differen-
      tiation of human embryonic stem cells through integrin-ecm interactions. Cell Death
      & Differentiation, 20(3):369–381, 2013.
[143] Li-Fang Chu, Ning Leng, Jue Zhang, Zhonggang Hou, Daniel Mamott, David T
      Vereide, Jeea Choi, Christina Kendziorski, Ron Stewart, and James A Thomson.
      Single-cell rna-seq reveals novel regulators of human embryonic stem cell differenti-
      ation to definitive endoderm. Genome biology, 17(1):1–20, 2016.
[144] Alistair J Watt, Roong Zhao, Jixuan Li, and Stephen A Duncan. Development of the
      mammalian liver and ventral pancreas is dependent on gata4. BMC developmental
      biology, 7(1):1–11, 2007.
[145] Emily M Walker, Cayla A Thompson, and Michele A Battle. Gata4 and gata6 regulate
      intestinal epithelial cytodifferentiation during development. Developmental biology,
      392(2):283–294, 2014.
[146] JB Fisher, K Pulakanti, S Rao, and SA Duncan. Gata6 is essential for endoderm
      formation from human pluripotent stem cells. Biology open, 6(7):1084–1095, 2017.
[147] Qing Zhou, Hiram Chipperfield, Douglas A Melton, and Wing Hung Wong. A gene reg-
      ulatory network in mouse embryonic stem cells. Proceedings of the National Academy
      of Sciences, 104(42):16438–16443, 2007.
[148] Wenjing Shi, Hui Wang, Guangjin Pan, Yijie Geng, Yunqian Guo, and Duanqing Pei.
      Regulation of the pluripotency marker rex-1 by nanog and sox2. Journal of biological
      chemistry, 281(33):23319–23325, 2006.
[149] Kathy K Niakan, Hongkai Ji, René Maehr, Steven A Vokes, Kit T Rodolfa, Richard I
      Sherwood, Mariko Yamaki, John T Dimos, Alice E Chen, Douglas A Melton, et al.
      Sox17 promotes differentiation in mouse embryonic stem cells by directly regulating
      extraembryonic gene expression and indirectly antagonizing self-renewal. Genes &
      development, 24(3):312–326, 2010.
[150] Alexander Lex, Nils Gehlenborg, Hendrik Strobelt, Romain Vuillemot, and Hanspeter
      Pfister. Upset: visualization of intersecting sets. IEEE transactions on visualization
      and computer graphics, 20(12):1983–1992, 2014.
[151] Dominic Grün, Lennart Kester, and Alexander Van Oudenaarden. Validation of noise
      models for single-cell transcriptomics. Nature methods, 11(6):637–640, 2014.
[152] Raphael Petegrosso, Zhuliu Li, and Rui Kuang. Machine learning and statistical
      methods for clustering single-cell rna-sequencing data. Briefings in bioinformatics,
      21(4):1209–1223, 2020.
[153] Zhaoning Wang, Miao Cui, Akansha M Shah, Wei Tan, Ning Liu, Rhonda Bassel-
      Duby, and Eric N Olson. Cell-type-specific gene regulatory networks underlying murine
      neonatal heart regeneration at single-cell resolution. Cell reports, 33(10):108472, 2020.
[154] Zhigang Xue, Kevin Huang, Chaochao Cai, Lingbo Cai, Chun-yan Jiang, Yun Feng,
      Zhenshan Liu, Qiao Zeng, Liming Cheng, Yi E Sun, et al. Genetic programs in human
      and mouse early embryos revealed by single-cell rna sequencing. Nature, 500(7464):593–
      597, 2013.
[155] Sara Aibar, Carmen Bravo González-Blas, Thomas Moerman, Hana Imrichova, Gert
      Hulselmans, Florian Rambow, Jean-Christophe Marine, Pierre Geurts, Jan Aerts,
      Joost van den Oord, et al. Scenic: single-cell regulatory network inference and clus-
      tering. Nature methods, 14(11):1083–1086, 2017.
[156] Tim Stuart, Andrew Butler, Paul Hoffman, Christoph Hafemeister, Efthymia Papalexi,
      William M Mauck III, Yuhan Hao, Marlon Stoeckius, Peter Smibert, and Rahul Satija.
      Comprehensive integration of single-cell data. Cell, 177(7):1888–1902, 2019.
[157] Holger Scheel and Stefan Scholtes. Mathematical programs with complementarity con-
      straints: Stationarity, optimality, and sensitivity. Mathematics of Operations Research,
      25(1):1–22, 2000.
[158] Yu Wang, Wotao Yin, and Jinshan Zeng. Global convergence of admm in nonconvex
      nonsmooth optimization. Journal of Scientific Computing, 78(1):29–63, 2019.
[159] Vassilis Kalofolias and Nathanaël Perraudin. Large scale graph learning from smooth
      signals. arXiv preprint arXiv:1710.05654, 2017.
[160] Vladimir Yu Kiselev, Tallulah S Andrews, and Martin Hemberg. Challenges in unsu-
      pervised clustering of single-cell rna-seq data. Nature Reviews Genetics, 20(5):273–282,
      2019.
[161] I-Cheng Ho, Tzong-Shyuan Tai, and Sung-Yun Pai. Gata3 and the t-cell lineage:
      essential functions before and after t-helper-2-cell differentiation. Nature reviews im-
      munology, 9(2):125–135, 2009.
[162] Paul Riley, Lynn Anson-Cartwright, and James C Cross. The hand1 bhlh
      transcription factor is essential for placentation and cardiac morphogenesis. Nature genetics,
      18(3):271–275, 1998.