Search results
(1 - 13 of 13)
 Title
 Optimal control system design : the predictive sampling problem
 Creator
 Ahn, Uhi
 Date
 1978
 Collection
 Electronic Theses & Dissertations
 Title
 Large and small sample properties of maximum likelihood estimates for the hierarchical linear model
 Creator
 Bassiri, Dina
 Date
 1988
 Collection
 Electronic Theses & Dissertations
 Title
 Influences of ecological parameters on herpetofaunal assemblages in northern Minnesota, with an assessment of sampling methodologies
 Creator
 Yaunches, Gabrielle D. (Gabrielle Dianne)
 Date
 1998
 Collection
 Electronic Theses & Dissertations
 Title
 Computation of power in the nested random effects models
 Creator
 Liu, Xiaofeng
 Date
 1999
 Collection
 Electronic Theses & Dissertations
 Title
 Two applications of quantitative methods in education : sampling design effects in large-scale data and causal inference of class-size effects
 Creator
 Shen, Ting (Graduate of Michigan State University)
 Date
 2018
 Collection
 Electronic Theses & Dissertations
 Description

"This dissertation is a collection of four papers in which the former two papers address the issues of external validity concerning incorporating complex sampling design in model analysis in large-scale data and the latter two papers address issues of internal validity involving statistical methods that facilitate causal inference of class size effects. Chapter 1 addressed whether, when and how to apply complex sampling weights via empirical, simulation and software investigations in the context of large-scale educational data focusing on fixed effects. The empirical evidences reveal that unweighted estimates agree with the weighted cases and two scaling methods make no difference. The possible difference between weighted single versus multilevel model may lie in the scaling procedure in the latter. The simulation results indicate that relative bias of the estimates in the models of unweighted single level, unweighted multilevel, weighted single level and weighted multilevel varies across different variables, but unweighted multilevel has the smallest root mean square errors consistently while weighted single model has the largest values for level-one variables. The software finding indicates that STATA and Mplus are more flexible and capable especially for weighted multilevel models where scaling is required. Chapter 2 investigated how to account for informative design arising from unequal probability of selection in multilevel modeling with a focus of the multilevel pseudo maximum likelihood (MPML) and the sample distribution approach (SDA). The Monte Carlo simulation evaluated the performance of MPML considering sampling weights and scaling. The results indicate that unscaled estimates have substantial positive bias for estimating cluster and individual-level variations, thus the scaling procedure is essential. The SDA is conducted using empirical data, and the results are similar to the unweighted case which seems that the sampling design is not that informative or SDA is not working well in practice. Chapter 3 examined the long-term and causal inferences of class size effects on reading and mathematics achievement as well as on noncognitive outcomes in early grades via applying individual fixed effects models and propensity scores methods on the data of ECLS-K 2011. Results indicate that attending smaller class improves reading and math achievement. In general, evidence of class size effects on noncognitive outcomes is not significant. Considering potential measurement errors involved in noncognitive variables, evidence of class size effects on noncognitive domain is less reliable. Chapter 4 applied instrumental variables (IV) methods and regression discontinuity designs (RDD) on TIMSS data in 2003, 2007 and 2011 to investigate whether class size has effects on eighth grader's cognitive achievement and noncognitive outcomes in math and four science subjects across four European countries (i.e., Hungary, Lithuania, Romania and Slovenia). The results of the IV analyses indicate that in Romania smaller class size has significant positive effects on academic scores for math, physics, chemistry and earth science as well as for math enjoyment in 2003. In Lithuania, class size effects on noncognitive skills are not consistent between IV and RDD analyses in 2007. Overall, the small class size benefit on achievement scores is only observed in Romania in 2003 while evidence of class-size effects on noncognitive skills may lack of reliability."--Pages ii-iii.
 Title
 Sample and hold functions and non-minimum-phase systems
 Creator
 Wang, Yingxu (Graduate of Michigan State University)
 Date
 2014
 Collection
 Electronic Theses & Dissertations
 Description

With the same initial condition, a continuous-time system and the corresponding Discrete Equivalent (DE) system will have the same system responses at sampling times due to switched input and zero-order hold input, respectively. Previous work showed that, using the Square Pulse Sample and Hold Function (SPSHF) as the switched input, the corresponding DE system of a continuous-time non-minimum-phase (NMP) system could have minimum-phase (MP) behavior. In this thesis, the switched input in the definition of the DE system was expanded with two more sample and hold functions, the Forward Triangle Sample and Hold Function (FTSHF) and the Backward Triangle Sample and Hold Function (BTSHF). The DE systems and their discrete-time (DT) system equations were developed according to the different sample and hold functions. A simulation case study was conducted to indicate the feasible regions for selecting the sample and hold parameters for each switched input that would make the resulting DE systems MP. The results of the simulation case study implied that it was possible to have an MP DT system with a smaller sampling period than when discretizing the system directly using the zero-order hold (ZOH) method. In order to study the robustness of each sample and hold function, q-Markov cover system identification with a Pseudo Random Binary Signal (PRBS) was then studied for the purpose of doing controller-in-the-loop (CIL) simulation. The CIL simulation used the dSPACE AutoBox to simulate the system dynamics. A resistor-capacitor (RC) filter was used to simulate actuator dynamics using different capacitors. Finally, the characteristics of the three sample and hold functions were compared.
 Title
 Some new models for small area estimation
 Creator
 Ren, Hao
 Date
 2011
 Collection
 Electronic Theses & Dissertations
 Description

This dissertation includes some new models for small area estimation. There are four parts in total. The first part studied the selection of fixed effects covariates in linear mixed models. A modified bootstrap selection procedure for linear models from the literature was extended to linear mixed effects models. Both theoretical work and simulations showed the effectiveness of this procedure for linear mixed effects models. In the second part, a new approach by shrinking both means and variances of small areas was introduced. This method modeled the small area means and variances in a unified framework. The smoothed variance estimators used information of direct point estimators and their sampling variances, and consequently, for the smoothed small area estimators. Conditional mean squared error of prediction was also studied in this part to evaluate the performance of predictors. The third part studied the confidence intervals of small area estimators introduced in the second part. The literature of small area estimation is dominated by point estimates and their standard errors. The standard normal or Student-t confidence intervals do not produce accurate intervals. The confidence intervals produced in this part are from a decision theory perspective. The fourth part estimated the small area means with clustering of the small areas. In realistic applications, it may not be appropriate to "borrow strength" from all other small areas universally if cluster effects exist between clusters of small areas. A model based on clustering was studied in this part, which included an additional cluster effect to the basic area level model. Since the partition of clusters was not known, a stochastic search procedure from the literature was adapted first to find the clustering partition.
 Title
 Three essays in complex samples
 Creator
 Rahmani, Iraj
 Date
 2012
 Collection
 Electronic Theses & Dissertations
 Description

The samples used in econometric studies are not always sets of randomly drawn observations from the populations of interest. In many studies sampling has a complex design involving clustering and stratification. In stratification, the population is divided into subpopulations or strata based on exogenous or endogenous variables and then a random sample of unit observations or clusters is drawn from each stratum. Clusters are contiguous groups of units existing within a stratum. Reducing the cost of sampling or operational convenience might be reasons for applying stratification and clustering. On the other hand, particular interest in a small subpopulation may cause oversampling that justifies a nonrandom sampling scheme. This dissertation consists of three essays addressing estimation and inference in cross section and panel data models with nonrandom samples. In general, ignoring sampling design could produce inconsistent estimators and also inconsistent estimators for their standard errors. In the first essay a multistage sampling design including standard stratification and clustering stages at first and variable probability sampling in the final stage is considered. The problem is studied under the M-estimator framework. Under a set of regularity conditions the usual weighting estimators are consistent and have asymptotic normal distributions. When stratification in the first stage, the second stage, or both is exogenous, the corresponding weights can be dropped and the estimators remain consistent. The second essay contributes to the subject of nonrandom sampling by studying efficiency in panel data models when the data set comes from stratified samples. The goal in this chapter is to obtain more efficient estimators by considering correlation within panels in models with stratified structure. We do not try to find the efficiency bound in this kind of model. Our attempt is to increase efficiency compared with pooled models that ignore correlations within panels. The paper takes into account correlation within each panel and in each stratum under a GMM-based framework. Theoretical development shows that by considering correlation within the panels in each stratum and adding them together with appropriate weights, finding more efficient estimators is possible. As in generalized estimating equations (GEE), we are able to consider a specific form for the correlation for panels in each stratum. Monte Carlo results confirm that the new GMM estimators, called weighted and unweighted GLS, are more efficient than their competitors OLS and weighted OLS, which simply overlook the correlation within the panels. In the case of endogenous stratification, weighted GLS does better than the rest, and in the case of exogenous stratification, unweighted GLS does. For a specific sample size, this efficiency gain depends on what form is chosen for the correlation and how strong or weak it is. We applied the results to study determinants of inequality in the U.S., and estimation results show that the efficiency gain compared with POLS or weighted POLS is substantial. The subject of the third essay is the model selection problem. In complex samples involving stratification and clustering, the assumption that observations are independently and identically distributed no longer holds, and therefore Vuong's (1989) model selection tests are not directly applicable. In order to generalize Vuong's results to estimators other than MLE, we study the problem under the M-estimator framework, which contains many estimators including but not limited to linear and nonlinear least squares, MLE, and QMLE. The theoretical results show that for two nonnested competing models, the asymptotic properties of the weighted test statistics are a function not of the competing estimators but of the observations, and the statistics have a normal distribution. An interesting finding is that even in the case of exogenous stratification, we cannot drop weights in the test statistics, since for nonnested tests both competing models should be misspecified under the null. We also apply the results in two empirical studies.
 Title
 Studying the effects of sampling on the efficiency and accuracy of k-mer indexes
 Creator
 Almutairy, Meznah
 Date
 2017
 Collection
 Electronic Theses & Dissertations
 Description

"Searching for local alignments is a critical step in many bioinformatics applications and pipelines. This search process is often sped up by finding shared exact matches of a minimum length. Depending on the application, the shared exact matches are extended to maximal exact matches, and these are often extended further to local alignments by allowing mismatches and/or gaps. In this dissertation, we focus on searching for all maximal exact matches (MEMs) and all highly similar local alignments (HSLAs) between a query sequence and a database of sequences. We focus on finding MEMs and HSLAs over nucleotide sequences. One of the most common ways to search for all MEMs and HSLAs is to use a k-mer index such as BLAST. A major problem with k-mer indexes is the space required to store the lists of all occurrences of all k-mers in the database. One method for reducing the space needed, and also query time, is sampling where only some k-mer occurrences are stored. We classify sampling strategies used to create k-mer indexes in two ways: how they choose k-mers and how many k-mers they choose. The k-mers can be chosen in two ways: fixed sampling and minimizer sampling. A sampling method might select enough k-mers such that the k-mer index reaches full accuracy. We refer to this sampling as hard sampling. Alternatively, a sampling method might select fewer k-mers to reduce the index size even further but the index does not guarantee full accuracy. We refer to this sampling as soft sampling. In the current literature, no systematic study has been done to compare the different sampling methods and their relative benefits/weakness. It is well known that fixed sampling will produce a smaller index, typically by roughly a factor of two, whereas it is generally assumed that minimizer sampling will produce faster query times since query k-mers can also be sampled. However, no direct comparison of fixed and minimizer sampling has been performed to verify these assumptions. Also, most previous work uses hard sampling, in which all similar sequences are guaranteed to be found. In contrast, we study soft sampling, which further reduces the k-mer index at a cost of decreasing query accuracy. We systematically compare fixed and minimizer sampling to find all MEMs between large genomes such as the human genome and the mouse genome. We also study soft sampling to find all HSLAs using the NCBI BLAST tool with the human genome and human ESTs. We use BLAST, since it is the most widely used tool to search for HSLAs. We compared the sampling methods with respect to index size, query time, and query accuracy. We reach the following conclusions. First, using larger k-mers reduces query time for both fixed sampling and minimizer sampling at a cost of requiring more space. If we use the same k-mer size for both methods, fixed sampling requires typically half as much space whereas minimizer sampling processes queries slightly faster. If we are allowed to use any k-mer size for each method, then we can choose a k-mer size such that fixed sampling both uses less space and processes queries faster than minimizer sampling. When identifying HSLAs, we find that soft sampling significantly reduces both index size and query time with relatively small losses in query accuracy. The results demonstrate that soft sampling is a simple but effective strategy for performing efficient searches for HSLAs. We also provide a new model for sampling with BLAST that predicts empirical retention rates with reasonable accuracy."--Pages ii-iii.
 Title
 The value of imperfect sample separation information in switching regression models
 Creator
 Masson, Edwina A.
 Date
 1985
 Collection
 Electronic Theses & Dissertations
 Title
 An investigation of the power function for the test of independence in 2 x 2 contingency tables
 Creator
 Harkness, William Leonard, 1934-
 Date
 1959
 Collection
 Electronic Theses & Dissertations
 Title
 Resampling methods for linear models
 Creator
 Podgórski, Krzysztof
 Date
 1993
 Collection
 Electronic Theses & Dissertations
 Title
 First order allocation
 Creator
 Noble, William
 Date
 1991
 Collection
 Electronic Theses & Dissertations