Search results
(1 - 20 of 20)
- Title
- Defining the role of ballast water in the transport of viruses in aquatic environments through metagenomic approaches
- Creator
- Kim, Yiseul
- Date
- 2015
- Collection
- Electronic Theses & Dissertations
- Description
-
Global shipping activities transport 12 billion tons of water across regions each year. This so-called ballast water contains a variety of biological materials and has been considered to transfer non-native species between biomes, resulting in potential ecological, economic, and public health problems in major ports worldwide. Despite the large amount of ballast water transported around the globe and its negative impact on native ecosystems, relatively little attention has been paid to viral invasions via ballast water due to technical challenges in detecting the wide range of viruses. The limitations of virus discovery using traditional approaches can now be overcome with the emergence of metagenomics, which enables unprecedented views of viral diversity and functions. This dissertation brought together environmental virology, metagenomics, and bioinformatics for the first time in order to examine the taxonomic composition and diversity of viruses in ballast and harbor waters collected from a freshwater system, and to investigate the global transport of viruses through ballast water and the effect of engineered, management, and environmental parameters associated with ballast water on ocean viruses. Viral communities in ballast water in the Great Lakes were examined due to the long history of non-native species invasions in this region of the world. Five ballast and three harbor waters were collected from the Port of Duluth-Superior in May 2013. Bioinformatics analyses of over 550 million Illumina reads showed that most of the viral sequences had no homologs in the public database, indicating that our knowledge about viral diversity is still very limited. Among the sequences homologous to known viruses (22.3 ± 6.2%), ballast and harbor waters contained a diversity of viruses, which were largely dominated by double-stranded (ds) DNA phages, including Myoviridae, Podoviridae, and Siphoviridae.
Along with these phage families, viruses that could infect a broad range of hosts, including archaea, fungi, invertebrates, plants, protists, and vertebrates, some of which are highly pathogenic to fish and shrimp, were present at different levels in the viral metagenomes (viromes). Comparative virome analyses showed that viromes were distinct among the Great Lakes and formed a specific group of temperate freshwater viromes, separate from viromes associated with marine environments and engineered freshwater systems. The scope of this research was expanded to examine viral communities in marine environments. Sixteen ballast and eight harbor waters were collected from the Port of Los Angeles/Long Beach and the Port of Singapore from March through May 2014. Bioinformatics analyses of 3.8 billion Illumina reads revealed that the taxonomic profile of the sequences homologous to known viruses (30.6 ± 0.03%) was similar to that observed in the Great Lakes viromes, which were largely dominated by dsDNA phages. Moreover, this research was able to detect sequences most similar to viruses infecting humans, fish, and shrimp, which are related to significant public health problems or direct economic impact. Variations in virome composition of ballast and harbor waters were found between geographic locations, suggesting that the movement of ballast water across the global shipping network transports the ocean viromes. Importantly, this research showed that virus richness (type of viruses) in ballast water was not governed by engineered or management variables but by conditions of the local environment from which the viruses originate, showing associations with latitude. Outcomes of the present research represent the most detailed characterization to date of viruses in ballast water, defining the role of ballast water in the transport of freshwater and ocean viromes and an increased risk of exposure of aquatic fauna and flora to viruses.
The present findings emphasize the need for implementing ballast water discharge limits for viruses and treatment. More research is needed on host population structure to better understand the impact of the transport of viruses between biomes.
- Title
- Determining the role of IRF6 in T cell development and functional commitment
- Creator
- Mansour, Tamer Ahmed
- Date
- 2014
- Collection
- Electronic Theses & Dissertations
- Description
-
Interferon regulatory factor (IRF) is a protein family with nine members in mammals known to orchestrate and control homeostatic mechanisms of host defense. There are functional and/or developmental defects of immune cells in the knockouts of eight family members. Like other family members, IRF6 is involved in regulating the cell cycle, but in keratinocytes and mammary epithelial cells, with mutations associated with squamous cell carcinomas. However, Irf6 is the only IRF known to be involved in morphogenesis. In humans, rare variants in IRF6 cause autosomal dominant orofacial clefting disorders. Common IRF6 variants contribute risk toward non-syndromic orofacial clefting. IRF6 is the only IRF family member with an as yet undetermined role in immunity. Here, we used publicly available microarray data to uncover a dynamic expression pattern for Irf6 during hematopoietic development and functional commitment. We found that Irf6 is expressed early in hematopoiesis, especially in long-term hematopoietic stem cells. We also identified Irf6 expression in the T cell lineage, including developing and functionally committed stages. Irf1, 2, 4, and 8 are indispensable for normal T cell development and differentiation. Genetic variants in IRF5, IRF7 and IRF8 are associated with autoimmune disorders of T cells, including psoriasis, multiple sclerosis and systemic lupus erythematosus. Furthermore, protein complexes between IRF6/IRF5 and IRF6/IRF8 were described. These data, together with DNA conservation among the IRF members and structural homology with IRF5, strongly suggest a role for Irf6 in the immune system, specifically in T cell development and functional commitment. We utilized a mouse model to show that Irf6 was required for the regulation of thymocyte development. We found that Irf6 was expressed in the subcapsular region and medulla of the thymus.
We further found that Irf6 regulated the distribution and proliferation of developing thymocytes. In addition, loss of Irf6 led to an increase in double negative cells with a concomitant increase in TCRγδ. Loss of Irf6 also led to a reduction in double positive cells with no corresponding reduction in single positive cell maturation. We also found that Irf6 dose is critical in the development of both CD4+ and CD8+ cells in an age-dependent manner. These data suggest a novel gene function for Irf6 in thymocyte development and indicate further studies of IRF6 variants that might increase the risk of autoimmune disease. In the mouse, loss of Irf6 leads to perinatal lethality, which hinders the ability to test the necessity of Irf6 in the functionally committed T helper (Th) subsets. We used a combination of in silico, in vivo and in vitro assays to determine the role of Irf6 in T cell differentiation. Using in silico analysis, we identified and propose a model for Irf6 function in Th17/Treg balance. To test our hypothesis in vivo and overcome perinatal lethality, we employed an adoptive transfer of Irf6 knockout cells into lethally irradiated mice. We observed 100% survival of chimeric mice receiving Irf6 knockout fetal liver, and mice receiving Irf6 knockout cells had no deficit in restoration of lymphocyte production. In addition, we used two in vitro models to assess the necessity of Irf6 in the commitment of T helper cells. Using a stromal-free culture, we found that naive T cells lacking Irf6 could be differentiated into Th1, Th2, Th17 and Treg using a specific cytokine cocktail. We found no differences in cell frequency and mean fluorescence intensity of intracellular cytokines between wild type and Irf6 knockout cells. In vitro differentiation of dendritic cells showed a significant increase of MHC-II expression after three days of culture. Irf6 might be involved in post-translational regulation of MHC-II.
These data indicate that intrinsic Irf6 expression is not essential for T helper subset differentiation. However, a non-cell autonomous role for Irf6 in T cell differentiation through dendritic cells remains plausible.
- Title
- Understanding the role of standing genetic variation in functional genetics and compensatory evolution
- Creator
- Chari, Sudarshan R.
- Date
- 2014
- Collection
- Electronic Theses & Dissertations
- Description
-
Conventionally, the phenotypic outcome of a mutation is considered to be due to a specific DNA lesion. But it has long been known that mutational effects can be conditional on environment (GxE) and genetic background (GxG). Thus, it is standard practice to perform experiments by controlling for rearing environment and using co-isogenic strains. Though such a controlled approach has been very successful in enabling many discoveries, by not considering conditional effects our understanding of biological systems is incomplete. My research utilized conditionality in terms of genetic background and standing genetic variation therein to understand whether mutational interactions can themselves be background dependent. I demonstrated that a majority of mutational interactions identified via a dominant modifier screen are background dependent. Extending this idea of contingency in terms of standing genetic variation to the phenomenon of compensatory evolution in the presence of deleterious mutations, I demonstrated that natural populations of Drosophila melanogaster possess standing genetic variation for compensatory alleles to ameliorate even severe phenotypic defects. I further demonstrated that, despite considerable standing variation to ameliorate the focal phenotype perturbed by the mutation, natural selection exploits alternative evolutionary trajectories to recover fitness. Additionally, this model system allowed me to understand that loss of sexual signaling can be compensated by modulating behavioural and life history traits.
- Title
- Defending against browser based data exfiltration attacks
- Creator
- Sood, Aditya
- Date
- 2013
- Collection
- Electronic Theses & Dissertations
- Description
-
The global nature of the Internet has revolutionized cultural and commercial interactions while at the same time providing opportunities for cyber criminals. Crimeware services now exist that have transformed the nature of cyber crime by making it more automated and robust. Furthermore, these crimeware services are sold as a part of a growing underground economy. This underground economy has provided a financial incentive to create and market more sophisticated crimeware. Botnets have evolved to become the primary, automated crimeware. The current, third generation of botnets targets online financial institutions across the globe. Willie Sutton, the bank robber, when asked why he robbed banks, is credited with replying: "That is where the money is." Today, financial institutions are online, so "that is where the money is" and criminals are swarming. Because the browser is most people's window to the Internet, it has become the primary target of crimeware, bots in particular. A common task is to steal credentials for financial institutions, such as accounts and passwords. Our goal is to prevent browser-based data exfiltration attacks. Currently, bots use a variant of the Man-in-the-Middle attack known as the Man-in-the-Browser attack for data exfiltration. The two most widely deployed browser-based data exfiltration attacks are Form-grabbing and Web Injects. Form-grabbing is used to steal data such as credentials in web forms, while the Web Injects attack is used to coerce the user to provide supplemental information such as a Social Security Number (SSN). Current security techniques emphasize detection of malware. We take the opposite approach and assume that clients are infected with malware and then work to thwart their attack. This thesis makes the following contributions: We introduce WPSeal, a method that a financial institution can use to discover that a Web-inject attack is happening so an account can be shut down before any damage occurs.
This technique is done entirely on the server side (such as the financial institution's side). We developed a technique to encrypt form data, rendering it useless for theft. This technique is controlled from the server side (such as the financial institution's side). Using WPSeal, we can detect if the encryption scheme has been tampered with. We present an argument that current hooking-based capabilities of bots cannot circumvent WPSeal (as well as the encryption that WPSeal protects). That is, criminals will have to come up with a totally different class of attack. In both cases, we do not prevent the attack. Instead, we detect the attack before damage can be done, rendering the attack harmless.
- Title
- Identification and analysis of non-coding RNAs in large scale genomic data
- Creator
- Achawanantakun, Rujira
- Date
- 2014
- Collection
- Electronic Theses & Dissertations
- Description
-
High-throughput sequencing technologies have created the opportunity for large-scale transcriptome analyses and intensified attention on the study of non-coding RNAs (ncRNAs). NcRNAs play important roles in many cellular processes. For example, transfer RNAs and ribosomal RNAs are involved in the protein translation process; micro RNAs regulate gene expression; long ncRNAs are found to associate with many human diseases ranging from autism to cancer. Many ncRNAs function through both their sequences and secondary structures. Thus, accurate secondary structure prediction provides important information to understand the tertiary structures and thus the functions of ncRNAs. The state-of-the-art ncRNA identification tools are mainly based on two approaches. The first approach is a comparative structure analysis, which determines the consensus structure from homologous ncRNAs. Structure prediction is a costly process, because the size of the putative structures increases exponentially with the sequence length. Thus, it is not practical for very long ncRNAs such as lncRNAs. The accuracy of current structure prediction tools is still not satisfactory, especially on sequences containing pseudoknots. An alternative identification approach that has become increasingly popular is sequence-based expression analysis, which relies on next generation sequencing (NGS) technologies for quantifying gene expression on a genome-wide scale. The specific expression patterns are used to identify the type of ncRNAs. This method is therefore limited to ncRNAs that have medium to high expression levels and have unique expression patterns that are different from other ncRNAs. In this work, we address the challenges presented in ncRNA identification using different approaches. To be specific, we have proposed four tools: grammar-string-based alignment, KnotShape, KnotStructure, and lncRNA-ID.
Grammar-string is a novel ncRNA secondary structure representation that encodes an ncRNA's sequence and secondary structure in the parameter space of a context-free grammar and a full RNA grammar including pseudoknots. It simplifies a complicated structure alignment to a simple grammar-string-based alignment. Also, grammar-string-based alignment incorporates both sequence and structure into multiple sequence alignment. Thus, we can enhance the speed of alignment and achieve an accurate consensus structure. KnotShape and KnotStructure focus on reducing the size of the structure search space to enhance the speed of the structure prediction process. KnotShape predicts the best shape by grouping similar structures together and applying SVM classification to select the best representative shape. KnotStructure improves the performance of structure prediction by using grammar-string-based alignment and the predicted shape output by KnotShape. lncRNA-ID is specially designed for lncRNA identification. It incorporates balanced random forest learning to construct a classification model to distinguish lncRNA from protein-coding sequences. The major advantage is that it can maintain good predictive performance with limited or imbalanced training data.
- Title
- Implementing validation procedures to study the properties of widely used statistical analysis methods of RNA sequencing experiments
- Creator
- Reeb, Pablo Daniel
- Date
- 2015
- Collection
- Electronic Theses & Dissertations
- Description
-
RNA sequencing (RNA-seq) technology is being rapidly adopted as the platform of choice for transcriptomic studies. Although its major focus has been gene expression profiling, other interests, such as single nucleotide profiling, are emerging as the technology evolves. In addition, applications are rapidly expanding in model and nonmodel organisms. The overall objective of this dissertation was to propose and implement validation procedures based on experimental data to estimate the properties of widely used statistical analysis methods of RNA-seq experiments. The first study evaluated differential expression methods based on count data distribution and Gaussian transformed models. Parametric simulations and plasmode datasets derived from RNA-seq experiments were generated to compare the statistical models in terms of type I error rate, power and null p-value distribution. Overall, Gaussian models presented p-values closer to nominal significance levels and a p-value distribution closer to the expected uniform distribution. Researchers using models with these properties will have fewer false positives when inferring differentially expressed transcripts. Additionally, the use of Gaussian transformations enables the application of the well-known theory of linear models, for instance to account for complex experimental designs. The second study assessed the properties of dissimilarity measures for agglomerative hierarchical cluster analysis. The validation comprised dissimilarity measures based on Euclidean distance, correlation-based dissimilarities and count data-based dissimilarities. I used plasmode datasets generated from two RNA-seq experiments with different sample structures and simulated scenarios based on informative and non-informative transcripts. In addition, I proposed two measures, agreement and consistency, for comparing dendrograms.
Dissimilarity measures based on non-transformed data resulted in dendrograms that did not resemble the expected sample structure, whereas dissimilarities calculated with appropriate transformations for count data were consistent in reproducing the expected dendrograms under different scenarios. The third study compared variant calling programs that used reference genotypes obtained from a SNPchip. The evaluation included multiple samples and multiple tissue datasets and considered the effect of per base read depth. Sensitivity and false discovery rates were computed separately for heterozygous and homozygous sites in order to provide information for potentially different applications such as allele-specific expression or RNA-editing. Additionally, I explored the use of SNPs called from RNA-seq to compute relationship matrices in population studies. Heterozygous sites with more than 10 reads per base and per sample were called with high sensitivity and low false discovery rates. Homozygous sites were called with higher sensitivity than heterozygous sites irrespective of depth but presented higher false discovery rates. A relationship matrix based on accurate genotypes obtained with RNA-seq presented a high correlation with a relationship matrix based on genotypes from a SNPchip. In conclusion, using synthetic and reference datasets, I compared statistical models to perform differential expression analysis, sample-based hierarchical cluster analysis, and variant calling and genotyping. This validation framework can be extended to evaluate other methods of RNA-seq analysis as well as to evaluate the periodic publication of new and updated analysis methods. Choosing the most appropriate software can help researchers to obtain better results and to achieve the goals of their investigations.
- Title
- Characterization of two large gene families in the sea lamprey
- Creator
- Chang, Steven
- Date
- 2013
- Collection
- Electronic Theses & Dissertations
- Description
-
This dissertation employed molecular biology and bioinformatics to examine two large gene families in the sea lamprey, Petromyzon marinus. An integrative approach was used to define these gene families in order to ensure the validity of the size and members of each gene family. There are two chapters: Chapter 1 examines chemosensory gene expression in a specialized part of the olfactory system and Chapter 2 studies the expression of detoxification genes in the liver and gills in response to the lampricide, 3-trifluoromethyl-4-nitrophenol (TFM).

CHEMORECEPTOR GENES

For this dissertation, I restrict chemoreception to the detection of chemical signals in the nose (note: chemoreception also includes taste), which is accomplished by the detection of odorants in the environment by specialized sensory cells in the main olfactory epithelium (MOE). In certain tetrapods, a second sensory epithelium is also found in the nose, called the vomeronasal organ (VNO). Canonically, each epithelium represents the start of different olfactory pathways, which govern different behavioral responses. Each epithelium expresses different classes of chemoreceptor (CR) genes; the MOE expresses odorant receptors (ORs) and trace amine-associated receptors (TAARs), while the VNO expresses ORs, vomeronasal type-1 and type-2 receptors (V1Rs and V2Rs). The sea lamprey olfactory organ has one nostril and so has one nasal capsule, which is divided into two spatially distinct regions: the main olfactory epithelium (MOE) and the accessory olfactory organ (AOO). The MOE has been well studied in lampreys, but the function of the AOO has eluded description for over 100 years. Based on other research and due to its proximity to the MOE, we hypothesized that the AOO represents an ancestral VNO.
If this AOO is indeed an ancestral VNO, we expect a different connectivity to the central nervous system than from the MOE, and would expect expression of pheromone receptors (V1Rs and V2Rs). CR expression in the MOE and AOO of sea lamprey was examined. The differential expression of CR genes between the two epithelia was determined, and the connectivity of the main and accessory epithelia was determined using neural tract tracing. Quantitative PCR confirmed and quantified the differential expression of specific genes in the main and accessory olfactory epithelia.

CYTOCHROME P450 GENES

The second gene family to be explored is the cytochrome P450 family. P450 genes encode steroidogenic or detoxification enzymes that are inducible by a substrate. As part of the strategy for controlling sea lamprey populations, TFM is applied to streams. Very little is known at the molecular level of how TFM works to kill sea lamprey larvae, but based on responses by other organisms to xenobiotic substances, our hypothesis is that P450 genes are induced by exposure to TFM. P450 genes were predicted from the sea lamprey genome, larvae were exposed to TFM, and gill and liver tissues were harvested over an 8-hour time course. Expression was confirmed using high-throughput sequencing and quantitative PCR. The immediate goal was to determine which P450 genes are induced by exposure to TFM. Alternatively, we generated a list of predicted Phase II detoxification enzymes in the event that P450 genes showed no difference in expression. The long-term goal is to use that knowledge to design more efficient and specific lampricides.
- Title
- Identifying the genetic basis of attenuation in Marek's disease virus via experimental evolution
- Creator
- Hildebrandt, Evin
- Date
- 2014
- Collection
- Electronic Theses & Dissertations
- Description
-
Marek's disease virus (MDV), an oncogenic alphaherpesvirus of chickens, causes up to $2 billion in losses a year due to Marek's disease (MD). Therefore, control of this economically important disease is critical. The primary method to control MD is vaccination. Attenuated, or weakened, strains of MDV have been generated via repeated in vitro serial passage to generate avirulent MDV strains that have been used as successful MD vaccines. Despite the introduction of several vaccines since the 1970s, more virulent strains of MDV have evolved to break vaccinal protection. Therefore, development of new MD vaccines is necessary. To address this concern, we sought to better understand the molecular basis of attenuation in MDV to provide information that may assist in the rational design of MD vaccines. Three attenuated replicates of a virulent MDV were serially passed in vitro for over 100 passages. DNA and RNA from attenuated viruses were deep sequenced using Illumina next-generation sequencers to identify changes in DNA sequence or expression following attenuation. Top candidate mutations identified via sequencing were used to generate seven recombinant viruses using red-mediated recombineering for mutations within UL42, UL46, UL5, two involving LORF2 and two mutations within ICP4. These recombinant viruses were tested in vivo to determine the impact of these mutations on MD incidence, in vivo replication and horizontal transmission. Point mutations within UL42, UL46, LORF2-Promoter and ICP4 did not cause observable phenotypic changes compared to the parental virus. A single point mutation within LORF2-Intron and a double mutant involving ICP4 both resulted in 100% MD in challenged birds but failed to transmit horizontally to uninfected contact birds. Finally, a point mutation within UL5 reduced MD incidence by over 90%, significantly reduced in vivo replication, and eliminated horizontal transmission.
Further characterization of this UL5 point mutation determined that it increased in vitro replication in growth curves, yet head-to-head competition of the Mut UL5 virus versus parental virus showed the parental virus outcompeted the mutant virus. Furthermore, serial passage of Mut UL5 in vivo did not result in increased MD incidence or in vivo replication, nor in reversion or compensatory mutations to UL5 after passage through birds. Trials testing vaccinal protection of the Mut UL5 virus showed the virus provided partial protection against challenge with virulent MDV, yet did not exceed protection achieved through use of traditional vaccines. Therefore, use of this point mutation in combination with other candidate mutations was tested. Addition of the UL5 mutation to Delta Meq, a candidate vaccine with high protection and replication that also induces bursal-thymic atrophy (BTA), resulted in a recombinant virus that replicated at low levels and did not cause BTA, yet showed reduced levels of vaccinal protection, indicating an intricate relationship between replication levels, BTA and vaccinal protection. This study shows that a variety of genes are mutated during attenuation, and that mutations within DNA replication genes in particular, such as UL5, appear to play an important role in attenuation. We also determined that experimental evolution is a process that can not only identify mutations involved in attenuation but also ones that offer protection as a vaccine, providing information for further development of MD vaccines.
- Title
- Understanding the mechanisms of oncogenicity by Marek's disease virus : role of Meq oncoprotein
- Creator
- Subramaniam, Sugalesini
- Date
- 2013
- Collection
- Electronic Theses & Dissertations
- Description
-
Marek's disease (MD) is one of the most economically significant diseases in chickens. It is caused by a highly oncogenic alpha-herpesvirus named Marek's disease virus (MDV). Currently, the main strategy to control MD is vaccination. However, accumulating evidence points to an increase in virulence among MDV field isolates over time, which implies that new strains of the virus are evolving and could break vaccine protection. This necessitates a better understanding of MDV-host interactions, not only to elucidate the events in pathogenesis but also to develop strategies for newer and more effective vaccines. One of the major unanswered questions in this area is the mechanism of tumor formation by MDV. The main objective of this project is to gain a comprehensive understanding of host genes that are transcriptionally regulated by Meq, the major oncoprotein of MDV, and their relevance in genetic resistance to MD. MDV oncogenicity is largely attributed to the bZIP transcription factor Meq. Although it was discovered in the 1990s, only a few host target genes have been described. This knowledge gap has impeded our understanding of Meq-induced tumorigenesis. Using a combination of state-of-the-art genomic techniques including ChIP-Seq and microarray analysis, a high-confidence list of Meq binding sites and a global transcriptome of genes regulated by Meq were generated. Given the importance of Meq in MDV pathogenesis, we next explored the role of Meq in genetic resistance to MD. Two highly inbred chicken lines, varying in MD resistance, were infected with a virulent strain of MDV, Md5, or a mutant virus lacking Meq, Md-deltaMeq. Analysis of differentially expressed genes provided a list of Meq-dependent genes that are involved in MD resistance and susceptibility. Pathway analysis indicated that MD resistant lines were enriched for positive regulation of cell death whereas the susceptible cell lines were enriched for regulation of cell proliferation.
In addition, some of the Meq-regulated pathways like ERK/MAPK signaling and Jak-STAT pathways were also involved in differential MD susceptibility. Taken together, our study provides a comprehensive analysis of how Meq interacts with cellular pathways involved in oncogenesis. In addition, this study forms the basis for selection of candidate genes that might be involved in genetic resistance to Marek's disease.
- Title
- Fast NCRNA identification techniques
- Creator
- Takyar, Seyedeh Shohreh
- Date
- 2012
- Collection
- Electronic Theses & Dissertations
- Description
-
Many noncoding RNAs (ncRNAs) function through both their sequences and secondary structures. Thus, secondary structure derivation is an important issue in today's RNA research. The state-of-the-art structure annotation tools are based on comparative analysis, which derives the consensus structure of homologous ncRNAs. Despite promising results from existing ncRNA aligning and consensus structure derivation tools, there is a need for more efficient and accurate ncRNA secondary structure modeling and alignment methods. In this thesis, we introduce grammar string, a novel ncRNA secondary structure representation that encodes an ncRNA's sequence and secondary structure in the parameter space of a context-free grammar (CFG). Being a string defined on a special alphabet constructed from a CFG, it converts ncRNA alignment into sequence alignment with quadratic (n²) complexity. We explain how this representation is used in the derivation of consensus secondary structure through multiple ncRNA alignment and also how existing clustering methods could be applied to ncRNAs represented by this model.
- Title
- Quantitative analysis of enhancer function in the dorsal-ventral patterning gene network of the Drosophila embryo
- Creator
- Sayal, Rupinder
- Date
- 2012
- Collection
- Electronic Theses & Dissertations
- Description
-
Enhancers are non-coding regions of DNA that coordinate spatio-temporal regulation of gene expression. These regulatory sequences contain binding sites for sequence-specific transcription factors. Enhancers are instrumental in the evolution of novel developmental and morphological features, as well as quantitative expression differences in a given population. In order to understand enhancer function and develop generalizable, predictive models for enhancers, it is important to study how enhancers function at a quantitative level. I first developed a suite of reporter gene vectors, which made quantitative measurements of gene expression from enhancers feasible. This "pHonda" suite of vectors is designed for site-specific integration into the fly genome to eliminate position effects, and it uses a specific 5'-UTR that allows for a more diffuse distribution of mRNA, making it more amenable to quantitative studies. Dorsal-ventral patterning is regulated by a master transcription factor, Dorsal, which is the fly homolog of the mammalian NF-kappaB protein. Dorsal regulates about 100 genes in the early fly embryo and coordinates dorsal-ventral patterning. A number of enhancers for these genes have been identified, which have been found to contain binding sites for the above proteins. The availability of several tested enhancer sequences, and quantitative data for concentrations of these factors, make it a suitable system for carrying out quantitative studies. Using systematic mutagenesis and confocal microscopy, I first generated a systematic perturbation dataset for the enhancer of the rhomboid gene. Next, I applied thermodynamic modeling to this dataset, which uses assumptions of statistical thermodynamics to derive gene expression as a function of the probabilities of binding of different factors to enhancer sequences, and tested several models for protein cooperativity and repression on this dataset.
Subsequently, I used these models to predict gene expression from enhancer sequences, which were not used for modeling, and found that the top-ranked models can predict gene expression from these sequences in a tissue-specific manner. My study highlights the importance of mathematical modeling to understand the general rules of enhancer function.
- Title
- Genome evolution of Campylobacter jejuni during experimental adaptation
- Creator
- Jerome, John Paul
- Date
- 2012
- Collection
- Electronic Theses & Dissertations
- Description
-
Campylobacter jejuni is a leading cause of foodborne bacterial enteritis in humans. An important reservoir for C. jejuni is in chickens, but it has been shown to colonize a large host range. Passage through a mouse model of campylobacteriosis resulted in a hypervirulent phenotype in mice for C. jejuni strain NCTC11168. After analyzing the wild-type and mouse-adapted variants by phenotype assays, expression microarray, pulsed-field gel electrophoresis and whole genome sequencing, we discovered that the genetic changes in the mouse-adapted variant were confined to thirteen hypermutable regions of DNA in contingency loci. We also show that specific contingency loci changes occurred in parallel during mouse infection when reisolates from multiple mice were analyzed. Furthermore, a mathematical model that considers contingency loci mutation rates and patterns does not explain the observed changes. Taken together, this is the first experimental evidence that contingency loci play a role in the rapid genetic adaptation of C. jejuni to a host, which results in increased virulence. In contrast to the observed virulence increase by serial host passage, we showed that C. jejuni rapidly loses an essential host colonization determinant during adaptive laboratory evolution. Passage in broth culture selected for flagellar motility deficient C. jejuni cells in parallel for five independently evolved lines. Moreover, the loss of motility occurred by two genetic mechanisms: reversible and irreversible. Reversible loss of motility occurred early during broth adaptation, followed by irreversible motility loss in the majority of cells by the end of the experiment. Whole genome sequencing implicated diverse mutation events that resulted in the loss of gene expression necessary for flagellar biosynthesis.
Furthermore, reversible mutations in homopolymeric DNA tracts of adenine/thymine residues, and irreversible types of mutation such as gene deletion, were discovered in the broth-evolved populations. In all evolved lines, an alternative sigma factor necessary for flagellar structural gene expression was removed from the genome. Overall, this dissertation contains the first accounts of C. jejuni experimental evolution. The results provide insight into the biological importance of reversible mutations in homopolymeric DNA tracts, and provide a basis for future studies of C. jejuni evolvability.
- Title
- Profile HMM-based protein domain analysis of next-generation sequencing data
- Creator
- Zhang, Yuan
- Date
- 2013
- Collection
- Electronic Theses & Dissertations
- Description
-
Sequence analysis is the process of analyzing DNA, RNA or peptide sequences using a wide range of methodologies in order to understand their functions, structures or evolutionary history. Next generation sequencing (NGS) technologies generate large-scale sequence data of high coverage and nucleotide-level resolution at low cost, benefiting a variety of research areas such as gene expression profiling, metagenomic annotation, ncRNA identification, etc. Therefore, functional analysis of NGS sequences becomes increasingly important because it provides insightful information, such as gene expression, protein composition, and phylogenetic complexity, of the species from which the sequences are generated. One basic step during the functional analysis is to classify genomic sequences into different functional categories, such as protein families or protein domains (or domains for short), which are independent functional units in a majority of annotated protein sequences. The state-of-the-art method for protein domain analysis is based on comparative sequence analysis, which classifies query sequences into annotated protein or domain databases. There are two types of domain analysis methods: pairwise alignment and profile-based similarity search. The first one uses pairwise alignment tools such as BLAST to search query genomic sequences against reference protein sequences in databases such as NCBI-nr. The second one uses profile HMM-based tools such as HMMER to classify query sequences into annotated domain families such as Pfam. Compared to the first method, the profile HMM-based method has a smaller search space and higher sensitivity in remote homolog detection. Therefore, I focus on profile HMM-based protein domain analysis. There are several challenges with protein domain analysis of NGS sequences.
First, sequences generated by some NGS platforms such as pyrosequencing have relatively high error rates, making it difficult to classify the sequences into their native domain families. Second, existing protein domain analysis tools have low sensitivity with short query sequences and poorly conserved domain families. Third, the volume of NGS data is usually very large, making it difficult to assemble short reads into longer contigs. In this work, I focus on addressing these three challenges using different methods. Specifically, we have proposed four tools: HMM-FRAME, MetaDomain, SALT, and SAT-Assembler. HMM-FRAME focuses on detecting and correcting frameshift errors in sequences generated by pyrosequencing technology, thus accurately classifying metagenomic sequences containing frameshift errors into their native protein domain families. MetaDomain and SALT are both designed for short reads generated by NGS technologies. MetaDomain uses relaxed position-specific score thresholds and alignment positions to increase the sensitivity while keeping the false positive rate at a low level. SALT combines both position-specific score thresholds and graph algorithms and achieves higher accuracy than MetaDomain. SAT-Assembler conducts targeted gene assembly from large-scale NGS data. It has smaller memory usage, higher gene coverage, and a lower chimera rate compared with existing tools. Finally, I conclude my work and briefly discuss some future work.
- Title
- Gene content evolution in plant genomes : studies of whole genome duplication, intergenic transcription and expression evolution in Brassicaceae and Poaceae species
- Creator
- Moghe, Gaurav Dilip
- Date
- 2013
- Collection
- Electronic Theses & Dissertations
- Description
-
Phenomena that create new genes and influence their diversification are important contributors to evolutionary novelty in living organisms. My research has focused on addressing the following questions regarding such phenomena in plants. First, what are the patterns of evolution of duplicate genes derived via whole genome duplication (WGD)? Second, do transcripts originating from intergenic regions constitute novel genes? Third, how do expression patterns of orthologous genes evolve in plants? I have addressed these questions using comparative genomic and transcriptomic analyses of species in the Brassicaceae and Poaceae families. To understand the evolution of WGD-derived duplicate genes, we sequenced and annotated the genome of wild radish (Raphanus raphanistrum), a Brassicaceae species which experienced a whole genome triplication (WGT) event ~24-29 million years ago. Through comparative genomic analyses of sequenced Brassicaceae species, I found that most WGT duplicate genes were lost over time. Duplicates that are still retained were found to undergo sequence and expression level divergence. Interestingly, while duplicate copies tend to diverge in expression level, one of the copies tends to maintain its original expression state in the tissue studied. Furthermore, duplicates that are retained in extant species tend to have higher expression levels, broader expression breadth, and higher network connectivity, and tend to be involved in functions such as transcription factor activity, stress response and development. Functional diversification of such duplicates can assist in the evolution of novel characters in plants post WGD. To understand the nature of intergenic transcription, I analyzed multiple transcriptome datasets in Arabidopsis thaliana as well as in species of the Poaceae family. My results suggest that plant genomes do not show any evidence of pervasive intergenic transcription.
Although thousands of intergenic transcripts can be found in each species, most of these transcripts have low breadths of expression, tend not to be conserved within or between species and show a significant bias in being located very close to genes or in open chromatin regions. My results suggest that most intergenic transcripts may be associated with transcription of the neighboring genes or may be produced as a result of noisy transcription. Properties of intergenic transcripts identified in my research will be useful in distinguishing functionally relevant transcripts from noise. To understand expression evolution, I analyzed patterns of evolution of orthologous genes between Poaceae species and found that sequence divergence is strongly associated with level and breadth of expression, and very weakly with expression divergence. Both sequence and expression evolution were found to be constrained for genes involved in core biological processes such as metabolism, transcription, photosynthesis and transport. Overall, the results of this research are broadly applicable to the field of gene annotation and increase our understanding of evolution of gene content in plant genomes.
- Title
- Identification of genes and their regulation that determine a phenotype : a systematic approach
- Creator
- Wu, Ming
- Date
- 2012
- Collection
- Electronic Theses & Dissertations
- Description
-
Research in computational systems biology focuses on establishing the complex relationships and interactions between genes and how they work together to render a particular phenotype. This involves the development and application of systematic approaches to study biological regulation in the context of a network in which genes regulate each other. Our research aims to develop novel approaches to identify genes and their regulation that determine a phenotype, which involves the reverse engineering of regulatory mechanisms through identification of condition-specific genes and interactions, as well as systematic modeling and simulation to reconstruct context-dependent regulatory networks. Chapter 1 introduces the fundamental approaches in systems biology. Data mining techniques have been developed to identify genes and interactions from gene expression data, while systems modeling integrates current knowledge to develop a functional context to address the complexity that arises in biological systems. We provide examples to demonstrate the practical aspects and biological relevance of the methodologies. Chapter 2 introduces and discusses the multi-layer approach that is able to reconstruct condition-specific genes and their regulation through an integrative analysis of large-scale information on gene expression, protein interaction and transcriptional regulation. In Chapter 3 we explore a dynamic feature of gene networks: switch-like behavior, wherein we show that gene switches have specific patterns of gene expression which can be uncovered by mining microarray data. This study demonstrates that one can capitalize on genome-wide expression profiling to capture dynamic properties of a complex network, thereby predicting gene switches that could be important for a phenotype and can participate in cell fate decisions. In Chapter 4 the cancer phenotype is studied using systems modeling of the human metabolic network.
We develop a novel approach to simulate context-dependent metabolic states that, upon perturbation of the gene(s) that modulate metabolic functions, can determine whether the gene is involved in conferring a phenotype. The approach is then applied to predict therapeutic microRNAs for human hepatocellular cancer. Chapter 5 provides a brief summary of the implications of the research towards a systematic understanding of gene networks as well as a future perspective of the field.
- Title
- Analysis of cooperative transcription factor binding at the sequence level
- Creator
- Clifford, Jacob
- Date
- 2015
- Collection
- Electronic Theses & Dissertations
- Description
-
Transcription Factor binding to DNA binding sites is one of the primary causes of gene regulation. A common representation of transcription factor binding sites is at the DNA sequence level, partly due to reoccurring patterns at the sequence level that occur throughout the genome for a given factor. The first chapter of this dissertation introduces gene regulation from the perspective of development. In addition the mathematical-physics foundation for performing calculations and for representations of the transcription factor binding sites at the sequence level is discussed in Chapter 1. In Chapter 2 I explore the possibility that two distinct sub-types of binding sites may co-exist within a population of functional sites. This leads to a model that can be used for prediction of transcription factor binding sites. In Chapter 3 I explore modelling of Dorsal Ventral early development Gene Regulatory Network, using the tools built up in Chapter 1 and 2, namely 'Position Weight Matrices' that allow for prediction of binding energies for genomic segments of DNA.
- Title
- Inferring regulatory interactions in transcriptional regulatory networks
- Creator
- Mahmoud, Sherine Awad
- Date
- 2013
- Collection
- Electronic Theses & Dissertations
- Description
-
Living cells are realized by complex gene expression programs that are moderated by regulatory proteins called transcription factors (TFs). The TFs control the differential expression of target genes in the context of transcriptional regulatory networks (TRNs), either individually or in groups. Deciphering the mechanisms of how the TFs control the expression of target genes is a challenging task, especially when multiple TFs collaboratively participate in the transcriptional regulation. Recent developments in biotechnology have been applied to uncover TF-target binding relationships to reconstruct draft regulatory circuits at a systems level. Furthermore, to identify regulatory interactions in vivo and consequently reveal their functions, TF single/double knockouts and over-expression experiments have been systematically carried out. However, the results of many single or even double-knockout experiments are often inconclusive, since many genes are regulated by multiple TFs with complementary functions. To predict the TF combinations whose knockout is most likely to bring about a phenotypic change, we developed a new computational tool called TRIM that models the interactions between the TFs and the target genes in terms of both the TF-target interaction's function (activation or repression) and its corresponding logical role (necessary and/or sufficient). We used DNA-protein binding and gene expression data to construct regulatory modules for inferring the transcriptional regulatory interaction models for the TFs and their corresponding target genes. Our TRIM algorithm is based on an HMM and a set of constraints that relate gene expression patterns to regulatory interaction models. However, TRIM infers interactions involving at most two TFs.
Inferring the collaborative interactions of multiple TFs is a computationally difficult task, because when multiple TFs simultaneously or sequentially control their target genes, a single gene responds to merged inputs, resulting in complex gene expression patterns. We developed mTRIM to solve this problem with a modified association rule mining approach. mTRIM is a novel method to infer TF collaborations in transcriptional regulation networks. It can identify not only TF groups that regulate common targets collaboratively but also TFs with complementary functions. However, mTRIM ignores the effect of miRNAs on target genes. In order to take the effect of miRNAs into consideration, we developed a new computational model called TmiRNA that incorporates miRNAs into the inference. TmiRNA infers the interactions between a set of regulators including both TFs and miRNAs and the set of their target genes. We used our model to study the combinatorial code of human cancer transcriptional regulation.
- Title
- Uncovering hidden patterns of molecular recognition
- Creator
- Raschka, Sebastian
- Date
- 2017
- Collection
- Electronic Theses & Dissertations
- Description
-
"It happened in 1958 that John Kendrew's group determined the three-dimensional structure of myoglobin at a resolution of 6 Å. This first view of a protein fold was a breakthrough at that time. Now, more than half a century later, both experimental and computational techniques have substantially improved as well as our understanding of how proteins and ligands interact. Yet, there are many unanswered questions to be addressed and patterns to be uncovered. One of the most pressing needs in structural biology is the prediction of protein-ligand complexes in aiding inhibitor and drug discovery, ligand design, and studies of catalytic mechanisms. Throughout the past few decades, improvements in computational technologies and insights from experimental data have converged into numerous protein-ligand docking and scoring algorithms. However, these methods are still far from being perfect, and only minimal improvements have been made in the past few years. That might be because current scoring functions regard individual intermolecular interactions as independent events in a binding interface. This thesis addresses existing shortcomings in the conventional view of protein-ligand recognition by characterizing interactions as patterns. Finding that binding rigidifies protein-ligand complexes has led to our design of a robust scoring function that predicts native protein-ligand complexes through the coupling of interactions that rigidifies the protein-ligand interface. Also, the analysis of a non-homologous set of protein-ligand complexes has revealed that binding interfaces are polarized - surprisingly, proteins donate twice as many hydrogen bonds to ligands as they accept, on average, and the opposite is true for ligands. A more in-depth analysis of atom type distributions among H-bond donor and acceptor atoms showed that the discovered trends contain surprisingly strong patterns that are also predictive of native protein-ligand binding.
Both the coupling of interactions as well as the distribution of hydrogen bond patterns are currently not captured by other methods and provide new information for the prediction and design of ligands. In the absence of the protein receptor structure, our results show that data from experimental assays can be mined to identify functional group patterns on ligands that are predictive of biological activity. Additionally, we present methods to use functional group patterns to improve the success rate of ligand-based virtual screening. Applied to G protein-coupled receptor inhibitor discovery, this approach has led to the discovery of a potent inhibitor that nullifies the biological response and presents the first instance where virtual screening has been used for aquatic invasive species control. Finally, to overcome current challenges in drug discovery for protein-protein interfaces, a new method for identifying small molecules that block protein-protein interactions is presented. We developed and applied an epitope-based virtual screening workflow to find inhibitors of focal adhesion kinase interactions involved in cancer metastasis. In sum, this work presents both novel insights into the coupling among and trends in intermolecular interactions as well as methods to predict the biological activity of ligands based on patterns of functional groups. Along with the insights gained in this work, computational tools and software for measuring the rigidification that is characteristic of native protein-ligand complexes, analyzing H-bond patterns rigorously, and screening millions of small molecules in hypothesis-driven ligand discovery have been developed and are now being made available to other scientists."--Pages ii-iii.
- Title
- Enhancing automated fault discovery and analysis
- Creator
- DeMott, Jared
- Date
- 2012
- Collection
- Electronic Theses & Dissertations
- Description
-
Creating quality software is difficult. Likewise, offensive researchers look to penetrate quality software. Both parties benefit from a scalable bug hunting framework. Once bugs are found, an equally expensive task is debugging. To debug faults, analysts must identify statements involved in the failures and select suspicious code regions that might contain the fault. Traditionally, this tedious task is performed manually. An automated technique to locate the true source of the failure is...
Creating quality software is difficult. Likewise, offensive researchers look to penetrate quality software. Both parties benefit from a scalable bug hunting framework. Once bugs are found, an equally expensive task is debugging. To debug faults, analysts must identify statements involved in the failures and select suspicious code regions that might contain the fault. Traditionally, this tedious task is performed manually. An automated technique to locate the true source of the failure is called fault localization. The thesis of this research is that an automated process to find software bugs and quickly localize the root cause of the failure is possible by improving upon existing techniques. This research is most interested in bugs that lead to security vulnerabilities. These bugs are of high value to offensive researchers and to the typical software test engineer. In particular, memory corruption bugs characterized by an application crash are the subset of all bugs focused on in this work. Existing distributed testing frameworks do not integrate with fault localization tools. Also, existing fault localization tools fail to localize certain difficult bugs. The overall goals of this research are to: (1) Build a dynamic testing framework powerful enough to find new bugs in commercial software. (2) Integrate an existing fault localization technique into the framework that can operate on code without the requirement of having the source code or pre-generated test cases. (3) Create a novel fault localization algorithm that better operates on difficult-to-localize flaws. (4) Test the improvement on benchmark and real-world code. Those objectives were achieved and empirical studies were conducted to verify the goals of this research. The constructed distributed bug hunting and analysis platform is called ClusterFuzz. The enhanced fault localization process is called Execution Mining. 
Test results show the novel fault localization algorithm to be an important improvement, and to be more effective than prior approaches. This research also achieved ancillary goals: visualizing fault localization in a new environment; assembly basic blocks for fully compiled code. A pipeline approach to finding and categorizing bugs paves the way for future work in the areas of automated vulnerability discovery, triage, and exploitation.
- Title
- Ecological effects on the evolution of cooperative behaviors
- Creator
- Connelly, Brian Dale
- Date
- 2012
- Collection
- Electronic Theses & Dissertations
- Description
-
Cooperative behaviors abound in nature and can be observed across the spectrum of life, from humans and primates to bacteria and other microorganisms. A deeper understanding of the forces that shape cooperation can offer key insights into how groups of organisms form and co-exist, how life transitioned to multicellularity, and account for the vast diversity present in ecosystems. This knowledge lends itself to a number of applications, such as understanding animal behavior and engineering cooperative multi-agent systems, and may further help provide a fundamental basis for new industrial and medical treatments targeting communities of cooperating microorganisms. Although these behaviors are common, how evolution selected for and maintained them remains a difficult question for which several theories have been introduced. These theories, such as inclusive fitness and group selection, generally focus on the fitness costs and benefits of the behavior in question, and are often invoked to examine whether a trait with some predetermined costs and benefits could be maintained as an evolutionarily stable strategy. Populations, however, do not exist and evolve in a vacuum. The environment in which they find themselves can play a critical role in shaping the types of adaptations that organisms accumulate, since one behavior may be highly beneficial in one environment, yet a hindrance in another. Ever-changing environments further complicate this picture, as maintaining a repertoire of behaviors for surviving in different environments is often costly. In addition to these environmental forces, the number and composition of other organisms with which individuals interact impose additional constraints. The combination of these factors results in significantly more complex dynamics. Using computational models and microbial populations, this dissertation examines several ways in which ecological factors can affect the evolution of cooperative behaviors. 
First, environmental disturbance is examined, in which a cooperative act enables organisms and their surrounding neighbors to survive a periodic kill event (population bottleneck) of varying severity. Resource availability is then studied, where populations must determine how much resource to allocate to cooperation. Finally, the effect of social structure, which defines the patterns of interactions among the individuals in a population, is investigated.