Xavier, Joana C.; Patil, Kiran Raosaheb; Rocha, I.
Source: Universidade do Minho; Publisher: Universidade do Minho
Type: Conference paper or conference object
Published: 2014 (ENG)
The biomass objective function (BOF) is an abstract equation used in genome-scale
constraint-based modelling (GS-CBM) to predict growth phenotypes. The BOF represents all the
growth requirements upon cell division, whose stoichiometric representation is ideally
based on experimental measurements for cells growing in log phase (1). For growth rate
calculations it is sufficient to know the macromolecular content of the cell, its detailed
composition (amino acids, nucleotides and fatty acids) and the energetic costs of growth (2).
However, examining network essentiality requires another level of detail, which includes
cofactors and ions and an analysis of which biomass components are minimally essential
(2), often called the core biomass (3, 4). There is no defined strategy in the literature for
choosing which components should be part of a detailed BOF and of the core BOF. In order to
obtain a universal core prokaryotic BOF, we integrated BOFs of 71 genome-scale manually curated
prokaryotic models, the ModelSEED framework for biomass composition (5) and data from the
literature. We used a semi-automatic process to standardize the nomenclature of metabolites in the
71 BOFs, as there is still no standard for the terminology of metabolites in GS-CBM. We found that
the clustering of these 71 models based on their BOFs fails to represent the phylogenetic
relationship of the modelled prokaryotes. No cofactor was present in all the BOFs analysed...
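The comparison described in this abstract, checking which components appear in every BOF, can be illustrated with a minimal Python sketch. The model names, component names and memberships below are hypothetical toy data, not values from the study, and the sketch assumes that metabolite names have already been standardized.

```python
from collections import Counter

# Hypothetical toy data: model name -> set of standardized biomass components.
# Names and memberships are illustrative only, not taken from the study.
bofs = {
    "model_A": {"atp", "nad", "coa", "l-alanine"},
    "model_B": {"atp", "fad", "l-alanine"},
    "model_C": {"atp", "nad", "fad", "l-alanine"},
}


def universal_components(bofs):
    """Return the components present in every BOF (set intersection)."""
    sets = list(bofs.values())
    return set.intersection(*sets) if sets else set()


def component_frequency(bofs):
    """Return, for each component, the fraction of BOFs that contain it."""
    counts = Counter(c for comps in bofs.values() for c in comps)
    return {c: n / len(bofs) for c, n in counts.items()}


core = universal_components(bofs)
freq = component_frequency(bofs)
```

In this toy set no cofactor survives the intersection, which mirrors the kind of result the abstract reports; the frequency table shows how far each component is from being universal.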
Source: Universidade do Minho; Publisher: Universidade do Minho
Type: Conference paper or conference object
Published: 06/2015 (ENG)
Knowledge of the core biochemical composition of the cell is critical for genome-scale metabolic modelling. In order to identify the universal core organic cofactors for prokaryotes, we performed a detailed analysis of biomass objective functions (BOFs) of 71 manually curated genome-scale prokaryotic models. These were then compared and integrated with the ModelSEED framework for biomass composition, experimental data on gene essentiality, curated enzyme-cofactor association data and a comprehensive survey of the literature. Surprisingly, no cofactor was present in all the BOFs analysed, including the important redox cofactor nicotinamide adenine dinucleotide (NAD) or its derivatives. Our results indicate not only the redox cofactors but also others such as coenzyme A, flavins and thiamin as universally essential for prokaryotes and therefore as important to include in the BOFs of future genome-scale models of prokaryotic organisms.
The ever-growing number of completely sequenced prokaryotic genomes facilitates cross-species comparisons by genomic annotation algorithms. This paper introduces a new probabilistic framework for comparative genomic analysis and demonstrates its utility in the context of improving the accuracy of prokaryotic gene start site detection. Our framework employs a product hidden Markov model (PROD-HMM) with state architecture to model the species-specific trinucleotide frequency patterns in sequences immediately upstream and downstream of a translation start site and to detect the contrasting non-synonymous (amino acid changing) and synonymous (silent) substitution rates that differentiate prokaryotic coding from intergenic regions. Depending on the intricacy of the features modeled by the hidden state architecture, intergenic, regulatory, promoter and coding regions can be delimited by this method. The new system is evaluated using a preliminary set of orthologous Pyrococcus gene pairs, for which it demonstrates an improved accuracy of detection. Its robustness is confirmed by analysis with cross-validation of an experimentally verified set of Escherichia coli K-12 and Salmonella typhimurium LT2 orthologs. The novel architecture has a number of attractive features that distinguish it from previous comparative models such as pair-HMMs.
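The contrast between synonymous and non-synonymous substitutions that this abstract relies on can be shown with a small Python sketch. This is not the PROD-HMM itself; it only counts, for two aligned in-frame orthologous sequences, how many differing codons keep or change the encoded amino acid. The function name and the example sequences are assumptions made for illustration.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def count_substitutions(seq_a, seq_b):
    """Count differing codons as (synonymous, non_synonymous).

    Both sequences are assumed to be aligned and in frame. Codons that
    contain gaps or ambiguous bases are skipped.
    """
    syn = nonsyn = 0
    n = min(len(seq_a), len(seq_b))
    for i in range(0, n - n % 3, 3):
        codon_a = seq_a[i:i + 3].upper()
        codon_b = seq_b[i:i + 3].upper()
        if codon_a == codon_b:
            continue
        aa_a = CODON_TABLE.get(codon_a)
        aa_b = CODON_TABLE.get(codon_b)
        if aa_a is None or aa_b is None:
            continue  # gap or ambiguous base
        if aa_a == aa_b:
            syn += 1
        else:
            nonsyn += 1
    return syn, nonsyn


# GCT -> GCC keeps alanine (synonymous); AAA -> AGA changes lysine to arginine.
example = count_substitutions("ATGGCTAAA", "ATGGCCAGA")
```

A coding region under purifying selection would be expected to show relatively more synonymous than non-synonymous differences, whereas an intergenic region read in an arbitrary frame would not show that bias.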
Vaughn, J C; Sperbeck, S J; Ramsey, W J; Lawrence, C B
Source: PubMed; Publisher: PubMed
Type: Scientific journal article
Published: 11/10/1984 (EN)
The phylogenetic approach (ref. 1) has been utilized in construction of a universal 5.8S rRNA secondary structure model, in which about 65% of the residues exist in paired structures. Conserved nucleotides primarily occupy unpaired regions. Multiple compensating base changes are demonstrated to be present in each of the five postulated helices, thereby forming a major basis for their proof. The results of chemical and enzymatic probing of 5.8S rRNAs (ref. 13, 32) are fully consistent with, and support, our model. This model differs in several ways from recently proposed 5.8S rRNA models (ref. 3, 4), which are discussed. Each of the helices in our model has been extended to the corresponding bacterial, chloroplast and mitochondrial sequences, which are demonstrated to be positionally conserved by alignment with their eukaryotic counterparts. This extension is also made for the base paired 5.8S/28S contact points, and their prokaryotic and organelle counterparts. The demonstrated identity of secondary structure in these diverse molecules strongly suggests that they perform equivalent functions in prokaryotic and eukaryotic ribosomes.
The determination of the phylogenetic relationships among microorganisms has long relied primarily on gene sequence information. Given that prokaryotic organisms often lack morphological characteristics amenable to phylogenetic analysis, prokaryotic phylogenies, in particular, are often based on sequence data. In this work, we explore a new source of phylogenetic information, the distribution of protein structural domains within fully sequenced prokaryotic genomes. The evolution of the structural domains we use has been studied extensively, allowing us to base our phylogenetic methods on testable theoretical models of structural evolution. We find that the methods that produce reasonable phylogenetic relationships are indeed the methods that are most consistent with theoretical evolutionary models. This work represents, to our knowledge, the first such theoretically motivated phylogeny, as well as the first application of structural information to phylogeny on this scale. Our results have strong implications for the phylogenetic relationships among prokaryotic organisms and for the understanding of protein evolution as a whole.
Clustered regularly interspaced short palindromic repeats (CRISPRs) are a family of DNA direct repeats found in many prokaryotic genomes. Repeats of 21–37 bp typically show weak dyad symmetry and are separated by regularly sized, nonrepetitive spacer sequences. Four CRISPR-associated (Cas) protein families, designated Cas1 to Cas4, are strictly associated with CRISPR elements and always occur near a repeat cluster. Some spacers originate from mobile genetic elements and are thought to confer “immunity” against the elements that harbor these sequences. In the present study, we have systematically investigated uncharacterized proteins encoded in the vicinity of these CRISPRs and found many additional protein families that are strictly associated with CRISPR loci across multiple prokaryotic species. Multiple sequence alignments and hidden Markov models have been built for 45 Cas protein families. These models identify family members with high sensitivity and selectivity and classify key regulators of development, DevR and DevS, in Myxococcus xanthus as Cas proteins. These identifications show that CRISPR/cas gene regions can be quite large, with up to 20 different, tandem-arranged cas genes next to a repeat cluster or filling the region between two repeat clusters. Distinctive subsets of the collection of Cas proteins recur in phylogenetically distant species and correlate with characteristic repeat periodicity. The analyses presented here support initial proposals of mobility of these units...
We have performed a computational simulation of the aggregation and chaperonin-dependent reconstitution of dimeric prokaryotic ribulose bisphosphate carboxylase/oxygenase (Rubisco), based on the data of P. Goloubinoff et al. (1989, Nature 342, 884-889) and P. V. Viitanen et al. (1990, Biochemistry 29, 5665-5671). The aggregation is simulated by a set of 12 differential equations representing the aggregation of the Rubisco folding intermediate, Rubisco-I, with itself and with aggregates of Rubisco-I, leading up to dodecamers. Four rate constants, applying to forward or reverse steps in the aggregation process, were included. Optimal values for these constants were determined using the ellipsoid algorithm as implemented by one of us (Ecker, J.G. & Kupferschmid, M., 1988, Introduction to Operations Research, Wiley, New York, pp. 315-322). Intensive exploration of simpler aggregation models did not identify an alternative that could simulate the data as well as this one. The activity of the chaperonin in this system was simulated by using this aggregation model, combined with a model similar to that proposed by Goloubinoff et al. (1989). The model assumes that the chaperonin can bind the folding intermediate rapidly, and that the chaperonin complex releases the Rubisco molecule slowly...
Recent advances in DNA sequencers are accelerating genome sequencing, especially in microbes, and complete and draft genomes from various species have been sequenced in rapid succession. Here, we present a comprehensive gene prediction tool, the MetaGeneAnnotator (MGA), which precisely predicts all kinds of prokaryotic genes from a single or a set of anonymous genomic sequences having a variety of lengths. The MGA integrates statistical models of prophage genes, in addition to those of bacterial and archaeal genes, and also uses a self-training model from input sequences for predictions. As a result, the MGA sensitively detects not only typical genes but also atypical genes, such as horizontally transferred and prophage genes in a prokaryotic genome. In this paper, we also propose a novel approach for analyzing the ribosomal binding site (RBS), which enables us to detect species-specific patterns of the RBSs. The MGA has the ingenious RBS model based on this approach, and precisely predicts translation starts of genes. The MGA also succeeds in improving prediction accuracies for short sequences by using the adapted RBS models (96% sensitivity and 93% specificity for 700 bp fragments). These features of the MGA expedite wide ranges of microbial genome studies...
The evolutionary history of biological pathways is of general interest, especially in this post-genomic era, because it may provide clues for understanding how complex systems encoded on genomes have been organized. To explain how pathways can evolve de novo, some noteworthy models have been proposed. However, direct reconstruction of pathway evolutionary history both on a genomic scale and at the depth of the tree of life has suffered from artificial effects in estimating the gene content of ancestral species. Recently, we developed an algorithm that effectively reconstructs gene-content evolution without these artificial effects, and we applied it to this problem. The carefully reconstructed history, which was based on the metabolic pathways of 160 prokaryotic species, confirmed that pathways have grown beyond the random acquisition of individual genes. Pathway acquisition took place quickly, probably eliminating the difficulty in holding genes during the course of the pathway evolution. This rapid evolution was due to massive horizontal gene transfers as gene groups, some of which were possibly operon transfers, which would convey existing pathways but not be able to generate novel pathways. To this end, we analyzed how these pathways originally appeared and found that the original acquisition of pathways occurred more contemporaneously than expected across different phylogenetic clades. As a possible model to explain this observation...
Prediction of transcription factor binding sites is an important challenge in genome analysis. The advent of next generation genome sequencing technologies makes the development of effective computational approaches particularly imperative. We have developed a novel training-based methodology intended for prokaryotic transcription factor binding site prediction. Our methodology extends existing models by taking into account base interdependencies between neighbouring positions using conditional probabilities and includes genomic background weighting. This has been tested against other existing and novel methodologies including position-specific weight matrices, first-order Hidden Markov Models and joint probability models. We have also tested the use of gapped and ungapped alignments and the inclusion or exclusion of background weighting. We show that our best method enhances binding site prediction for all of the 22 Escherichia coli transcription factors with at least 20 known binding sites, with many showing substantial improvements. We highlight the advantage of using block alignments of binding sites over gapped alignments to capture neighbouring position interdependencies. We also show that combining these methods with ChIP-on-chip data has the potential to further improve binding site prediction. Finally we have developed the ungapped likelihood under positional background platform: a user friendly website that gives access to the prediction method devised in this work.
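As a point of reference for the baseline named in this abstract, the following Python sketch builds a position-specific weight matrix with background weighting from ungapped binding sites and scans a sequence for the best-scoring window. It does not implement the authors' conditional-probability method; the function names, the pseudocount, the uniform background and the example sites are all assumptions chosen for illustration.

```python
import math


def build_pwm(sites, background, pseudocount=1.0):
    """Build a log-odds matrix from equal-length, ungapped binding sites."""
    length = len(sites[0])
    total = len(sites) + 4 * pseudocount
    pwm = []
    for i in range(length):
        column = [site[i] for site in sites]
        pwm.append({
            base: math.log2(((column.count(base) + pseudocount) / total)
                            / background[base])
            for base in "ACGT"
        })
    return pwm


def score(pwm, window):
    """Sum the per-position log-odds for a window of the same length."""
    return sum(col[base] for col, base in zip(pwm, window))


def best_hit(pwm, sequence):
    """Return (score, start) of the highest-scoring window in sequence."""
    width = len(pwm)
    return max(
        (score(pwm, sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    )


# Hypothetical example sites and a uniform background.
sites = ["TTGACA", "TTGACA", "TTGACT", "TTTACA"]
background = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
pwm = build_pwm(sites, background)
hit_score, hit_start = best_hit(pwm, "GGGTTGACAGGG")
```

A matrix of this kind treats positions as independent. The approach in the abstract extends it by conditioning each position on its neighbour, which is why ungapped block alignments matter there.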
The UvrB protein is a central unit for damage recognition in the prokaryotic nucleotide excision repair system, which excises bulky DNA lesions. We have utilized molecular modeling and MD simulations based on crystal structures, mutagenesis, and fluorescence data, to model the 10R-(+)-cis-anti-B[a]P-N2-dG lesion, derived from the tumorigenic (+) anti-B[a]PDE metabolite of benzo[a]pyrene, at different locations on the inner and outer strand in UvrB. Our results suggest that this lesion is accommodated on the inner strand where it might translocate through the tunnel created by the β-hairpin and the UvrB domain 1B, and ultimately could be housed in the pocket behind the β-hairpin prior to excision by UvrC. Lesions that vary in size and shape may be stopped at the gate to the tunnel, within the tunnel or in the pocket when UvrC initiates excision. Common features of β-hairpin intrusion between the two DNA strands and nucleotide flipping manifested in structures of prokaryotic and eukaryotic NER lesion recognition proteins are consistent with common recognition mechanisms, based on lesion-induced local thermodynamic distortion/destabilization and nucleotide flipping.
DNA sequences that arrest transcription by either eukaryotic RNA polymerase II or Escherichia coli RNA polymerase have been identified previously. Elongation factors SII and GreB are RNA polymerase-binding proteins that enable readthrough of arrest sites by these enzymes, respectively. This functional similarity has led to general models of elongation applicable to both eukaryotic and prokaryotic enzymes. Here we have transcribed, with phage and bacterial RNA polymerases, a human DNA sequence previously defined as an arrest site for RNA polymerase II. The phage and bacterial enzymes both respond efficiently to the arrest signal in vitro at limiting levels of nucleoside triphosphates. The E. coli polymerase remains in a template-engaged complex for many hours, can be isolated, and is potentially active. The enzyme displays a relatively slow first-order loss of elongation competence as it dwells at the arrest site. Bacterial RNA polymerase arrested at the human site is reactivated by GreB in the same way that RNA polymerase II arrested at this site is stimulated by SII. Very efficient readthrough can be achieved by phage, bacterial, and eukaryotic RNA polymerases in the absence of elongation factors if 5-Br-UTP is substituted for UTP. These findings provide additional and direct evidence for functional similarity between prokaryotic and eukaryotic transcription elongation and readthrough mechanisms.
Glycosylation is one of the most abundant post-translational modifications (PTMs) required for various structure/function modulations of proteins in a living cell. Although elucidated recently in prokaryotes, this type of PTM is present across all three domains of life. In prokaryotes, two types of protein glycan linkages are more widespread, namely N-linked, where a glycan moiety is attached to the amide group of Asn, and O-linked, where a glycan moiety is attached to the hydroxyl group of Ser/Thr/Tyr. Owing to their biologically ubiquitous nature, significance, and technology applications, the study of prokaryotic glycoproteins is a fast-emerging area of research. Here we describe new Support Vector Machine (SVM) based algorithms (models) developed for predicting glycosylated residues (glycosites) with high accuracy in prokaryotic protein sequences. The models are based on binary profile of patterns, composition profile of patterns, and position-specific scoring matrix profile of patterns as training features. The study employs an extensive dataset of 107 N-linked and 116 O-linked glycosites extracted from 59 experimentally characterized glycoproteins of prokaryotes. This dataset includes validated N-glycosites from phyla Crenarchaeota...
The remarkable advance in sequencing technology and the rising interest in medical and environmental microbiology, biotechnology, and synthetic biology resulted in a deluge of published microbial genomes. Yet, genome annotation, comparison, and modeling remain a major bottleneck to the translation of sequence information into biological knowledge, hence computational analysis tools are continuously being developed for rapid genome annotation and interpretation. Among the earliest, most comprehensive resources for prokaryotic genome analysis, the SEED project, initiated in 2003 as an integration of genomic data and analysis tools, now contains >5,000 complete genomes, a constantly updated set of curated annotations embodied in a large and growing collection of encoded subsystems, a derived set of protein families, and hundreds of genome-scale metabolic models. Until recently, however, maintaining current copies of the SEED code and data at remote locations has been a pressing issue. To allow high-performance remote access to the SEED database, we developed the SEED Servers (http://www.theseed.org/servers): four network-based servers intended to expose the data in the underlying relational database, support basic annotation services...
One of the challenges in oceanography is to understand the influence of environmental factors on the abundances of prokaryotes and viruses. Generally, conventional statistical methods resolve trends well, but more complex relationships are difficult to explore. In such cases, Artificial Neural Networks (ANNs) offer an alternative way for data analysis. Here, we developed ANN-based models of prokaryotic and viral abundances in the Arctic Ocean. The models were used to identify the best predictors for prokaryotic and viral abundances including cytometrically-distinguishable populations of prokaryotes (high and low nucleic acid cells) and viruses (high- and low-fluorescent viruses) among salinity, temperature, depth, day length, and the concentration of Chlorophyll-a. The best performing ANNs to model the abundances of high and low nucleic acid cells used temperature and Chl-a as input parameters, while the abundances of high- and low-fluorescent viruses used depth, Chl-a, and day length as input parameters. Decreasing viral abundance with increasing depth and decreasing system productivity was captured well by the ANNs. Despite identifying the same predictors for the two populations of prokaryotes and viruses, respectively, the structure of the best performing ANNs differed between high and low nucleic acid cells and between high- and low-fluorescent viruses. Also...
Electrochemical signaling in the brain depends on pentameric ligand-gated ion channels (pLGICs). Recently, crystal structures of prokaryotic pLGIC homologues from Erwinia chrysanthemi (ELIC) and Gloeobacter violaceus (GLIC) in presumed closed and open channel states have been solved, which provide insight into the structural mechanisms underlying channel activation. Although structural studies involving both ELIC and GLIC have become numerous, thorough functional characterizations of these channels are still needed to establish a reliable foundation for comparing kinetic properties. Here, we examined the kinetics of ELIC and GLIC current activation, desensitization, and deactivation and compared them to the GABAA receptor, a prototypic eukaryotic pLGIC. Outside-out patch-clamp recordings were performed with HEK-293T cells expressing ELIC, GLIC, or α1β2γ2L GABAA receptors, and ultra-fast ligand application was used. In response to saturating agonist concentrations, we found both ELIC and GLIC current activation were two to three orders of magnitude slower than GABAA receptor current activation. The prokaryotic channels also had slower current desensitization on a timescale of seconds. ELIC and GLIC current deactivation following 25 s pulses of agonist (cysteamine and pH 4.0 buffer...
This experimental microcosm study reports the influence of organic enrichments by mussel biodeposits on the metabolic activity and functional diversity of benthic prokaryotic communities. The different biodeposit enrichment regimes created, which mimicked the quantity of faeces and pseudo-faeces potentially deposited below mussel farms, show a clear stimulatory effect of this organic enrichment on prokaryotic metabolic activity. This effect was detected once a certain level of biodeposition was attained with a tipping point estimated between 3.25 and 10 g day⁻¹ m⁻². Prokaryotic communities recovered their initial metabolic activity by 11 days after the cessation of biodeposit additions. However, their functional diversity remained greater than prior to the disturbance suggesting that mussel biodeposit enrichment may disturb the functioning and perhaps the role of prokaryotic communities in benthic ecosystems. This manipulative approach provided new information on the influence of mussel biodeposition on benthic prokaryotic communities and dose-response relationships and may support the development of carrying capacity models for bivalve culture.
Background: Bacteria have evolved the ability to efficiently and resourcefully adapt to changing environments. A key means by which they optimize their use of available nutrients is through adjustments in gene expression with consequent changes in enzyme activity. We report a new method for drawing environmental inferences from gene expression data. Our method prioritizes a list of candidate carbon sources for their compatibility with a gene expression profile using the framework of flux balance analysis to model the organism’s metabolic network. Principal Findings: For each of six gene expression profiles for Escherichia coli grown under differing nutrient conditions, we applied our method to prioritize a set of eighteen different candidate carbon sources. Our method ranked the correct carbon source as one of the top three candidates for five of the six expression sets when used with a genome-scale model. The correct candidate ranked fifth in the remaining case. Additional analyses show that these rankings are robust with respect to biological and measurement variation, and depend on specific gene expression, rather than general expression level. The gene expression profiles are highly adaptive: simulated production of biomass averaged 94.84% of maximum when the in silico carbon source matched the in vitro source of the expression profile...
Pseudomonas aeruginosa strain PA14 is an opportunistic human pathogen capable of infecting a wide range of organisms including the nematode Caenorhabditis elegans. We used a non-redundant transposon mutant library consisting of 5,850 clones corresponding to 75% of the total and approximately 80% of the non-essential PA14 ORFs to carry out a genome-wide screen for attenuation of PA14 virulence in C. elegans. We defined a functionally diverse 180 mutant set (representing 170 unique genes) necessary for normal levels of virulence that included both known and novel virulence factors. Seven previously uncharacterized virulence genes (ABC transporters PchH and PchI, aminopeptidase PepP, ATPase/molecular chaperone ClpA, cold shock domain protein PA0456, putative enoyl-CoA hydratase/isomerase PA0745, and putative transcriptional regulator PA14_27700) were characterized with respect to pigment production and motility and all but one of these mutants exhibited pleiotropic defects in addition to their avirulent phenotype. We examined the collection of genes required for normal levels of PA14 virulence with respect to occurrence in P. aeruginosa strain-specific genomic regions, location on putative and known genomic islands, and phylogenetic distribution across prokaryotes. Genes predominantly contributing to virulence in C. elegans showed neither a bias for strain-specific regions of the P. aeruginosa genome nor for putatively horizontally transferred genomic islands. Instead...
Computer methods of accurate gene finding in DNA sequences require models of protein coding and non-coding regions derived either from experimentally validated training sets or from large amounts of anonymous DNA sequence. Here we propose a new, heuristic method producing fairly accurate inhomogeneous Markov models of protein coding regions. The new method needs such a small amount of DNA sequence data that the model can be built 'on the fly' by a web server for any DNA sequence >400 nt. Tests on 10 complete bacterial genomes performed with the GeneMark.hmm program demonstrated the ability of the new models to detect 93.1% of annotated genes on average, while models built by traditional training predict an average of 93.9% of genes. Models built by the heuristic approach could be used to find genes in small fragments of anonymous prokaryotic genomes and in genomes of organelles, viruses, phages and plasmids, as well as in highly inhomogeneous genomes where adjustment of models to local DNA composition is needed. The heuristic method also gives an insight into the mechanism of codon usage pattern evolution.
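To make the idea of an inhomogeneous Markov model of coding regions concrete, the Python sketch below trains a three-periodic, zero-order model, with one nucleotide distribution per codon position, and scores a sequence against a homogeneous background. It is trained from example coding sequences in the traditional way, so it does not reproduce the heuristic derivation proposed in the abstract, and the zero order is chosen only for brevity. The function names, the pseudocount and the toy sequences are assumptions.

```python
import math
from collections import Counter


def train_periodic_model(coding_seqs, pseudocount=1.0):
    """Estimate one nucleotide distribution per codon position (0, 1, 2)."""
    counts = [Counter(), Counter(), Counter()]
    for seq in coding_seqs:
        for i, base in enumerate(seq):
            if base in "ACGT":
                counts[i % 3][base] += 1
    model = []
    for c in counts:
        total = sum(c.values()) + 4 * pseudocount
        model.append({b: (c[b] + pseudocount) / total for b in "ACGT"})
    return model


def coding_log_odds(seq, model, background):
    """Log-likelihood ratio of seq read in frame 0 versus the background."""
    return sum(
        math.log(model[i % 3][base] / background[base])
        for i, base in enumerate(seq)
        if base in "ACGT"
    )


# Hypothetical toy training set and a uniform non-coding background.
training = ["ATGGCTGCAGCG", "ATGGCCGCTGCA"]
background = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
model = train_periodic_model(training)

in_frame = coding_log_odds("GCTGCAGCT", model, background)
shifted = coding_log_odds("CTGCAGCTG", model, background)
```

With this toy data the in-frame reading scores above the background and the shifted reading scores below it, which is the frame-dependent signal that gene finders exploit.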