A Gene Ontology Based Computational Approach for the Prediction of Protein Functions

Author: Saket Kharsikar
Publisher:
Total Pages: 92
Release: 2007
Genre: Biomedical engineering
ISBN:

Numerous genome projects have produced a large and ever-increasing amount of genomic sequence data. However, the biological functions of many proteins encoded by these sequences remain unknown. Protein function annotation and prediction have therefore become an essential and challenging task of post-genomic research. In this research, we present an automated protein function prediction system based on a set of proteins of known biological function. The functions of the proteins are characterized with Gene Ontology (GO) annotations. The prediction system uses a novel measure to calculate the pairwise overall similarity between protein sequences. Protein function prediction is then performed based on the GO annotations of similar sequences using a weighted k-nearest neighbor method. We report the prediction accuracies obtained for the model organism yeast (Saccharomyces cerevisiae). The results indicate that the weighted k-nearest neighbor method significantly outperforms the regular k-nearest neighbor method for protein biological function prediction.
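The weighted k-nearest neighbor step can be sketched in Python as follows. This is a minimal illustration, not the author's system: it assumes the pairwise similarity scores have already been computed (the novel similarity measure itself is not reproduced here), and the weighting scheme, in which each neighbor votes for its GO terms with its similarity divided by the total similarity of the k neighbors, is an assumption. The function name `weighted_knn_go` and the example proteins are hypothetical.

```python
from collections import defaultdict


def weighted_knn_go(neighbors, annotations, k=3):
    """Score GO terms for a query protein from its k most similar proteins.

    neighbors:   list of (protein_id, similarity) pairs; similarity >= 0
    annotations: dict mapping protein_id -> set of GO term IDs
    Returns a dict GO term -> score in [0, 1]: the similarity-weighted share
    of the k nearest neighbors that carry the term (assumed weighting scheme).
    """
    # Keep the k neighbors with the highest similarity to the query.
    top = sorted(neighbors, key=lambda pair: pair[1], reverse=True)[:k]
    total = sum(sim for _, sim in top)
    if total == 0:
        return {}
    scores = defaultdict(float)
    for protein, sim in top:
        # Each neighbor votes for all of its GO terms with its normalized weight.
        for term in annotations.get(protein, set()):
            scores[term] += sim / total
    return dict(scores)


# Hypothetical example: similarities of four annotated proteins to a query.
neighbors = [("P1", 0.9), ("P2", 0.6), ("P3", 0.3), ("P4", 0.1)]
annotations = {
    "P1": {"GO:0006412"},
    "P2": {"GO:0006412", "GO:0005840"},
    "P3": {"GO:0005840"},
    "P4": {"GO:0003677"},
}
scores = weighted_knn_go(neighbors, annotations, k=3)
print(scores)  # GO:0006412 ~0.83, GO:0005840 ~0.5; P4 falls outside k=3
```

Passing equal similarities for all neighbors reduces the sketch to a regular (unweighted) k-nearest neighbor vote, which is the baseline the abstract compares against.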

New Approaches of Protein Function Prediction from Protein Interaction Networks

Author: Jingyu Hou
Publisher: Academic Press
Total Pages: 126
Release: 2017-01-13
Genre: Mathematics
ISBN: 0128099445

New Approaches of Protein Function Prediction from Protein Interaction Networks covers the critical aspects of protein-protein interaction (PPI) network-based protein function prediction, including semantically assessing the reliability of PPI data, measuring the functional similarity between proteins, dynamically selecting prediction domains, predicting functions, and establishing corresponding prediction frameworks. Functional annotation of proteins is vital to biological and clinical research and other applications because of the important roles proteins play in various biological processes. Although the functions of some proteins have been annotated via biological experiments, many proteins have yet to be annotated because of the limitations of existing methods and the high cost of experiments. To overcome these experimental limitations, this book helps users understand the computational approaches that have been rapidly developed for protein function prediction. It provides innovative approaches and new developments targeting key issues in protein function prediction, and presents heuristic ideas for further research in this challenging area.

Computational Approaches for Protein Functions and Gene Association Networks

Author: Hari Krishna Yalamanchili
Publisher: Open Dissertation Press
Total Pages:
Release: 2017-01-27
Genre:
ISBN: 9781361349748

This dissertation, "Computational Approaches for Protein Functions and Gene Association Networks" by Hari Krishna Yalamanchili, was obtained from The University of Hong Kong (Pokfulam, Hong Kong) and is being sold pursuant to Creative Commons: Attribution 3.0 Hong Kong License. The content of this dissertation has not been altered in any way. We have altered the formatting in order to facilitate the ease of printing and reading of the dissertation. All rights not granted by the above license are retained by the author. Abstract: Molecular biology revolves primarily around proteins and genes (DNA and RNA), which collaborate to drive various biomolecular systems. Thus, to comprehend any biological phenomenon, from basic cell division to complex cancers, it is fundamental to decode the functional dynamics of proteins and genes. Computational approaches are now widely used to supplement traditional experimental approaches; however, each automated approach has its own advantages and limitations. In this thesis, major shortcomings of existing computational approaches are identified, and alternative fast yet precise methods are proposed. First, a strong need for reliable automated protein function prediction is identified: almost half of protein functional interpretations remain enigmatic, and the lack of a universal functional vocabulary further compounds the problem. NRProF, a novel neural response based method, is proposed for protein functional annotation. The neural response algorithm mimics how the human brain classifies images; the same approach is applied here to classify proteins. Using the hierarchical structure of the Gene Ontology (GO) as background, NRProF classifies a protein of interest into a specific GO category and thus assigns the corresponding function. Having established reliable protein functional annotations, the thesis next studies how proteins and genes collaborate.
Interactions between transcription factors (TFs) and transcription factor binding sites (TFBSs) are fundamental to gene regulation and remain highly specific across evolution. To explain this binding specificity, a co-evolutionary (Co-Evo) relationship is hypothesized, and Pearson correlation and mutual information (MI) metrics are used to validate the hypothesis. Residue-level MI is used to infer the specific binding residues of TFs and the corresponding TFBSs, supporting a thorough understanding of gene regulatory mechanisms and aiding targeted gene therapies. After TF and TFBS associations are characterized, the interplay between genes is abstracted as gene regulatory networks. Several methods using expression correlations have been proposed to infer gene networks. However, most of them ignore the embedded dynamic delay induced by complex molecular interactions and other riotous cellular mechanisms involved in gene regulation. This delay is especially evident in high-frequency time-series expression data. DDGni, a novel network inference strategy, is proposed that adopts a gapped Smith-Waterman algorithm: gaps accommodate expression delays, and local alignment reveals short regulatory windows that traditional methods overlook. In addition to gene-level expression data, recent studies have demonstrated the merits of exon-level RNA-Seq data for profiling splice variants and constructing gene networks. However, the large number of exons relative to the small sample size limits their practical application. SpliceNet, a novel method based on Large Dimensional Trace, is proposed to infer isoform-specific co-expression networks from exon-level RNA-Seq data. By inferring network rewiring between normal and diseased samples at isoform resolution, it provides a more comprehensive picture of complex diseases. It can be applied to any exon-level RNA-Seq data and exon array data.
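The co-evolution hypothesis is tested with mutual information. As a minimal sketch, assuming each row pairs one TF sequence with its own binding site (for example, across orthologs) so that a TF residue position and a TFBS nucleotide position form two aligned columns, residue-level MI can be computed from empirical frequencies as below. The function name and the toy columns are hypothetical, and the thesis's exact estimator (pseudocounts, bias corrections) is not described here, so none is applied.

```python
import math
from collections import Counter


def mutual_information(col_x, col_y):
    """Mutual information (bits) between two aligned columns of symbols.

    col_x: residues at one TF position, one per sequence pair
    col_y: nucleotides at one TFBS position, in the same pair order
    Uses plain empirical frequencies (no pseudocounts or bias correction).
    """
    if len(col_x) != len(col_y) or not col_x:
        raise ValueError("columns must be non-empty and of equal length")
    n = len(col_x)
    count_x, count_y = Counter(col_x), Counter(col_y)
    count_xy = Counter(zip(col_x, col_y))
    mi = 0.0
    for (a, b), c in count_xy.items():
        p_ab = c / n
        # Add p(a,b) * log2( p(a,b) / (p(a) * p(b)) ) for each observed pair.
        mi += p_ab * math.log2(p_ab / ((count_x[a] / n) * (count_y[b] / n)))
    return mi


# Toy columns: K always pairs with A and R with G (perfect covariation).
print(mutual_information("KKRR", "AAGG"))  # 1.0
# Independent pairing: every residue/nucleotide combination appears once.
print(mutual_information("KKRR", "AGAG"))  # 0.0
```

High MI between a TF residue column and a TFBS position suggests the two vary together, which is the kind of signal the residue-level analysis looks for.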
In summary, this thesis first identifies major shortcomings of existing computational approaches to functional association of proteins and genes, and develops seve

Information-Theoretic Evaluation for Computational Biomedical Ontologies

Author: Wyatt Travis Clark
Publisher: Springer Science & Business Media
Total Pages: 50
Release: 2014-01-09
Genre: Computers
ISBN: 331904138X

The development of effective methods for predicting ontological annotations is an important goal in computational biology, yet evaluating their performance is difficult because of problems caused by the structure of biomedical ontologies and the incomplete annotation of genes. This work proposes an information-theoretic framework for evaluating the performance of computational protein function prediction. A Bayesian network, structured according to the underlying ontology, is used to model the prior probability of a protein's function. The concepts of misinformation and remaining uncertainty are then defined, which can be seen as analogs of precision and recall. Finally, semantic distance is proposed as a single statistic for ranking classification models. The approach is evaluated by analyzing three protein function predictors that assign Gene Ontology terms. The work addresses several weaknesses of current metrics and provides valuable insights into the performance of protein function prediction tools.
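Under the formulation usually associated with this framework (stated here as an assumption rather than quoted from the book), each GO term v carries an information accretion ia(v) = -log2 P(v | parents of v), taken from the Bayesian network. Remaining uncertainty (ru) sums ia over true terms the predictor missed, misinformation (mi) sums ia over predicted terms that are not true, and the semantic distance combines them as (ru^k + mi^k)^(1/k). The minimal Python sketch below uses a hypothetical four-term ontology with made-up ia values, and first propagates both annotation sets to their ancestors so that they are consistent with the ontology.

```python
def propagate(terms, parents):
    """Add every ancestor of each term (annotations follow the true-path rule)."""
    closed, stack = set(), list(terms)
    while stack:
        term = stack.pop()
        if term not in closed:
            closed.add(term)
            stack.extend(parents.get(term, ()))
    return closed


def ru_mi(true_terms, pred_terms, ia):
    """Remaining uncertainty and misinformation for one protein."""
    ru = sum(ia[v] for v in true_terms - pred_terms)  # missed true terms
    mi = sum(ia[v] for v in pred_terms - true_terms)  # wrongly predicted terms
    return ru, mi


def semantic_distance(ru, mi, k=2):
    """Single statistic combining ru and mi; k=2 gives a Euclidean distance."""
    return (ru ** k + mi ** k) ** (1 / k)


# Hypothetical ontology: root -> T1 -> {T2, T3}, with made-up ia values (bits).
parents = {"T1": ["root"], "T2": ["T1"], "T3": ["T1"]}
ia = {"root": 0.0, "T1": 1.0, "T2": 2.0, "T3": 1.5}

true_terms = propagate({"T2"}, parents)  # {"root", "T1", "T2"}
pred_terms = propagate({"T3"}, parents)  # {"root", "T1", "T3"}
ru, mi = ru_mi(true_terms, pred_terms, ia)
print(ru, mi, semantic_distance(ru, mi))  # 2.0 1.5 2.5
```

Because the shared ancestors (root and T1) cancel out of both sums, a prediction that lands on a sibling term is penalized only for the specific information it gets wrong or misses.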

Analysis of Gene Expression Data for Gene Ontology Based Protein Function Prediction

Author: Robert Daniel Macholan
Publisher:
Total Pages: 103
Release: 2011
Genre: Computational biology
ISBN:

A tremendous increase in genomic data has encouraged biologists to turn to bioinformatics to assist in its interpretation and processing. One of the present challenges that must be overcome to understand these data more completely is the development of a reliable method for accurately predicting the function of a protein from its genomic information. This study focuses on developing an effective algorithm for protein function prediction. The algorithm bases its predictions on proteins that have similar expression patterns. The similarity of the expression data is determined using a novel measure, the slope matrix, which introduces a normalized method for comparing expression levels throughout a proteome. The algorithm is tested on real microarray gene expression data, and the functions of the proteins are characterized using Gene Ontology annotations. The results of the case study indicate that the developed algorithm performs comparably to prediction algorithms based on the annotations of homologous proteins.

Protein Function Prediction for Omics Era

Author: Daisuke Kihara
Publisher: Springer Science & Business Media
Total Pages: 316
Release: 2011-04-19
Genre: Medical
ISBN: 9400708815

Gene function annotation has been a central question in molecular biology. The importance of computational function prediction is increasing because more and more large-scale biological data, including genome sequences, protein structures, protein-protein interaction data, microarray expression data, and mass spectrometry data, are awaiting biological interpretation. Traditionally, when a genome is sequenced, gene function annotation is done with homology search methods such as BLAST or FASTA. However, since these methods were developed before the genomics era, their conventional use is not necessarily best suited to analyzing large-scale data. We are therefore seeing the emergence of computational gene function prediction methods designed to analyze large-scale data, as well as methods that use such omics data as an additional source for function prediction. In this book, we give an overview of this exciting emerging field. The authors have been selected from 1) those who develop novel purely computational methods, 2) those who develop function prediction methods which use omics data, and 3) those who maintain and update databases of function annotation of particular model organisms (E. coli), which are frequently referred

The Gene Ontology Handbook

Author: Christophe Dessimoz
Publisher:
Total Pages: 298
Release: 2020-10-08
Genre: Science
ISBN: 9781013267710

This book provides a practical and self-contained overview of the Gene Ontology (GO), the leading project to organize biological knowledge on genes and their products across genomic resources. Written for biologists and bioinformaticians, it covers the state of the art in how GO annotations are made, how they are evaluated, and what sort of analyses can and cannot be done with the GO. In the spirit of the Methods in Molecular Biology book series, there is an emphasis throughout the chapters on providing practical guidance and troubleshooting advice. Authoritative and accessible, The Gene Ontology Handbook serves non-experts as well as seasoned GO users as a thorough guide to this powerful knowledge system. This work was published by Saint Philip Street Press pursuant to a Creative Commons license permitting commercial use. All rights not granted by the work's license are retained by the author or authors.

Gene Prediction: Applying Ontology and Machine Learning (Volume II)

Author: Casper Harvey
Publisher: Larsen and Keller Education
Total Pages:
Release: 2023-09-26
Genre: Science
ISBN:

Gene prediction refers to the process of identifying the regions of genomic DNA that encode genes using computational methods. It is an important part of bioinformatics. Gene prediction is the first step in annotating large, contiguous sequences. It aids in identifying the essential elements of the genome, including functional genes, introns, exons, splice sites, and regulatory sites. It is also used to describe individual genes based on their functions. Protein function prediction is an important part of genome annotation. Lately, high-throughput sequencing technologies have driven the development of prediction methods. The Gene Ontology (GO) is one of the resources available for identifying the functional properties of proteins, and research in this domain now focuses on predicting GO terms efficiently. Research is ongoing into machine learning algorithms for functional prediction, as these algorithms use rule-based approaches to integrate large amounts of heterogeneous data and detect patterns. mSplicer, mGene, and CONTRAST are methods that use machine learning techniques for gene prediction. Gene prediction methods are widely used in fields such as structural genomics, functional genomics, and genome studies. This book traces the progress of gene prediction and the application of ontology and machine learning. It is suitable for students seeking detailed information in this area of study as well as for experts.

Big Data Analytics in Genomics

Author: Ka-Chun Wong
Publisher: Springer
Total Pages: 426
Release: 2016-10-24
Genre: Computers
ISBN: 3319412795

This contributed volume explores the emerging intersection between big data analytics and genomics. Recent sequencing technologies have enabled high-throughput sequencing data generation for genomics, resulting in several international projects that have led to the accumulation of genomic data at an unprecedented pace. To reveal novel genomic insights from these data within a reasonable time frame, traditional data analysis methods may not be sufficient or scalable, creating the need for big data analytics developed for genomics. The computational methods addressed in the book are intended to tackle crucial biological questions using big data, and are appropriate for both newcomers and veterans in the field. This volume offers thirteen peer-reviewed contributions written by leading international experts from Argentina, Brazil, China, France, Germany, Hong Kong, India, Japan, Spain, and the USA. In particular, the book surveys three main areas: statistical analytics, computational analytics, and cancer genome analytics. Sample topics include statistical methods for the integrative analysis of genomic data, computational methods for protein function prediction, and perspectives on machine learning techniques in big data mining of cancer. Self-contained and suitable for graduate students, this book is also designed for bioinformaticians, computational biologists, and researchers in communities ranging from genomics, big data, molecular genetics, data mining, biostatistics, biomedical science, cancer research, medical research, and biology to machine learning and computer science. Readers will find this volume an essential read for appreciating the role of big data in genomics, and an invaluable resource for stimulating further research on the topic.

Network-based Information Integration for Protein Function Prediction

Author: Xiaoyu Jiang
Publisher:
Total Pages: 182
Release: 2009
Genre:
ISBN:

Abstract: Protein function prediction is a fundamental problem in computational biology. For protein activities described by terms in databases such as the Gene Ontology (GO), this task is typically pursued as a binary classification problem. As a result of an astonishing increase in available genome-wide protein information, integrating different protein datasets has become both a significant opportunity and a major focus for inferring functionality. This dissertation contains three novel approaches that integrate popular sources of protein information to classify proteins into functional categories. A probabilistic method, Hierarchical Binomial-Neighborhood (HBN), which combines proteins' relational information from the protein-protein interaction (PPI) network with the GO hierarchical structure, is proposed first. Comparisons with analogous models on terms from the biological process ontology and genes from the yeast genome show substantial improvement, and further analysis shows that this improvement is uniformly consistent across GO depths. Because gene interaction knowledge is still incomplete in most organisms, the second approach we develop is an aggressively integrative probabilistic framework, Probabilistic Hierarchical Inferences for Protein Activity (PHIPA), with improved data usage efficiency, for combining a protein relational network, categorical motif and cellular localization information, and the GO hierarchy. We implement it on a network extracted from the integrative protein-protein association database STRING (Search Tool for the Retrieval of Interacting Genes/Proteins). Because they are based on the nearest-neighbor, or "guilt-by-association", counting principle, both HBN and PHIPA use only local neighborhood information and are therefore built on local probabilistic models.
In contrast, we develop a third approach, a fully Bayesian network-based auto-probit framework that encodes functional similarity as influenced by the network topology. We show not only that the auto-probit model predicts as well as the "local" methods, but also that it can produce more potentially interesting protein predictions by taking advantage of GO annotation uncertainty, which is critical for using and improving the GO database yet has been ignored by most existing methods in this context.
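The nearest-neighbor, or "guilt-by-association", counting principle that HBN and PHIPA build on can be illustrated with a plain neighbor-counting score. The Python sketch below is a simplified stand-in, not HBN, PHIPA or the auto-probit model: for each protein it reports the fraction of its annotated interaction partners that carry a given GO term, ignoring unannotated partners and the GO hierarchy. The function name `neighbor_scores`, the proteins and the edges are hypothetical.

```python
from collections import defaultdict


def neighbor_scores(edges, annotations, term):
    """Guilt-by-association score for `term` on each protein of a PPI network.

    edges:       iterable of (protein_a, protein_b) undirected interactions
    annotations: dict protein -> set of GO terms; unannotated proteins are absent
    Returns protein -> fraction of its annotated neighbors annotated with `term`.
    Proteins with no annotated neighbors receive no score.
    """
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    scores = {}
    for protein, nbrs in adjacency.items():
        # Only neighbors with known functions contribute to the count.
        known = [n for n in nbrs if annotations.get(n)]
        if known:
            hits = sum(term in annotations[n] for n in known)
            scores[protein] = hits / len(known)
    return scores


# Hypothetical network: query protein Q interacts with three annotated
# proteins and one unannotated protein U.
edges = [("Q", "P1"), ("Q", "P2"), ("Q", "P3"), ("Q", "U"), ("P1", "P2")]
annotations = {"P1": {"GO:0006412"}, "P2": {"GO:0006412"}, "P3": {"GO:0003677"}}
scores = neighbor_scores(edges, annotations, "GO:0006412")
print(round(scores["Q"], 3))  # 0.667
```

Because the score depends only on a protein's immediate partners, it shares the "local" character the abstract attributes to HBN and PHIPA, which is the limitation the auto-probit framework is meant to address.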