Statistical Methods for the Analysis of Mass Spectrometry-based Proteomics Data

OCLC : 813855811

Book Synopsis: Statistical Methods for the Analysis of Mass Spectrometry-based Proteomics Data by Xuan Wang

Statistical Methods for the Analysis of Mass Spectrometry-based Proteomics Data was written by Xuan Wang and released in 2012. Book excerpt: Proteomics plays an important role in the systems-level understanding of biological function. Mass spectrometry (MS) proteomics has become the tool of choice for identifying and quantifying the proteome of an organism. In the most widely used bottom-up approach to MS-based high-throughput quantitative proteomics, complex mixtures of proteins are first subjected to enzymatic cleavage; the resulting peptide products are separated based on chemical or physical properties and then analyzed using a mass spectrometer. The three fundamental challenges in the analysis of bottom-up MS-based proteomics are as follows: (i) identifying the proteins that are present in a sample, (ii) aligning different samples on elution (retention) time, mass, peak area (intensity), and so on, and (iii) quantifying the abundance levels of the identified proteins after alignment. Each of these challenges requires knowledge of the biological and technological context that gives rise to the observed data, as well as the application of sound statistical principles for estimation and inference. In this dissertation, we present a set of statistical methods for protein identification, alignment, and quantification in bottom-up proteomics.

We describe a fully Bayesian hierarchical modeling approach to peptide and protein identification on the basis of MS/MS fragmentation patterns in a unified framework. Our major contribution is to allow for dependence among the list of top candidate peptide-spectrum matches (PSMs), which we accomplish with a Bayesian multiple-component mixture model incorporating decoy search results and joint estimation of the accuracy of a list of peptide identifications for each MS/MS fragmentation spectrum. We also propose an objective criterion for evaluating the False Discovery Rate (FDR) associated with a list of identifications at the peptide level, which results in more accurate FDR estimates than existing methods such as PeptideProphet.

Several alignment algorithms have been developed using different warping functions. However, all the existing alignment approaches lack a useful metric for scoring an alignment between two data sets and hence offer no quantitative measure of how good an alignment is. Our alignment approach uses identified "anchor points" to align all the individual scans in the target sample and provides a framework to quantify the alignment, that is, to assign a p-value to a set of aligned LC-MS runs to assess the correctness of the alignment. After alignment using our algorithm, the p-values from Wilcoxon signed-rank tests on elution (retention) time, m/z, and peak area become non-significant.

Quantitative mass spectrometry-based proteomics involves statistical inference on protein abundance, based on the intensities of each protein's associated spectral peaks. However, typical mass spectrometry-based proteomics data sets have substantial proportions of missing observations, due at least in part to censoring of low intensities. This complicates intensity-based differential expression analysis. We outline a statistical method for protein differential expression, based on a simple Binomial likelihood. By modeling peak intensities as binary, in terms of "presence / absence", we enable the selection of proteins not typically amenable to quantitative analysis, e.g., "one-state" proteins that are present in one condition but absent in another. In addition, we present an analysis protocol that combines quantitative and presence / absence analysis of a given data set in a principled way, resulting in a single list of selected proteins with a single associated FDR.
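To make the "presence / absence" idea concrete, here is a minimal sketch of a likelihood ratio test comparing two Binomial detection rates. It is not the dissertation's actual model: the function names, the per-condition counts, and the use of an asymptotic chi-square reference distribution are all assumptions made for illustration.

```python
import math


def binom_loglik(k, n, p):
    """Binomial log-likelihood of k detections in n runs, up to a constant.

    The binomial coefficient is omitted because it cancels in the ratio.
    Terms of the form 0 * log(0) are treated as 0.
    """
    ll = 0.0
    if k > 0:
        ll += k * math.log(p)
    if n - k > 0:
        ll += (n - k) * math.log(1.0 - p)
    return ll


def presence_absence_lrt(k_a, n_a, k_b, n_b):
    """Hypothetical helper: test whether a protein's detection rate differs
    between condition A (k_a of n_a runs) and condition B (k_b of n_b runs).

    Returns the likelihood ratio statistic and an approximate p-value from
    a chi-square distribution with 1 degree of freedom (an assumption that
    is only rough for small numbers of runs).
    """
    p_a = k_a / n_a                      # detection rate in condition A
    p_b = k_b / n_b                      # detection rate in condition B
    p_0 = (k_a + k_b) / (n_a + n_b)      # pooled rate under the null
    ll_alt = binom_loglik(k_a, n_a, p_a) + binom_loglik(k_b, n_b, p_b)
    ll_null = binom_loglik(k_a, n_a, p_0) + binom_loglik(k_b, n_b, p_0)
    stat = max(0.0, 2.0 * (ll_alt - ll_null))
    # Survival function of chi-square with 1 df: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# A "one-state" protein: detected in all 6 runs of A and none of 6 runs of B.
stat_one_state, p_one_state = presence_absence_lrt(6, 6, 0, 6)
# A protein detected equally often in both conditions.
stat_equal, p_equal = presence_absence_lrt(3, 6, 3, 6)
```

In this sketch the one-state protein yields a small p-value even though no intensity ratio can be computed for it, which is the kind of case the excerpt says intensity-based analysis cannot handle.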


Statistical Methods for the Analysis of Mass Spectrometry-based Proteomics Data Related Books

Statistical Methods for the Analysis of Mass Spectrometry-based Proteomics Data
Language: en
Pages:
Authors: Xuan Wang
Categories:
Type: BOOK - Published: 2012 - Publisher:

Computational and Statistical Methods for Protein Quantification by Mass Spectrometry
Language: en
Pages: 290
Authors: Ingvar Eidhammer
Categories: Mathematics
Type: BOOK - Published: 2012-12-10 - Publisher: John Wiley & Sons

The definitive introduction to data analysis in quantitative proteomics. This book provides all the necessary knowledge about mass spectrometry based proteomics
Statistical Analysis of Proteomics, Metabolomics, and Lipidomics Data Using Mass Spectrometry
Language: en
Pages: 294
Authors: Susmita Datta
Categories: Medical
Type: BOOK - Published: 2016-12-15 - Publisher: Springer

This book presents an overview of computational and statistical design and analysis of mass spectrometry-based proteomics, metabolomics, and lipidomics data. Th
Proteomics Data Analysis
Language: en
Pages: 326
Authors: Daniela Cecconi
Categories: Proteomics
Type: BOOK - Published: 2021 - Publisher:

This thorough book collects methods and strategies to analyze proteomics data. It is intended to describe how data obtained by gel-based or gel-free proteomics
Mass Spectrometry Data Analysis in Proteomics
Language: en
Pages: 322
Authors: Rune Matthiesen
Categories: Science
Type: BOOK - Published: 2008-02-02 - Publisher: Springer Science & Business Media

This is an in-depth guide to the theory and practice of analyzing raw mass spectrometry (MS) data in proteomics. The volume outlines available bioinformatics pr