

Paper   IPM / Biological / 14424
School of Biological Sciences
  Title:   Missing Value Imputation for RNA-Sequencing Data Using Statistical Models: A Comparative Study
  Author(s): 
1.  Taban Baghfalaki
2.  Mojtaba Ganjali
3.  Damon Berridge
  Status:   In progress
  Journal: J. Appl. Statist.
  Year:  2016
  Supported by:  IPM
  Abstract:
RNA-seq technology has been widely used as an alternative to traditional microarrays in transcript analysis. Gene expression profiling by sequencing, which generates RNA-seq data sets, may sometimes yield missing read counts. These missing values can adversely affect downstream analyses, since most methods for analysing RNA-seq data sets require a complete data matrix. In the past few years, researchers have put a great deal of effort into evaluating different imputation algorithms for microarray gene expression data sets. However, there are only limited works for RNA-seq data sets, and a comparative study of the performance of missing value imputation for RNA-seq data is essential. In this paper, we propose the use of some parametric models, such as regression imputation, the Bayesian generalized linear model, the Poisson mixture model, the EM approach, Bayesian Poisson regression, Bayesian quasi-Poisson regression and the bootstrap versions of the latter two, for single imputation of missing values in RNA-seq count data sets. The approaches are also applied to identifying differentially expressed genes in the presence of missing values. Multiple imputation, proposed by Rubin (1978), is also used for the missing RNA-seq counts. This approach allows appropriate assessment of imputation uncertainty for missing values. The performance of the single and multiple imputation methods is investigated using simulation studies. Also, some real data sets are analysed using the proposed approaches.
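
As a rough illustration of the multiple imputation idea mentioned in the abstract, the sketch below fills missing counts for a single gene several times and pools the results with Rubin's combining rules. It is not the authors' method: the plain Poisson model, the flat prior on the rate, the toy data, and all function names (poisson_draw, impute_once, rubin_pool, pooled_mean) are assumptions made only for this example.

```python
import math
import random
from statistics import mean, variance


def poisson_draw(lam, rng):
    """Draw one Poisson(lam) value with Knuth's method (suitable for small rates only)."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def impute_once(counts, rng):
    """Return a completed copy of counts, where None marks a missing read count.

    Assumption: counts for one gene are i.i.d. Poisson(lam) with a flat prior
    on lam, so the posterior is Gamma(shape = sum + 1, scale = 1 / n).
    Drawing lam first propagates parameter uncertainty into the imputations.
    """
    observed = [c for c in counts if c is not None]
    lam = rng.gammavariate(sum(observed) + 1, 1.0 / len(observed))
    return [c if c is not None else poisson_draw(lam, rng) for c in counts]


def rubin_pool(estimates, variances):
    """Combine M per-imputation estimates and variances with Rubin's rules."""
    m = len(estimates)
    q_bar = mean(estimates)                          # pooled point estimate
    within = mean(variances)                         # average within-imputation variance
    between = variance(estimates) if m > 1 else 0.0  # between-imputation variance
    total = within + (1 + 1 / m) * between
    return q_bar, total


def pooled_mean(counts, m=20, seed=0):
    """Estimate the mean count of one gene from m imputed data sets."""
    rng = random.Random(seed)
    estimates, variances = [], []
    for _ in range(m):
        completed = impute_once(counts, rng)
        est = mean(completed)
        estimates.append(est)
        variances.append(est / len(completed))  # Poisson: Var(sample mean) = lam / n
    return rubin_pool(estimates, variances)


if __name__ == "__main__":
    toy_counts = [3, None, 5, 4, None, 2]  # hypothetical read counts for one gene
    estimate, total_variance = pooled_mean(toy_counts, m=20, seed=0)
    print(estimate, total_variance)
```

The between-imputation term in rubin_pool is what separates multiple from single imputation: a single filled-in data set would report only the within-imputation variance and so understate the uncertainty caused by the missing counts.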
