Variable selection for large p small n regression models with incomplete data: mapping QTL with epistases.

Academic Article

Overview

abstract

  • BACKGROUND: Identifying quantitative trait loci (QTL) for both additive and epistatic effects raises the statistical issue of selecting variables from a large number of candidates using a small number of observations. Missing trait and/or marker values prevent one from directly applying classical model selection criteria such as Akaike's information criterion (AIC) and the Bayesian information criterion (BIC). RESULTS: We propose a two-step Bayesian variable selection method that deals with both the sparse parameter space and the small sample size. The regression coefficient priors are flexible enough to incorporate the characteristics of "large p small n" data. Specifically, sparseness and possible asymmetry of the significant coefficients are dealt with by developing a Gibbs sampling algorithm that stochastically searches through low-dimensional subspaces for significant variables. The superior performance of the approach is demonstrated via a simulation study. We also applied it to real QTL mapping datasets. CONCLUSION: The two-step procedure coupled with Bayesian classification offers flexibility in modeling "large p small n" data, especially for a sparse and asymmetric parameter space. This approach can be extended to other settings characterized by high dimension and low sample size.
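
    The abstract describes a Gibbs sampler that searches stochastically over small subsets of variables. The sketch below illustrates that general idea only; it is not the authors' two-step procedure. It assumes complete data, main effects only (no epistatic terms), and a standard conjugate setup chosen here for illustration: a symmetric normal prior on the included coefficients, an inverse-gamma prior on the error variance, and independent Bernoulli inclusion indicators. The function names and the hyperparameters tau2, a, b, and prior_incl are all hypothetical.

```python
import numpy as np


def log_marginal(X, y, gamma, tau2=1.0, a=1.0, b=1.0):
    """Log marginal likelihood of y (up to a constant) for the subset gamma.

    Assumed model (illustrative, not taken from the paper):
      y | beta, s2 ~ N(X_g beta, s2 I), beta | s2 ~ N(0, s2 * tau2 * I),
      s2 ~ InverseGamma(a, b); beta and s2 are integrated out analytically.
    """
    n = y.shape[0]
    Xg = X[:, gamma]
    k = Xg.shape[1]
    if k == 0:
        return -(a + n / 2.0) * np.log(b + 0.5 * (y @ y))
    G = Xg.T @ Xg
    Xty = Xg.T @ y
    # Woodbury identity: y'(I + tau2 Xg Xg')^{-1} y
    #   = y'y - y'Xg (Xg'Xg + I/tau2)^{-1} Xg'y
    S = y @ y - Xty @ np.linalg.solve(G + np.eye(k) / tau2, Xty)
    # Determinant identity: |I_n + tau2 Xg Xg'| = |I_k + tau2 Xg'Xg|
    _, logdet = np.linalg.slogdet(np.eye(k) + tau2 * G)
    return -0.5 * logdet - (a + n / 2.0) * np.log(b + 0.5 * S)


def stochastic_search(X, y, n_iter=300, burn_in=100, prior_incl=0.1, seed=0):
    """Gibbs sampler over inclusion indicators; returns inclusion frequencies."""
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    gamma = np.zeros(p, dtype=bool)  # start from the empty model
    logit_prior = np.log(prior_incl) - np.log1p(-prior_incl)
    counts = np.zeros(p)
    for it in range(n_iter):
        for j in rng.permutation(p):
            # Compare the model with variable j included against excluded,
            # holding all other indicators fixed.
            gamma[j] = True
            lm1 = log_marginal(X, y, gamma)
            gamma[j] = False
            lm0 = log_marginal(X, y, gamma)
            log_odds = np.clip(logit_prior + lm1 - lm0, -700.0, 700.0)
            gamma[j] = rng.random() < 1.0 / (1.0 + np.exp(-log_odds))
        if it >= burn_in:
            counts += gamma
    return counts / (n_iter - burn_in)
```

    Calling stochastic_search(X, y) on an n-by-p design matrix X and a response vector y returns one posterior inclusion frequency per column. One common choice is to report the variables whose frequency exceeds 0.5. Because the sparse prior keeps each visited model small, every update only solves a low-dimensional linear system, even when p exceeds n.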

publication date

  • May 29, 2008

Research

keywords

  • Chromosome Mapping
  • Epistasis, Genetic
  • Models, Genetic
  • Quantitative Trait Loci
  • Regression Analysis

Identity

PubMed Central ID

  • PMC2435550

Scopus Document Identifier

  • 46049085599

Digital Object Identifier (DOI)

  • 10.1186/1471-2105-9-251

PubMed ID

  • 18510743

Additional Document Info

volume

  • 9