Integrating sequence and array data to create an improved 1000 Genomes Project haplotype reference panel. Academic Article

Overview

MeSH

  • Alleles
  • Gene Frequency
  • Haplotypes
  • Humans
  • Polymorphism, Single Nucleotide

MeSH Major

  • Algorithms
  • Genome, Human
  • Genome-Wide Association Study
  • Microarray Analysis

abstract

  • A major use of the 1000 Genomes Project (1000 GP) data is genotype imputation in genome-wide association studies (GWAS). Here we develop a method to estimate haplotypes from low-coverage sequencing data that can take advantage of single-nucleotide polymorphism (SNP) microarray genotypes on the same samples. First, the SNP array data are phased to build a backbone (or 'scaffold') of haplotypes across each chromosome. We then phase the sequence data 'onto' this haplotype scaffold. This approach can take advantage of relatedness between sequenced and non-sequenced samples to improve accuracy. We use this method to create a new 1000 GP haplotype reference set for use by the human genetic community. Using a set of validation genotypes at SNPs and bi-allelic indels, we show that these haplotypes have lower genotype discordance and improved imputation performance into downstream GWAS samples, especially at low-frequency variants.

authors

publication date

  • June 13, 2014

has subject area

  • Algorithms
  • Alleles
  • Gene Frequency
  • Genome, Human
  • Genome-Wide Association Study
  • Haplotypes
  • Humans
  • Microarray Analysis
  • Polymorphism, Single Nucleotide

Research

keywords

  • Journal Article

Identity

Language

  • eng

PubMed Central ID

  • PMC4338501

Digital Object Identifier (DOI)

  • 10.1038/ncomms4934

PubMed ID

  • 25653097

Additional Document Info

start page

  • 3934

volume

  • 5