Characterization of structural variants with single molecule and hybrid sequencing approaches
Academic Article

Overview

MeSH

  • Genomics
  • Humans
  • Repetitive Sequences, Nucleic Acid
  • Sequence Deletion

MeSH Major

  • Algorithms
  • Genomic Structural Variation
  • High-Throughput Nucleotide Sequencing
  • Sequence Analysis, DNA

abstract

  • Structural variation is common in human and cancer genomes. High-throughput DNA sequencing has enabled genome-scale surveys of structural variation. However, the short reads produced by these technologies limit the study of complex variants, particularly those involving repetitive regions. Recent 'third-generation' sequencing technologies provide single-molecule templates and longer sequencing reads, but at the cost of higher per-nucleotide error rates. We present MultiBreak-SV, an algorithm to detect structural variants (SVs) from single molecule sequencing data, paired read sequencing data, or a combination of sequencing data from different platforms. We demonstrate that combining low-coverage third-generation data from Pacific Biosciences (PacBio) with high-coverage paired read data is advantageous on simulated chromosomes. We apply MultiBreak-SV to PacBio data from four human fosmids and show that it detects known SVs with high sensitivity and specificity. Finally, we perform a whole-genome analysis on PacBio data from a complete hydatidiform mole cell line and predict 1002 high-probability SVs, over half of which are confirmed by an Illumina-based assembly.

    © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

publication date

  • December 15, 2014

Research

keywords

  • Journal Article

Identity

Language

  • English (eng)

PubMed Central ID

  • PMC4253835

Digital Object Identifier (DOI)

  • 10.1093/bioinformatics/btu714

PubMed ID

  • 25355789

Additional Document Info

start page

  • 3458

end page

  • 3466

volume

  • 30

number

  • 24