A single-molecule long-read survey of the human transcriptome. Academic Article

Overview

abstract

  • Global RNA studies have become central to understanding biological processes, but methods such as microarrays and short-read sequencing are unable to describe an entire RNA molecule from 5' to 3' end. Here we use single-molecule long-read sequencing technology from Pacific Biosciences to sequence the polyadenylated RNA complement of a pooled set of 20 human organs and tissues without the need for fragmentation or amplification. We show that full-length RNA molecules of up to 1.5 kb can readily be monitored with little sequence loss at the 5' ends. For longer RNA molecules more 5' nucleotides are missing, but complete intron structures are often preserved. In total, we identify ∼14,000 spliced GENCODE genes. High-confidence mappings are consistent with GENCODE annotations, but >10% of the alignments represent intron structures that were not previously annotated. As a group, transcripts mapping to unannotated regions have features of long, noncoding RNAs. Our results show the feasibility of deep sequencing full-length RNA from complex eukaryotic transcriptomes on a single-molecule level.

publication date

  • October 13, 2013

Research

keywords

  • Gene Expression Profiling
  • Sequence Analysis, RNA
  • Transcriptome

Identity

PubMed Central ID

  • PMC4075632

Scopus Document Identifier

  • 84887412533

Digital Object Identifier (DOI)

  • 10.1038/nbt.2705

PubMed ID

  • 24108091

Additional Document Info

volume

  • 31

issue

  • 11