Use of computerized algorithm to identify individuals in need of testing for celiac disease. Academic Article

Overview

MeSH

  • Adult
  • Age Distribution
  • Child
  • Female
  • Humans
  • International Classification of Diseases
  • Male
  • Phenotype
  • Risk
  • Sex Distribution

MeSH Major

  • Algorithms
  • Celiac Disease
  • Electronic Health Records
  • Natural Language Processing

abstract

  • Celiac disease (CD) is a lifelong immune-mediated disease with excess mortality. Early diagnosis is important to minimize disease symptoms, complications, and consumption of healthcare resources, yet most patients remain undiagnosed. We developed two electronic medical record (EMR)-based algorithms to identify patients at high risk of CD and in need of CD screening. (I) Using natural language processing (NLP), we searched EMRs for 16 free-text (and related) terms in 216 CD patients and 280 controls. (II) EMRs were also searched for International Classification of Diseases, Ninth Revision (ICD-9) codes suggesting an increased risk of CD in 202 patients with CD and 524 controls. For each approach, we determined the optimal number of hits required to classify a patient as a likely CD case. To assess the performance of these algorithms, sensitivity and specificity were calculated. Using two hits as the cut-off, the NLP algorithm identified 72.9% of all celiac patients (sensitivity) and ruled out CD in 89.9% of the controls (specificity). In a representative US population of individuals without a prior celiac diagnosis (assuming that 0.6% had undiagnosed CD), this NLP algorithm could identify a group of individuals of whom 4.2% would have CD (positive predictive value). The ICD-9 code search, using three hits as the cut-off, had a sensitivity of 17.1% and a specificity of 88.5% (positive predictive value 0.9%). This study shows that computerized EMR-based algorithms can help identify patients at high risk of CD. NLP-based techniques demonstrate higher sensitivity and positive predictive values than algorithms based on ICD-9 code searches.
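
The positive predictive values quoted in the abstract follow from the reported sensitivity and specificity together with the assumed 0.6% prevalence of undiagnosed CD. The sketch below is a minimal illustration of that arithmetic using Bayes' rule. The function name `positive_predictive_value` is hypothetical and is not taken from the article; the article does not describe how the authors computed these figures in code.

```python
def positive_predictive_value(sensitivity: float,
                              specificity: float,
                              prevalence: float) -> float:
    """Return PPV = P(disease | flagged) via Bayes' rule.

    All three arguments are proportions in [0, 1].
    """
    # Flagged patients who truly have the disease.
    true_positives = sensitivity * prevalence
    # Flagged patients who do not have the disease.
    false_positives = (1.0 - specificity) * (1.0 - prevalence)
    return true_positives / (true_positives + false_positives)


# Figures reported in the abstract; prevalence is the stated 0.6% assumption.
PREVALENCE = 0.006

nlp_ppv = positive_predictive_value(0.729, 0.899, PREVALENCE)
icd9_ppv = positive_predictive_value(0.171, 0.885, PREVALENCE)

print(f"NLP algorithm PPV:  {nlp_ppv * 100:.1f}%")
print(f"ICD-9 search PPV:   {icd9_ppv * 100:.1f}%")
```

Under these inputs the results round to the 4.2% and 0.9% stated in the abstract. Because the assumed prevalence is so low, false positives dominate the flagged group even at roughly 90% specificity, which is why both values are small in absolute terms.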

publication date

  • December 2013

has subject area

  • Adult
  • Age Distribution
  • Algorithms
  • Celiac Disease
  • Child
  • Electronic Health Records
  • Female
  • Humans
  • International Classification of Diseases
  • Male
  • Natural Language Processing
  • Phenotype
  • Risk
  • Sex Distribution

Research

keywords

  • Journal Article

Identity

Language

  • eng

PubMed Central ID

  • PMC3861918

Digital Object Identifier (DOI)

  • 10.1136/amiajnl-2013-001924

PubMed ID

  • 23956016

Additional Document Info

start page

  • e306

end page

  • e310

volume

  • 20

number

  • e2