Use of computerized algorithm to identify individuals in need of testing for celiac disease.
International Classification of Diseases
Electronic Health Records
Natural Language Processing
Celiac disease (CD) is a lifelong immune-mediated disease associated with excess mortality. Early diagnosis is important to minimize symptoms, complications, and consumption of healthcare resources, yet most patients remain undiagnosed. We developed two electronic medical record (EMR)-based algorithms to identify patients at high risk of CD who are in need of CD screening.
(I) Using natural language processing (NLP), we searched EMRs for 16 free text (and related) terms in 216 CD patients and 280 controls. (II) EMRs were also searched for International Classification of Diseases, Ninth Revision (ICD-9) codes suggesting an increased risk of CD in 202 patients with CD and 524 controls. For each approach, we determined the optimal number of hits required to classify a patient as a CD case. To assess the performance of these algorithms, sensitivity and specificity were calculated.
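The hit-count cut-off described above can be illustrated with a minimal sketch. It is not the study's implementation: the term list is hypothetical (the 16 terms are not given here), plain case-insensitive substring matching stands in for the NLP step, and a "hit" is assumed to mean one distinct matched term per record.

```python
from typing import Iterable, Sequence, Tuple

# Hypothetical search terms for illustration only; the study's 16 terms
# are not listed in this abstract.
TERMS = ("diarrhea", "anemia", "weight loss", "bloating")


def count_hits(note: str, terms: Iterable[str] = TERMS) -> int:
    """Count distinct terms found in a note (case-insensitive substring match).

    Assumption: one hit per distinct term, regardless of repetitions.
    """
    text = note.lower()
    return sum(1 for term in terms if term in text)


def sensitivity_specificity(
    notes: Sequence[str], is_case: Sequence[bool], cutoff: int
) -> Tuple[float, float]:
    """Flag a record when its hit count reaches the cutoff, then score the flags."""
    tp = fn = tn = fp = 0
    for note, case in zip(notes, is_case):
        flagged = count_hits(note) >= cutoff
        if case and flagged:
            tp += 1
        elif case:
            fn += 1
        elif flagged:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Toy records (invented, not study data): two cases followed by two controls.
notes = [
    "Chronic diarrhea and anemia",
    "Weight loss only",
    "No complaints",
    "Bloating, anemia, diarrhea",
]
is_case = [True, True, False, False]
sens, spec = sensitivity_specificity(notes, is_case, cutoff=2)
```

Repeating the calculation over a range of cut-offs and comparing the resulting sensitivity and specificity pairs is one way an optimal number of hits could be chosen.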
Using two hits as the cut-off, the NLP algorithm identified 72.9% of all celiac patients (sensitivity) and ruled out CD in 89.9% of the controls (specificity). In a representative US population of individuals without a prior celiac diagnosis (assuming that 0.6% had undiagnosed CD), this NLP algorithm could identify a group of individuals of whom 4.2% would have CD (positive predictive value). The ICD-9 code search, using three hits as the cut-off, had a sensitivity of 17.1% and a specificity of 88.5% (positive predictive value 0.9%).
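The positive predictive values follow from the reported sensitivity, specificity, and the assumed 0.6% prevalence by Bayes' rule. The short sketch below reproduces that arithmetic; the function name is illustrative and not taken from the study.

```python
def positive_predictive_value(
    sensitivity: float, specificity: float, prevalence: float
) -> float:
    """PPV = true positives / (true positives + false positives)."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


PREVALENCE = 0.006  # assumed share of undiagnosed CD, as stated in the text

# Sensitivity and specificity as reported for each algorithm.
ppv_nlp = positive_predictive_value(0.729, 0.899, PREVALENCE)
ppv_icd9 = positive_predictive_value(0.171, 0.885, PREVALENCE)

print(f"NLP PPV:   {ppv_nlp:.1%}")   # prints "NLP PPV:   4.2%"
print(f"ICD-9 PPV: {ppv_icd9:.1%}")  # prints "ICD-9 PPV: 0.9%"
```

At such a low prevalence, false positives among the many unaffected individuals dominate the flagged group, which is why a specificity near 90% still yields a positive predictive value of only a few percent.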
This study shows that computerized EMR-based algorithms can help identify patients at high risk of CD. In this study, the NLP-based approach showed higher sensitivity and a higher positive predictive value than the algorithm based on ICD-9 code searches.