Automatic generation of investigator bibliographies for institutional research networking systems. Academic Article

Overview

MeSH

  • Artificial Intelligence
  • Bibliography as Topic
  • Biomedical Research
  • Social Networking
  • Vocabulary, Controlled

MeSH Major

  • Abstracting and Indexing as Topic
  • Algorithms
  • Authorship
  • Data Mining
  • Natural Language Processing
  • Pattern Recognition, Automated
  • PubMed

abstract

  • Publications are a key data source for investigator profiles and research networking systems. We developed ReCiter, an algorithm that automatically extracts bibliographies from PubMed using institutional information about the target investigators. ReCiter executes a broad query against PubMed, groups the results into clusters that appear to constitute distinct author identities, and selects the cluster that best matches the target investigator. Using information about investigators from one of our institutions, we compared ReCiter results to queries based on author name and institution and to citations extracted manually from the Scopus database. Five judges created a gold standard using citations of a random sample of 200 investigators. About half of the 10,471 potential investigators had no matching citations in PubMed, and about 45% had fewer than 70 citations. Interrater agreement (Fleiss' kappa) for the gold standard was 0.81. Scopus achieved the best recall (sensitivity) of 0.81, while name-based queries had 0.78 and ReCiter had 0.69. ReCiter attained the best precision (positive predictive value) of 0.93, while Scopus had 0.85 and name-based queries had 0.31. ReCiter accesses the most current citation data, uses limited computational resources, and minimizes manual entry by investigators. Generation of bibliographies using name-based queries will not yield high accuracy. Proprietary databases can perform well but require manual effort. Automated generation with higher recall is possible but requires additional knowledge about investigators. Copyright © 2014 Elsevier Inc. All rights reserved.
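
The abstract compares methods by recall (sensitivity) and precision (positive predictive value) against a gold standard. As a minimal sketch of how those two figures could be computed for one investigator, the snippet below treats a bibliography as a set of citation identifiers. The function name `precision_recall` and the identifiers are hypothetical and invented for illustration; they are not taken from ReCiter or from the study data.

```python
def precision_recall(retrieved, gold):
    """Return (precision, recall) for retrieved citations against a gold standard.

    precision = correct retrieved / all retrieved (positive predictive value)
    recall    = correct retrieved / all gold-standard citations (sensitivity)
    """
    retrieved, gold = set(retrieved), set(gold)
    true_positives = len(retrieved & gold)
    # Guard against empty sets so the function never divides by zero.
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical citation identifiers, invented for illustration only.
gold = {101, 102, 103, 104}        # citations the judges attributed to the investigator
retrieved = {101, 102, 103, 999}   # citations a method returned for that investigator

precision, recall = precision_recall(retrieved, gold)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

A method that returns many citations by other authors with the same name, as a name-only query tends to, lowers precision without necessarily lowering recall, which matches the pattern the abstract reports for name-based queries.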

publication date

  • October 2014

has subject area

  • Abstracting and Indexing as Topic
  • Algorithms
  • Artificial Intelligence
  • Authorship
  • Bibliography as Topic
  • Biomedical Research
  • Data Mining
  • Natural Language Processing
  • Pattern Recognition, Automated
  • PubMed
  • Social Networking
  • Vocabulary, Controlled

Research

keywords

  • Journal Article

Identity

Language

  • eng

PubMed Central ID

  • PMC4180817

Digital Object Identifier (DOI)

  • 10.1016/j.jbi.2014.03.013

PubMed ID

  • 24694772

Additional Document Info

start page

  • 8

end page

  • 14

volume

  • 51