A technique for semantic classification of unknown words using UMLS resources.


MeSH Major

  • Medical Records
  • Natural Language Processing
  • Unified Medical Language System
  • Vocabulary, Controlled


  • Natural Language Processing (NLP) is a tool for transforming natural text into codable form. The success of NLP systems is contingent on a well-constructed semantic lexicon. However, creating and maintaining these lexicons is difficult, costly, and time-consuming. The UMLS contains semantic and syntactic information about medical terms, which may be used to automate some of this task. Using UMLS resources, we have observed that it is possible to define one semantic type by its syntactic combinations with other types in a corpus of discharge summaries. These patterns of combination can then be used to classify words that are not in the lexicon. The technique was applied to a corpus for a single semantic type and generated a list of 875 words that matched the classification criteria for that type. The words were ranked by number of patterns matched, and the top 95 words were correctly typed with 80% accuracy.
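
The pattern-matching idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how patterns are represented, so the sketch assumes a pattern is simply the pair of semantic types on either side of a word. The function names (`context`, `learn_patterns`, `rank_unknown_words`), the type labels, and the toy corpus are all hypothetical, and the UMLS lookup is replaced by pre-labelled `(word, type)` tuples in which `None` marks a word missing from the lexicon.

```python
from collections import defaultdict

def context(sentence, i):
    """Return the (left type, right type) pair around position i.

    Sentence boundaries are marked with "<s>" and "</s>" (an assumption).
    """
    left = sentence[i - 1][1] if i > 0 else "<s>"
    right = sentence[i + 1][1] if i + 1 < len(sentence) else "</s>"
    return (left, right)

def learn_patterns(corpus, target_type):
    """Collect the neighbor-type contexts in which the target type occurs."""
    patterns = set()
    for sentence in corpus:
        for i, (_, sem_type) in enumerate(sentence):
            if sem_type == target_type:
                left, right = context(sentence, i)
                # Skip contexts whose neighbors are themselves untyped.
                if left is not None and right is not None:
                    patterns.add((left, right))
    return patterns

def rank_unknown_words(corpus, patterns):
    """Rank untyped words by how many distinct learned patterns they match."""
    matched = defaultdict(set)
    for sentence in corpus:
        for i, (word, sem_type) in enumerate(sentence):
            if sem_type is None:
                ctx = context(sentence, i)
                if ctx in patterns:
                    matched[word].add(ctx)
    # Most patterns first; ties broken alphabetically for a stable order.
    ordered = sorted(matched.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [(word, len(ctxs)) for word, ctxs in ordered]

# Toy corpus with made-up type labels; None marks an unknown word.
corpus = [
    [("patient", "person"), ("denies", "verb"), ("fever", "finding"), ("today", "time")],
    [("patient", "person"), ("reports", "verb"), ("nausea", "finding")],
    [("patient", "person"), ("denies", "verb"), ("dyspnea", None), ("today", "time")],
    [("patient", "person"), ("reports", "verb"), ("dyspnea", None)],
    [("seen", "verb"), ("by", "prep"), ("cardiology", None)],
]

patterns = learn_patterns(corpus, "finding")
ranked = rank_unknown_words(corpus, patterns)
print(ranked)
```

In this toy corpus, "dyspnea" appears in both contexts learned from known findings and is proposed as a candidate, while "cardiology" matches none and is left out. Ranking by the number of distinct patterns matched mirrors the ranking step described in the abstract.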

publication date

  • January 1999



  • Academic Article



  • eng

PubMed Central ID

  • PMC2232586

PubMed ID

  • 10566453

Additional Document Info

start page

  • 716

end page

  • 20