A technique for semantic classification of unknown words using UMLS resources.
Natural Language Processing
Unified Medical Language System
Natural Language Processing (NLP) is a tool for transforming natural text into codable form. Success of NLP systems is contingent on a well constructed semantic lexicon. However, creation and maintenance of these lexicons is difficult, costly and time consuming. The UMLS contains semantic and syntactic information of medical terms, which may be used to automate some of this task. Using UMLS resources we have observed that it is possible to define one semantic type by its syntactic combinations with other types in a corpus of discharge summaries. These patterns of combination can then be used to classify words which are not in the lexicon. The technique was applied to a corpus for a single semantic type and generated a list of 875 words which matched the classification criteria for that type. The words were ranked by number of patterns matched and the top 95 words were correctly typed with 80% accuracy.