Automated Category and Trend Analysis of Scientific Articles on Ophthalmology Using Large Language Models: Development and Usability Study. Academic Article uri icon

Overview

abstract

  • BACKGROUND: In this paper, we present an automated method for article classification, leveraging the power of large language models (LLMs). OBJECTIVE: The aim of this study is to evaluate the applicability of various LLMs based on textual content of scientific ophthalmology papers. METHODS: We developed a model based on natural language processing techniques, including advanced LLMs, to process and analyze the textual content of scientific papers. Specifically, we used zero-shot learning LLMs and compared Bidirectional and Auto-Regressive Transformers (BART) and its variants with Bidirectional Encoder Representations from Transformers (BERT) and its variants, such as distilBERT, SciBERT, PubmedBERT, and BioBERT. To evaluate the LLMs, we compiled a data set (retinal diseases [RenD] ) of 1000 ocular disease-related articles, which were expertly annotated by a panel of 6 specialists into 19 distinct categories. In addition to the classification of articles, we also performed analysis on different classified groups to find the patterns and trends in the field. RESULTS: The classification results demonstrate the effectiveness of LLMs in categorizing a large number of ophthalmology papers without human intervention. The model achieved a mean accuracy of 0.86 and a mean F1-score of 0.85 based on the RenD data set. CONCLUSIONS: The proposed framework achieves notable improvements in both accuracy and efficiency. Its application in the domain of ophthalmology showcases its potential for knowledge organization and retrieval. We performed a trend analysis that enables researchers and clinicians to easily categorize and retrieve relevant papers, saving time and effort in literature review and information gathering as well as identification of emerging scientific trends within different disciplines. Moreover, the extendibility of the model to other scientific fields broadens its impact in facilitating research and trend analysis across diverse disciplines.

publication date

  • March 22, 2024

Identity

PubMed Central ID

  • PMC10998173

Digital Object Identifier (DOI)

  • 10.2196/52462

PubMed ID

  • 38517457

Additional Document Info

volume

  • 8