Similarity evaluation in a content-based image retrieval (CBIR) CADx system for characterization of breast masses on ultrasound images.

Overview

Abstract

PURPOSE: The authors are developing a content-based image retrieval (CBIR) CADx system to assist radiologists in characterizing breast masses on ultrasound images. In this study, the authors compared seven similarity measures under consideration for the CBIR system. The similarity between the query and the retrieved masses was evaluated based on radiologists' visual similarity assessments.

METHODS: The CADx system retrieves masses that are similar to a query mass from a reference library based on computer-extracted features using a k-nearest neighbor (k-NN) approach. Of the seven similarity measures evaluated for the CBIR system, four were compared by radiologists' visual assessment: linear discriminant analysis (LDA), Bayesian neural network (BNN), cosine similarity measure (Cos), and Euclidean distance (ED) similarity measure. For LDA and BNN, the features of a query mass were first combined into a malignancy score, and masses with similar scores were then retrieved. For Cos and ED, similar masses were retrieved based on the normalized dot product and the Euclidean distance, respectively, between two feature vectors. For the observer study, the three most similar masses were retrieved for a given query mass with each method. All query-retrieved mass pairs were mixed and presented to the radiologists in random order. Three Mammography Quality Standards Act (MQSA) radiologists rated the similarity between each pair using a nine-point similarity scale (1 = very dissimilar, 9 = very similar). The accuracy of the CBIR CADx system in characterizing malignant and benign masses with the different similarity measures was evaluated by ROC analysis.

RESULTS: The BNN measure used with the k-NN classifier provided slightly higher performance for classification of malignant and benign masses (A(z) of 0.87) than the LDA, Cos, and ED measures (A(z) of 0.86, 0.84, and 0.81, respectively). The average similarity ratings of all radiologists for LDA, BNN, Cos, and ED were 4.71, 4.95, 5.18, and 5.32, respectively. The k-NN with the ED measure retrieved masses of significantly higher similarity (p < 0.008) than LDA and BNN.

CONCLUSIONS: Similarity measures using the resemblance of individual features in the multidimensional feature space can retrieve visually more similar masses than similarity measures using the resemblance of the classifier scores. A CBIR system that most effectively retrieves masses similar to the query may not have the best A(z).
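
The two retrieval strategies contrasted in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the two-dimensional feature vectors, and the malignancy scores are all hypothetical, and the scores merely stand in for the output of a trained LDA or BNN classifier. The sketch only shows how retrieval by feature-space resemblance (Cos, ED) and retrieval by classifier-score resemblance can return different neighbors for the same query.

```python
import math


def cosine_similarity(a, b):
    """Normalized dot product between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def euclidean_distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def retrieve_by_features(query, library, k=3, measure="ed"):
    """Indices of the k library masses closest to the query in feature space."""
    if measure == "ed":
        # Smaller distance means more similar.
        key = lambda i: euclidean_distance(query, library[i])
    elif measure == "cos":
        # Larger cosine means more similar, so sort on its negative.
        key = lambda i: -cosine_similarity(query, library[i])
    else:
        raise ValueError("measure must be 'ed' or 'cos'")
    return sorted(range(len(library)), key=key)[:k]


def retrieve_by_score(query_score, library_scores, k=3):
    """Indices of the k library masses whose malignancy scores are closest
    to the query score (scores assumed to come from a classifier)."""
    key = lambda i: abs(library_scores[i] - query_score)
    return sorted(range(len(library_scores)), key=key)[:k]


# Hypothetical toy data: five reference masses with two features each.
library = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [2.0, 0.8], [0.5, 0.5]]
# Hypothetical classifier scores for the same five masses.
library_scores = [0.2, 0.8, 0.25, 0.3, 0.9]

query = [1.0, 0.05]
query_score = 0.27

print(retrieve_by_features(query, library, k=3, measure="ed"))   # [0, 1, 4]
print(retrieve_by_features(query, library, k=3, measure="cos"))  # [0, 1, 3]
print(retrieve_by_score(query_score, library_scores, k=3))       # [2, 3, 0]
```

In this toy example the score-based retrieval ranks mass 2 first even though its feature vector is the farthest from the query in feature space, because a single malignancy score discards information about the individual features. That is consistent with, but does not demonstrate, the abstract's observation that feature-space measures retrieved visually more similar masses.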