Learning classifiers from semantically heterogeneous data
Semantically heterogeneous and distributed data sources are quite common in several application domains such as bioinformatics and security informatics. In such a setting, each data source has an associated ontology. Different users or applications need to be able to query such data sources for statistics of interest (e.g., statistics needed to learn a predictive model from data). Because no single ontology meets the needs of all applications or users in every context, or for that matter, even a single user in different contexts, there is a need for principled approaches to acquiring statistics from semantically heterogeneous data. In this paper, we introduce ontology-extended data sources and define a user perspective consisting of an ontology and a set of interoperation constraints between data source ontologies and the user ontology. We show how these constraints can be used to derive mappings from source ontologies to the user ontology. We observe that most of the learning algorithms use only certain statistics computed from data in the process of generating the hypothesis that they output. We show how the ontology mappings can be used to answer statistical queries needed by algorithms for learning classifiers from data viewed from a certain user perspective. © Springer-Verlag 2004.
Additional Document Info
has global citation frequency