
Semantic heterogeneity

Semantic heterogeneity arises when database schemas or datasets for the same domain are developed by independent parties, resulting in differences in the meaning and interpretation of data values. Beyond structured data, the problem is compounded by the flexibility of semi-structured data and the various tagging methods applied to documents or unstructured data. Semantic heterogeneity is one of the more important sources of differences in heterogeneous datasets. Yet, for multiple data sources to interoperate with one another, it is essential to reconcile these semantic differences. Decomposing the various sources of semantic heterogeneity provides a basis for understanding how to map and transform data to overcome these differences.

Sources of these differences include:

Language
Encoding: for example, ASCII v UTF-8
Ambiguous sentence references, such as "I'm glad I'm a man, and so is Lola" ("Lola" by Ray Davies and the Kinks)
Synonyms
Acronyms
Homonyms
Two types (classes or sets) asserted as being the same when their scope and reference are not (for example, Berlin the city v Berlin the official city-state)
Two individuals asserted as being the same when they are actually distinct (for example, John F. Kennedy the president v John F. Kennedy the aircraft carrier)
Domain
Data representation: confusion often arises in the use of literals v URIs v object types
Data: a common problem, more acute with closed world approaches than with open world ones
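As an illustration of mapping and transforming data to overcome such differences, the following is a minimal Python sketch, not a reference implementation. The two records, the field names, the `FIELD_SYNONYMS` table, and the helper functions `normalize` and `conflicts` are all hypothetical and chosen only for this example. The sketch reconciles synonymous attribute names and a data representation mismatch (a year stored as text v as a number), then reports the values that still disagree, here the Berlin city v city-state distinction.

```python
# Hypothetical records for the same entity from two independently built sources.
source_a = {"surname": "Smith", "birth_year": "1950", "city": "Berlin"}
source_b = {"last_name": "Smith", "born": 1950, "city": "Berlin (city-state)"}

# Assumed mapping from each source's attribute names (synonyms)
# to one shared vocabulary.
FIELD_SYNONYMS = {
    "surname": "family_name",
    "last_name": "family_name",
    "birth_year": "birth_year",
    "born": "birth_year",
    "city": "city",
}


def normalize(record):
    """Rename synonymous fields and coerce birth_year to an integer."""
    out = {}
    for key, value in record.items():
        target = FIELD_SYNONYMS.get(key)
        if target is None:
            continue  # unknown field: simply dropped in this sketch
        if target == "birth_year":
            value = int(value)  # text "1950" and number 1950 become equal
        out[target] = value
    return out


def conflicts(a, b):
    """Return the shared fields whose values still differ after normalization."""
    return sorted(k for k in a.keys() & b.keys() if a[k] != b[k])


normalized_a = normalize(source_a)
normalized_b = normalize(source_b)
remaining = conflicts(normalized_a, normalized_b)
```

Renaming and type coercion resolve the naming and representation differences mechanically, while the remaining `city` conflict is a difference in scope and reference that a simple lookup table cannot settle.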
One of the first known classification schemes applied to data semantics is from William Kent more than two decades ago. Kent's approach dealt more with structural mapping issues than with differences in meaning, which he suggested data dictionaries could potentially resolve. One of the most comprehensive classifications is from Pluempitiwiriyawej and Hammer, 'Classification Scheme for Semantic and Schematic Heterogeneities in XML Data Sources'. They classify heterogeneities into three broad classes:
