A Knowledge-Based Approach to Querying Heterogeneous Databases

M. Andrea Rodríguez and Marcela Varas
Department of Information Engineering and Computer Science
University of Concepción
Edmundo Larenas 215, Concepción, Chile
andrea@udec.cl, mvaras@inf.udec.cl

Abstract. Query processing plays a fundamental role in current information systems that need to access independent and heterogeneous databases. This paper presents a new approach to querying heterogeneous databases that maps the semantics of query objects onto database schemas. The semantics is captured by the definitions of classes in an ontology, and a similarity function identifies not only equivalent but also semantically similar classes associated with a user's request. These similar classes are then mapped onto a database schema, which is compared with the schemas of heterogeneous databases to obtain entities in the databases that answer the query.

1 Introduction

New approaches to knowledge-based retrieval have highlighted the use of ontologies and semantic similarity functions as a mechanism for comparing objects that can be retrieved from heterogeneous databases [1,2,3,4]. Ontologies aim to capture the semantics of a domain through concept definitions [5], which are used both as primitives of a query specification and as primitives of resource descriptions. In current knowledge-based information systems, accessing information involves semantic matching between users' requests and stored data. In environments with multiple heterogeneous databases, this semantic matching is predicated on the assumption that independent databases share the same ontology or agree to adopt an ontology derived from the integration of existing ones [1,2,4].
However, because we need to query heterogeneous databases that use different conceptualizations (i.e., different ontologies), the single-ontology paradigm of semantic matching for information access must be modified.

We present an approach to querying heterogeneous databases based on ontologies and similarity evaluations. We start at the top level with users' requests expressed by terms defined in a user ontology. In this context, a user ontology provides term definitions for a given domain [6]. By using such an ontology, we can capture richer semantics in users' requests, and we allow users to express their queries without needing to know the schemas of data representation.

The scope of this work is the retrieval of information described by classes of objects. For example, consider the following query to a Geographic Information System (GIS): "retrieve utilities in Atlanta, Georgia." This work concentrates on whether or not heterogeneous databases contain an entity such as utility, or conceptually similar entities such as power plant and pipeline. We leave for future work the treatment of query constraints given by, for example, attribute values or spatial constraints.

Unlike other approaches to knowledge-based retrieval that map the local terms of a database onto a shared ontology [4,7,8], we map the user ontology onto a database schema and subsequently compare this schema with each of the schemas of the heterogeneous databases. Our approach does not force heterogeneous databases to commit to a common single ontology; it simply retrieves from these databases the entities that, according to our similarity measure, are most likely similar to the conceptual classes requested by the user. The strategy of this work is to map ontological descriptions of query objects onto database schemas, since extracting semantics from logical representations of data is a much harder process than mapping semantic definitions onto logical structures.
Thus, the approach combines ontologies and database schemas with the aim of moving toward intelligent database systems.

The organization of this paper is as follows. Section 2 describes our main approach to querying heterogeneous databases. Section 3 describes the components of the ontology specification and the similarity model used to compare entity classes in this ontology. Section 4 describes the mapping process to the database schema and the similarity evaluation between heterogeneous database schemas. A case study in the spatial domain is presented in Section 5. Conclusions and future research directions are described in Section 6.

2 Components of the Knowledge-Based Query Process

The general query process is as follows. A user query is pre-processed to extract terms identifying entity classes in a user ontology. Using a semantic similarity model (Section 3), we compare entity classes in this ontology to determine all classes that are semantically similar to the ones extracted from the user's request. In this way, even if the databases do not contain exactly what the user is searching for, they may still be able to provide semantically similar answers.

Once the set of classes associated with the concepts requested by the user has been determined, the definitions of these classes are mapped onto a database schema. To do this mapping, a set of transformations tied to the type of database schema (e.g., relational or object-oriented) is defined and applied to the classes' definitions to create a query schema, i.e., the schema of entity classes that models the user's request. For this paper we have used the traditional relational database schema [9], and we provide a summary of the transformations that map entity classes onto this database schema. The generated query schema is then compared to each heterogeneous database (see Section 4).
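The four stages of this process can be outlined as a small Python sketch. It is illustrative only and not the authors' implementation: term extraction here is a naive substring match (no lemmatization, so "utilities" would not match "utility"), and the semantic similarity function, the 0.5 threshold, the ontology-to-schema mapping, and the schema comparison are hypothetical callables standing in for the models of Sections 3 and 4.

```python
from typing import Callable, Dict, List, Set

# Hypothetical stand-ins for the models described in Sections 3 and 4.
SemanticSim = Callable[[str, str], float]                     # similarity between ontology classes
ToQuerySchema = Callable[[Set[str]], Set[str]]                # classes -> entity names of a query schema
SchemaSim = Callable[[Set[str], Set[str]], Dict[str, float]]  # query schema vs. one database schema


def extract_terms(query: str, ontology: Set[str]) -> List[str]:
    """Naive pre-processing: keep ontology class names that occur verbatim in the query."""
    text = query.lower()
    return sorted(c for c in ontology if c in text)


def expand_request(requested: List[str], ontology: Set[str],
                   sim: SemanticSim, threshold: float) -> Set[str]:
    """Add every class whose similarity to a requested class reaches the threshold."""
    selected = set(requested)
    for r in requested:
        # The requested class is passed as the second (reference) argument.
        selected |= {c for c in ontology if sim(c, r) >= threshold}
    return selected


def process_query(query: str, ontology: Set[str], sim: SemanticSim, threshold: float,
                  to_query_schema: ToQuerySchema, databases: Dict[str, Set[str]],
                  schema_sim: SchemaSim) -> Dict[str, Dict[str, float]]:
    terms = extract_terms(query, ontology)                     # 1. pre-process the request
    classes = expand_request(terms, ontology, sim, threshold)  # 2. semantic similarity
    query_schema = to_query_schema(classes)                    # 3. map onto a query schema
    # 4. compare the query schema with each heterogeneous database schema
    return {name: schema_sim(query_schema, schema) for name, schema in databases.items()}


if __name__ == "__main__":
    # Toy data, invented for illustration.
    ontology = {"utility", "power plant", "pipeline", "hospital"}
    scores = {("power plant", "utility"): 0.8, ("pipeline", "utility"): 0.6}

    def sim(a: str, b: str) -> float:
        return 1.0 if a == b else scores.get((a, b), 0.0)

    def exact_match(query_schema: Set[str], db_schema: Set[str]) -> Dict[str, float]:
        return {e: 1.0 for e in query_schema & db_schema}

    databases = {"db_a": {"pipeline", "road"}, "db_b": {"hospital"}}
    print(process_query("retrieve utility in Atlanta, Georgia", ontology, sim, 0.5,
                        set, databases, exact_match))
```

With this toy data the request for utility is expanded to power plant and pipeline, so the script prints `{'db_a': {'pipeline': 1.0}, 'db_b': {}}`: the first database answers with a similar entity even though it stores no utility entity.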
In summary, our main approach includes two types of similarity assessment: (1) a semantic similarity assessment that aims at capturing classes that are semantically similar to the user query, and (2) a database similarity measure that compares representations of entities in database schemas. Instead of making all similarity evaluations at the database schema level or at the ontological level, we combine these two similarity assessments for the following reasons:

•   By using a user ontology, we allow users to express queries in their own terms, according to their own ontology, without having to know the underlying modeling and representation of data in heterogeneous databases.
•   Using the specified query and a semantic similarity model, we extract the entity classes in the user ontology that are semantically associated with the user's request. We compare these classes at the ontological level, where we have a more complete description of their semantics, and obtain a set of possible answers.
•   We assume that typical existing databases have no ontological descriptions of their stored entities, so we do not have the full semantic description of the entities stored in heterogeneous databases. Therefore, we use the available components of the schema representation to compare entities through a database similarity model.

3 Ontology and Semantic Similarity

In previous work [10, 11], we introduced an ontology designed for retrieval purposes, whose basic specification components are described as follows.

3.1 Components of the entity classes' definitions

The components of our ontology are entity class definitions, expressed in terms of the classes' semantic interrelations and distinguishing features. We refer to entity classes by words or sets of synonyms, which are interrelated by hyponymy (is-a) relations and by meronymy (part-whole) relations.
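A minimal Python sketch of how such entity-class definitions might be stored follows. The `EntityClass` and `Ontology` names, the single-superclass (and acyclic) is-a assumption, and the toy classes are all hypothetical; the sketch covers only synonym sets, is-a links, and part-whole links, plus a depth helper that shows how general a class is in the is-a hierarchy.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class EntityClass:
    synonyms: List[str]                             # synonym set; first entry names the class
    is_a: Optional[str] = None                      # hyponymy: superclass (single parent assumed)
    part_of: Set[str] = field(default_factory=set)  # meronymy: wholes this class is part of

    @property
    def name(self) -> str:
        return self.synonyms[0]


class Ontology:
    def __init__(self, classes: List[EntityClass]) -> None:
        self.classes: Dict[str, EntityClass] = {c.name: c for c in classes}
        # Any synonym resolves to the class it names.
        self.lookup: Dict[str, str] = {s: c.name for c in classes for s in c.synonyms}

    def resolve(self, word: str) -> str:
        return self.lookup[word]

    def ancestors(self, name: str) -> List[str]:
        """Superclasses from the direct parent up to the root (assumes no is-a cycles)."""
        chain: List[str] = []
        parent = self.classes[name].is_a
        while parent is not None:
            chain.append(parent)
            parent = self.classes[parent].is_a
        return chain

    def depth(self, name: str) -> int:
        """Number of is-a links between the class and its root (roots have depth 0)."""
        return len(self.ancestors(name))

    def parts(self, whole: str) -> List[str]:
        """Whole-of view: classes declared as part of `whole` (inverse of part_of)."""
        return sorted(n for n, c in self.classes.items() if whole in c.part_of)


if __name__ == "__main__":
    onto = Ontology([
        EntityClass(["construction"]),
        EntityClass(["field"]),
        EntityClass(["building", "edifice"], is_a="construction"),
        EntityClass(["hospital"], is_a="building"),
        EntityClass(["apartment building", "apartment house"], is_a="building"),
        EntityClass(["stadium"], is_a="construction"),
        EntityClass(["athletic field"], is_a="field", part_of={"stadium"}),
    ])
    print(onto.resolve("apartment house"))  # apartment building
    print(onto.ancestors("hospital"))       # ['building', 'construction']
    print(onto.depth("stadium"))            # 1
    print(onto.parts("stadium"))            # ['athletic field']
```

Keeping synonyms as a list whose first entry names the class lets a user refer to a class by any of its words while the rest of the structure uses a single canonical name.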
We use the distinguishing features of classes to capture details of class descriptions that would otherwise be missed by the classes' semantic interrelations. For example, we can say that a hospital and an apartment building have a common superclass building; however, this information falls short when trying to differentiate a hospital from an apartment building. We suggest a finer identification of distinguishing features and classify them into functions, parts, and attributes. Function features are intended to represent what is done to or with a class. Parts are structural elements of a class, such as the roof and floor of a building, that may not themselves be defined as entity classes. Finally, attributes correspond to additional characteristics of a class that are not captured by either its parts or its functions. This classification of distinguishing features into parts, functions, and attributes is intended to facilitate the implementation of the entity class representation, and it enables each type of distinguishing feature to be manipulated separately [11].

3.2 Semantic Similarity

We define a computational model that assesses similarity by combining a feature-matching process with a semantic-distance measurement [11]. The global similarity function $S_C(c_1, c_2)$ is a weighted sum of the similarity values for parts, functions, and attributes. For each type of distinguishing feature we use a similarity function $S_t(c_1, c_2)$ (Equation 1), which is based on the ratio model of a feature-matching process [12]. In $S_t(c_1, c_2)$, $c_1$ and $c_2$ are two entity classes, $t$ denotes the type of features, and $C_1$ and $C_2$ are the respective sets of features of type $t$ for $c_1$ and $c_2$. The matching process determines the cardinality ($|\cdot|$) of the set intersection ($C_1 \cap C_2$) and of the set differences ($C_1 - C_2$ and $C_2 - C_1$).
$$
S_t(c_1, c_2) = \frac{|C_1 \cap C_2|}{|C_1 \cap C_2| + \alpha(c_1, c_2)\,|C_1 - C_2| + \bigl(1 - \alpha(c_1, c_2)\bigr)\,|C_2 - C_1|} \qquad (1)
$$

The function $\alpha$ determines the relative importance of the non-shared features of the two entity classes. It is defined in terms of the degree of generalization of the entity classes in the hierarchical structure, which is determined by a semantic-distance measurement. This definition assumes that a prototype is generally a superclass of a variant and that the concept used as the reference (i.e., the second argument) should be more relevant in the evaluation [12,13].

Our similarity model has two advantages over semantic similarity models based on semantic distances and their variations [14,15,16]. First, it allows us to discriminate among closely related classes. For example, we can distinguish the similarity between pairs of subclasses of the same superclass (hospital and house versus hospital and apartment building, all of which are subclasses of building), as well as between classes that are only indirectly connected in the hierarchical structure (stadium, a subclass of construction, and athletic field, a subclass of field). Second, our model does not assume a symmetric evaluation of similarity, which allows us to consider context dependencies associated with the relative importance of distinguishing features [10,11].

4 Mapping and Comparison of Database Schemas

Once we have the desired entity classes, the next step in processing the query is to map the entity classes of our ontology onto database schemas, which are then compared with the schemas of the heterogeneous databases. We describe our approach to mapping for databases modeled with the relational database schema [9]; however, we could have used another type of database schema, such as an object-oriented schema, in which case new mapping transformations would need to be defined.
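Equation (1) of Section 3.2 and the weighted sum that forms $S_C$ can be sketched in a few lines of Python. Two parts are assumptions, since the text does not spell them out. The first is the form of `alpha`: here it is taken as the smaller of the two is-a depths divided by their sum, a depth-based weighting that never exceeds 0.5, so the reference class's unmatched features always weigh at least as much. The second is the set of feature-type weights, which are placeholders. The feature sets in the example are invented.

```python
from typing import Dict, Set

FeatureSets = Dict[str, Set[str]]  # feature type ("parts", "functions", "attributes") -> features


def alpha(depth1: int, depth2: int) -> float:
    """Assumed depth-based weight for |C1 - C2|; never exceeds 0.5, so the
    reference class's (second argument's) unmatched features weigh at least as much."""
    total = depth1 + depth2
    if total == 0:
        return 0.5  # both classes are roots: weigh both differences equally
    return min(depth1, depth2) / total


def s_t(c1: Set[str], c2: Set[str], a: float) -> float:
    """Equation (1): ratio-model similarity for one type of distinguishing feature."""
    common = len(c1 & c2)
    denom = common + a * len(c1 - c2) + (1.0 - a) * len(c2 - c1)
    return common / denom if denom > 0 else 0.0  # nothing to compare -> 0 by convention


def s_c(f1: FeatureSets, f2: FeatureSets, depth1: int, depth2: int,
        weights: Dict[str, float]) -> float:
    """Global similarity: weighted sum of S_t over the feature types."""
    a = alpha(depth1, depth2)
    return sum(w * s_t(f1.get(t, set()), f2.get(t, set()), a) for t, w in weights.items())


if __name__ == "__main__":
    weights = {"parts": 0.5, "functions": 0.3, "attributes": 0.2}  # placeholder weights
    hospital = {"parts": {"roof", "floor", "ward"}, "functions": {"medical care"},
                "attributes": {"number of beds"}}
    apartments = {"parts": {"roof", "floor", "apartment"}, "functions": {"housing"},
                  "attributes": {"number of floors"}}
    # Both classes sit two is-a links below the root (building -> construction).
    print(round(s_c(hospital, apartments, 2, 2, weights), 3))  # 0.333
    # Asymmetry: with alpha < 0.5 and unequal set differences, swapping arguments matters.
    print(round(s_t({"a", "b"}, {"a", "b", "c", "d"}, 0.25), 3))  # 0.571
    print(round(s_t({"a", "b", "c", "d"}, {"a", "b"}, 0.25), 3))  # 0.8
```

The last two lines illustrate the non-symmetric evaluation claimed above: the same pair of feature sets yields different similarity values depending on which class is the reference.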
4.1 From Ontology to Database Schema

We assume that the existing database schemas (target schemas) are represented in the relational model with the following constructors:

•   Entities: names, attributes, primary key, and foreign keys.
•   Attributes: names.
•   Foreign keys (FK): the relations to which they belong and the relations they refer to.

Before transforming the entity classes' definitions into a relational schema, we preprocess these definitions to keep only those components that can be mapped onto a relational schema:

•   Semantic relations extraction: semantic relations are retained, whereas descriptions and distinguishing features are eliminated from the subsequent mapping process. As we will explain in Section 4.2, we do not compare attributes (i.e., distinguishing features), since this would give misleading results due to the strong application dependence of attribute definitions in existing databases.
•   Synonym extraction: synonyms are important for managing the multiple ways in which people can refer to the same entity class, but they are not directly handled in the relational schema. We therefore define an additional structure to deal with the synonym sets of entity classes. This structure includes the set of synonym sets and an index as a unique key.

Then, we take the simplified entity classes' definitions and map them onto a relational schema. Entity classes map directly onto relational entities; however, we also need to define transformations for mapping the is-a, part-of, and whole-of semantic relations. Since there are several alternatives for mapping semantic relations onto relational schemas, we define a subset that considers only relational tables or interrelations between entities mapped through foreign keys (Table 1).

Table 1. Mapping transformations from the entity class definition onto a relational schema

| Semantic relation | Transformation |
|---|---|
| Is-A | Isa1: Create an entity for each of the child entities, with a foreign key pointing to the parent entity. |
| Part-Of | Part1: Define new structures (relations) that associate whole entities with part entities. |
| | Part2: Create a foreign key in a part entity for each of its whole entities. |
| Whole-Of | Whole1: Define new structures (relations) that associate whole entities with part entities. |
| | Whole2: Create a foreign key in a whole entity for each of its part entities. |

The combination of the alternative transformations (i.e., one alternative for is-a relations and two alternatives each for part-of and whole-of relations) gives 1 × 2 × 2 = 4 possible mapping transformations.
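The transformations in Table 1 can be sketched in Python as follows. This is a hypothetical illustration, not the authors' implementation: the `Table` structure, the `<name>_id` key naming, the `<whole>_has_<part>` name for new relations, and the toy classes are all assumptions, and the synonym structure is omitted. Enumerating the rule choices with `itertools.product` reproduces the count of four alternatives.

```python
from dataclasses import dataclass, field
from itertools import product
from typing import Dict, List, Set, Tuple


@dataclass
class Table:
    name: str
    primary_key: Tuple[str, ...]
    foreign_keys: Dict[str, str] = field(default_factory=dict)  # FK column -> referenced table


# One Is-A rule, two Part-Of rules, two Whole-Of rules: 1 x 2 x 2 combinations.
ALTERNATIVES = list(product(["Isa1"], ["Part1", "Part2"], ["Whole1", "Whole2"]))


def _entity(tables: Dict[str, Table], name: str) -> Table:
    """Direct mapping: each entity class becomes a table with a surrogate key."""
    return tables.setdefault(name, Table(name, (f"{name}_id",)))


def _association(tables: Dict[str, Table], whole: str, part: str) -> None:
    """Part1 / Whole1: a new relation associating a whole entity with a part entity."""
    name = f"{whole}_has_{part}"
    keys = {f"{whole}_id": whole, f"{part}_id": part}
    tables[name] = Table(name, tuple(keys), keys)


def map_to_relational(classes: List[str], is_a: Dict[str, str],
                      part_of: Set[Tuple[str, str]], whole_of: Set[Tuple[str, str]],
                      part_rule: str, whole_rule: str) -> Dict[str, Table]:
    """is_a maps child -> parent; part_of holds (part, whole); whole_of holds (whole, part)."""
    if part_rule not in ("Part1", "Part2") or whole_rule not in ("Whole1", "Whole2"):
        raise ValueError("unknown transformation rule")
    tables: Dict[str, Table] = {}
    for name in classes:
        _entity(tables, name)
    for child, parent in is_a.items():  # Isa1: FK in the child pointing to the parent
        _entity(tables, parent)
        _entity(tables, child).foreign_keys[f"{parent}_id"] = parent
    for part, whole in part_of:
        _entity(tables, whole)
        if part_rule == "Part1":
            _entity(tables, part)
            _association(tables, whole, part)
        else:  # Part2: FK in the part entity for each of its wholes
            _entity(tables, part).foreign_keys[f"{whole}_id"] = whole
    for whole, part in whole_of:
        _entity(tables, part)
        if whole_rule == "Whole1":
            _entity(tables, whole)
            _association(tables, whole, part)
        else:  # Whole2: FK in the whole entity for each of its parts
            _entity(tables, whole).foreign_keys[f"{part}_id"] = part
    return tables


if __name__ == "__main__":
    classes = ["construction", "building", "hospital", "ward", "stadium", "athletic_field"]
    is_a = {"building": "construction", "hospital": "building", "stadium": "construction"}
    part_of = {("ward", "hospital")}
    whole_of = {("stadium", "athletic_field")}
    print(len(ALTERNATIVES))  # 4
    for _, part_rule, whole_rule in ALTERNATIVES:
        schema = map_to_relational(classes, is_a, part_of, whole_of, part_rule, whole_rule)
        print(part_rule, whole_rule, sorted(schema))
```

Each of the four generated query schemas differs only in how part-whole links are represented (new association relations versus foreign keys), which is why the comparison with a target database may favor one alternative over another.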