A Knowledge-Based Approach to Ontologies Data Integration

Maria Vargas-Vera and Enrico Motta
Knowledge Media Institute (KMi), The Open University, Walton Hall, Milton Keynes, MK7 6AA, United Kingdom
{m.vargas-vera@open.ac.uk}

Abstract

This paper proposes a multiple-ontology data integration system for a question answering framework called AQUA. We propose an approach for mediating between a given query and a set of resources. The method is based on a Meta-ontology (which describes the contents of each individual source) and on our similarity algorithm, which is based on an analysis of the neighborhood of classes. We argue that AQUA can perform mappings between queries and an ontological space by using a mediator agent based on the Meta-ontology and our similarity algorithm.

Keywords: Ontologies, Data Integration, Meta-ontology, Similarity

1. Introduction

This paper focuses on the problem of incorporating an integration system, with a mediator agent, into AQUA (Vargas-Vera and Motta 2004; Vargas-Vera et al 2003a,b). AQUA was developed at the Open University in England, United Kingdom. AQUA joins two different paradigms, closed-domain and open-domain question answering, in a single framework. One of the main characteristics of AQUA is that it uses knowledge (encoded in ontologies). This knowledge is used in several steps of the question answering process, such as query reformulation. AQUA translates a user query written in English into first order logic (FOL). Currently, when AQUA is used for closed-domain question answering, it uses a single populated ontology. Therefore, we aimed to extend the architecture to handle multiple ontologies. To achieve this goal we needed a mediator agent which could perform mappings between terms in queries and ontological relations. Our solution to the ontology data integration problem was to use a Meta-ontology coupled with our similarity algorithm (described in section 3.1).
This Meta-ontology contains information about the relations of each resource. This is not a limitation of the system because new meta-information (a new resource) can be added incrementally.

The process of answering a query is divided into four steps:
1.   A query planner, which takes a given query written in first order logic (FOL) and divides it into sub-queries.
2.   A selection procedure, which provides a subset of ontologies relevant to the query.
3.   The query-satisfaction algorithm, which answers the query using standard question answering techniques already implemented in AQUA (Vargas-Vera and Motta 2004; Vargas-Vera et al 2003a,b).

The main contributions of our paper can be summarized as 1) identifying the information which needs to be kept in the meta-ontology and the fragments of information which are relevant to a given query; 2) algorithms for generating relevant ontologies and a similarity algorithm based on the neighborhood of classes.

The paper is organized as follows: Section 2 briefly presents the current architecture of AQUA, which uses a single ontology. Section 3 introduces the problem of data integration, outlines our suggested algorithm for generating relevant ontologies and introduces our similarity algorithm embedded in AQUA. Section 4 describes related work. Finally, Section 5 gives conclusions and directions for future work.

2. AQUA process model

The AQUA process model generalizes other approaches by providing a uniform framework which integrates NLP, logic, ontologies and information retrieval. Within this work we have focused on creating a process model for the AQUA system. Figure 1 shows the architecture of our AQUA system.

Figure 1. The AQUA process model

In the process model there are four phases: user interaction, question processing, document processing and answer extraction.

1.   User interaction. The user inputs the question and validates the answer (indicates whether it is correct or not).
This phase uses the following components:
•   Query interface. The user inputs a question (in English) using the user interface, a simple dialogue box. The user can reformulate the query if the answer is not satisfactory. A ranked set of answers is presented to the user.
•   Answer validation. The user gives feedback to AQUA by indicating agreement or disagreement with the answer.

2.   Question processing. Question processing is performed in order to understand the question asked by the user. This 'understanding' of the question requires several steps, such as parsing the question, representing the question and classifying it. The question processing phase uses the following components:
•   NLP parser. This segments the sentence into subject, verb, prepositional phrases, adjectives and objects. The output of this module is the logic representation of the query.
•   Interpreter. This finds a logical proof of the query over the knowledge base using unification and resolution algorithms.
•   WordNet/Thesaurus. AQUA's lexical resource.
•   Ontology. Currently AQUA works with a single ontology, the AKT reference ontology, which contains people, organizations, research areas, projects, publications, technologies and events.
•   Failure-analysis system. This analyzes the failure of a given question and gives an explanation of why the query failed. The user can then provide new information for the pending proof and the proof can be restarted. This process can be repeated as needed.
•   Question classification & reformulation. This classifies questions as belonging to any of the types supported in AQUA (what, who, when, which, why and where). This classification is only performed if the proof failed. AQUA then tries to use an information retrieval approach. This means that AQUA has to perform the document processing and answer extraction phases.

3.   Document processing.
A set of documents is selected and a set of paragraphs is extracted. This relies on the identification of the focus [1] of the question. Document processing consists of two components:
•   Search query formulation. This transforms the original question, Q, into a new question, Q', using transformation rules. Synonymous words can be used, punctuation symbols are removed, and words are stemmed.
•   Search engine. This searches the web for a set of documents using a set of keywords.

4.   Answer processing. In this phase answers are extracted from passages and given a score, using two components:
•   Passage selection. This extracts passages likely to contain the answer from the set of documents.
•   Answer selection. This clusters answers, scores answers (using a voting model), and lastly obtains a final ballot.

Recall that phases 1 and 2 cover AQUA as a closed-domain question answering system, whilst phases 1, 2, 3 and 4 cover AQUA as an open-domain question answering system.

[1] Focus is a word or a sequence of words which defines the question and disambiguates it in the sense that it indicates what the question is looking for.

3. Data integration

Integration of different sources has been one of the fundamental problems faced by the database community in recent decades (Batini et al., 1986). The goal of a data integration system is to provide a uniform interface to various data sources (Levy 2000). With a data integration system, users no longer need to:
•   find the sources relevant to a given query,
•   interact with each source in isolation,
•   select and clean the data, and
•   combine data from multiple sources in order to answer a given query.

Description Logics have been used in data integration projects to represent the conceptual level (Catarci and Lenzerini, 1993; Arens et al 1993; Goasdoue et al 2000). In our work we have extended Halevy et al.
(Halevy et al 1996) work on data integration for databases. The design of a data integration system is a very complex task which comprises several aspects. In this paper we concentrate on only one part of the mediator agent (the selection of sources). The goal of this selection procedure is to find the fragments relevant to a given query across different resources. It can be seen as a way to trim the ontological search space. We propose a generate-relevant-ontologies procedure which works for ontologies. The procedure relies on the existence of a meta-description of the different sources (the Meta-ontology). Once the relevant sources are identified, the question can be answered using the selected sources.

The Meta-ontology contains a formal description of the concepts and of the relationships between concepts. A good feature is that such descriptions are independent of any system consideration. Our meta-ontology Meta-O consists of the union of the individual descriptions of each ontology, i.e.

$\text{Meta-O} = O_1 \cup \dots \cup O_n$

where Meta-O is the global schema, $O_i$ is the description of ontology $i$, and $O_j = \{R_{1j}, \dots, R_{nj}\}$, where $R_{ij}$ means relation $i$ in ontology $j$.

Each ontology description consists of a set of relations written in FOL with types and arities. These descriptions can be defined in a formal language, which can be FOL or DL (Description Logic). The procedure generate-relevant-ontologies can be defined as follows:

Procedure generate-relevant-ontologies(O, Q)
/* O is the set of ontologies; Q is a query in conjunctive form */
1. Query planner: decompose the query into sub-queries
      $Q(X_1, \dots, X_n) = Q_1(Z_1) \cup Q_2(Z_2) \cup \dots \cup Q_n(Z_n)$
   where each $Q_i$ can be defined as a predicate
      $Q_i(W_1, \dots, W_n) = \exists X_1, \dots, X_n \; \omega(X_1, \dots, X_n)$
2. $S = \emptyset$; $i = 1$;
3. For each $Q_j$ do
4.    For each $\eta_i(Y_1, \dots, Y_n) \in \text{Meta-O}$ do i
      Begin
         Compute $Sim(\omega(X_1, \dots, X_n), \eta(Y_1, \dots, Y_n))$ using our ontology-based similarity algorithm given in section 3.1
         If $\exists \beta$, a mapping between $\omega(X_1, \dots, X_n)$ and $\eta(Y_1, \dots, Y_n)$, then
            $\Pi = \Pi \cup \{(\eta(Y_1, \dots, Y_n), O_i)\}$, where relation $\eta(Y_1, \dots, Y_n)$ is in ontology $O_i$
         Else
      End do i
5.    $S = S \cup \Pi$
   End do j
6. Return S
7. If $S$ is $\emptyset$ print "there is no set of ontologies relevant to the query"
   else evaluate the query using the subset of ontologies in $S$
8. End generate-relevant-ontologies

3.1 Similarity

Similarity has been an important research topic in several fields, such as linguistics and artificial intelligence (in particular natural language processing and fuzzy logic). Applications of similarity measures range from word sense disambiguation, text summarization, information extraction and retrieval, and question answering to automatic indexing and automatic correction of codes.

There are two types of similarity: syntactic and semantic. Syntactic similarity can be defined by functions over terms. One example is the Hamming distance (used in information theory), defined as the number of positions with different characters in two terms of the same length. Semantic similarity, in contrast, can be defined following Miller and Charles (Miller et al 1991) as a continuous variable that describes the degree of synonymy between two words.

When evaluating similarity in a taxonomy, the most natural way to assess similarity is to evaluate the distance between the two concepts being compared: the shorter the path from one concept to the other, the more similar they are. This approach has been used as a measure of similarity. However, one of its main drawbacks is that it relies on the notion that links in a taxonomy represent uniform distances (Resnik 1995; 1998).
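
The two measures just described, Hamming distance between terms and path length between concepts in a taxonomy, can be made concrete with a short Python sketch. The function names and the toy taxonomy below are hypothetical illustrations introduced here; they are not taken from AQUA or from the AKT reference ontology, and the taxonomy is assumed to be a tree stored as a child-to-parent map.

```python
def hamming_distance(a: str, b: str) -> int:
    """Count the positions at which two equal-length terms differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs terms of the same length")
    return sum(ca != cb for ca, cb in zip(a, b))


def path_length(taxonomy: dict, a: str, b: str) -> int:
    """Count the links on the path between two concepts in a tree taxonomy.

    `taxonomy` maps each concept to its parent (assumption: a single
    parent per concept, so the path goes through the closest common ancestor).
    """
    def ancestors(concept):
        # The concept itself followed by its parents up to the root.
        chain = [concept]
        while concept in taxonomy:
            concept = taxonomy[concept]
            chain.append(concept)
        return chain

    steps_from_b = {c: i for i, c in enumerate(ancestors(b))}
    for steps_from_a, c in enumerate(ancestors(a)):
        if c in steps_from_b:
            return steps_from_a + steps_from_b[c]
    raise ValueError("the two concepts share no common ancestor")


# Hypothetical toy taxonomy (child -> parent), for illustration only.
TAXONOMY = {
    "university": "organization",
    "research_institute": "organization",
    "organization": "thing",
    "student": "person",
    "person": "thing",
}

print(hamming_distance("karolin", "kathrin"))                      # 3
print(path_length(TAXONOMY, "university", "research_institute"))   # 2
print(path_length(TAXONOMY, "university", "student"))              # 4
```

In this toy example every link counts as one step, which is exactly the uniform-distance assumption criticized above: the link from university to organization is treated as equivalent to the link from organization to thing.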
Our own view is that similar entities are assumed to have common features [2], for instance (university, research_institute). However, dissimilar entities may also be semantically related, for example by the meronym or holonym relation, as in (student-person; bicycle-wheel).

[2] The common features of two entities might not be as discriminative as the features in which they differ.
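
Returning to the generate-relevant-ontologies procedure of Section 3, its selection loop can be sketched in Python as follows. This is a minimal sketch under stated assumptions rather than the AQUA implementation: the query is assumed to be already decomposed into sub-queries, each relation is reduced to a name and an arity, and the similarity function is a placeholder (exact match on name and arity) standing in for the neighborhood-based similarity algorithm. The names `Relation`, `META_O`, `name_arity_similarity` and the example ontologies are hypothetical.

```python
from typing import Callable, NamedTuple


class Relation(NamedTuple):
    """A relation description: predicate name plus arity (types omitted)."""
    name: str
    arity: int


def name_arity_similarity(query_rel: Relation, onto_rel: Relation) -> float:
    """Placeholder similarity: 1.0 if name and arity match, else 0.0.

    Assumption: this stands in for the neighborhood-based similarity
    algorithm, which is not reproduced here.
    """
    same_name = query_rel.name.lower() == onto_rel.name.lower()
    return 1.0 if same_name and query_rel.arity == onto_rel.arity else 0.0


def generate_relevant_ontologies(
    meta_o: dict,
    subqueries: list,
    sim: Callable[[Relation, Relation], float] = name_arity_similarity,
    threshold: float = 1.0,
) -> set:
    """Return the set S of (relation, ontology name) pairs relevant to the query.

    `meta_o` maps an ontology name to the set of relations it describes.
    A mapping is assumed to exist when the similarity reaches `threshold`.
    """
    selected = set()
    for sub_query in subqueries:                      # "for each Q_j"
        for onto_name, relations in meta_o.items():   # "for each relation in Meta-O"
            for relation in relations:
                if sim(sub_query, relation) >= threshold:
                    selected.add((relation, onto_name))
    return selected


# Hypothetical meta-ontology with two ontology descriptions.
META_O = {
    "akt": {Relation("works_in", 2), Relation("has_project_leader", 2)},
    "kmi": {Relation("works_in", 2)},
}

relevant = generate_relevant_ontologies(META_O, [Relation("works_in", 2)])
if not relevant:
    print("there is no set of ontologies relevant to the query")
else:
    print(sorted(onto for _, onto in relevant))
```

Passing the similarity function as a parameter keeps the selection loop independent of how similarity is computed, so a neighborhood-based measure could replace the placeholder without changing the loop.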