Knowledge Based Systems Text Analysis

Dr. Shubhangi D.C.¹, Ravikiran Mitte²

¹ H.O.D., Department of Computer Science and Engineering, PG Center, Regional Office VTU, Kalaburagi, Karnataka, India
² P.G. Student, Department of Computer Science and Engineering, PG Center, Regional Office VTU, Kalaburagi, Karnataka, India

International Research Journal of Engineering and Technology (IRJET), Volume 03, Issue 06, June 2016. e-ISSN: 2395-0056, p-ISSN: 2395-0072.

Abstract: The enormous number of potential applications from bridging Web data with knowledge bases has led to an increase in entity linking research. Entity linking is the task of linking entity mentions in text with their corresponding entities in a knowledge base. Potential applications include information extraction, information retrieval, and knowledge base population. However, this task is challenging due to name variations and entity ambiguity. In this survey, we present a thorough overview and analysis of the main approaches to entity linking, and discuss various applications, the evaluation of entity linking systems, and future directions.

Keywords: knowledge based systems, text analysis.

1. Introduction

The amount of Web data has increased exponentially, and the Web has become one of the largest data repositories in the world in recent years. Much of the data on the Web is in the form of natural language. However, natural language is highly ambiguous, especially with respect to the frequent occurrences of named entities: a named entity may have multiple names, and a single name may denote several different named entities. On the other hand, the advent of knowledge-sharing communities such as Wikipedia and the development of information extraction techniques have facilitated the automated construction of large-scale, machine-readable knowledge bases. Knowledge bases contain rich information about the world's entities, their semantic classes, and their mutual relationships. Notable examples include DBpedia, YAGO, Freebase, KnowItAll, ReadTheWeb, and Probase. Bridging Web data with knowledge bases is beneficial for annotating the plethora of raw, often noisy data on the Web, and contributes to the vision of the Semantic Web. A critical step toward achieving these goals is to link named entity mentions appearing in Web text with their corresponding entities in a knowledge base, a task called entity linking.

Entity linking can facilitate many different tasks such as knowledge base population, question answering, and information integration. As the world evolves, new facts are generated and digitally expressed on the Web. Consequently, enriching existing knowledge bases with new facts becomes increasingly important. However, inserting newly extracted knowledge derived from an information extraction system into an existing knowledge base inevitably requires a system to map the entity mention associated with each extracted fact to the corresponding entity in the knowledge base.
For example, relation extraction is the process of discovering useful relationships between entities mentioned in text, and an extracted relation requires the entities associated with it to be mapped to the knowledge base before it can be populated into the knowledge base. Furthermore, a large number of question answering systems rely on their supporting knowledge bases to answer users' questions. To answer the question "What is the birth date of the famous basketball player Michael Jordan?", the system should first leverage entity linking to map the queried "Michael Jordan" to the NBA player rather than, for example, the Berkeley professor, and then retrieve the birth date of the NBA player named "Michael Jordan" directly from the knowledge base. Additionally, entity linking enables powerful join and union operations that can integrate information about entities across different pages, documents, and sites.

1.1 Task Description

Given a knowledge base containing a set of entities E and a text collection in which a set of named entity mentions M is identified in advance, the goal of entity linking is to map each textual entity mention m ∈ M to its corresponding entity e ∈ E in the knowledge base. Here, a named entity mention m is a token sequence in text which potentially refers to some named entity and is identified in advance. It is possible that some entity mention in text has no corresponding entity record in the given knowledge base. We define such mentions as unlinkable mentions and use NIL as a special label denoting "unlinkable". Therefore, if the matching entity e for entity mention m does not exist in the knowledge base (i.e., e ∉ E), an entity linking system should label m as NIL. For unlinkable mentions, some studies identify their fine-grained types from the knowledge base, which is out of scope for entity linking systems. Entity linking is also called Named Entity Disambiguation (NED) in the NLP community. In this paper, we focus on entity linking for the English language only, rather than cross-lingual entity linking. A minimal sketch of this task formulation is given below.
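The formulation above can be made concrete with a small sketch. This is a toy illustration, not the paper's system: the name dictionary, the descriptions, and the link_mention function are all invented for the example, and the naive word-overlap scorer merely stands in for the ranking methods surveyed later.

```python
# Toy sketch of the entity linking task: map a mention m (in context) to an
# entity e in the knowledge base E, or to NIL when no match exists.
# All data and names here are illustrative, not from the paper.

NIL = None  # special label for unlinkable mentions

# Name dictionary: surface form -> candidate entities (captures name variation)
NAME_DICT = {
    "Michael Jordan": ["Michael_Jordan_(basketball)", "Michael_I._Jordan"],
    "MJ": ["Michael_Jordan_(basketball)"],
}

# Short knowledge-base descriptions used for naive disambiguation
DESCRIPTIONS = {
    "Michael_Jordan_(basketball)": "American NBA basketball player",
    "Michael_I._Jordan": "Berkeley professor of machine learning and statistics",
}

def link_mention(mention, context):
    """Return the best-matching entity for the mention, or NIL."""
    candidates = NAME_DICT.get(mention, [])
    if not candidates:
        return NIL  # mention is unlinkable: e not in E
    # Naive context-overlap scorer; a stand-in for real ranking methods.
    def score(e):
        return len(set(DESCRIPTIONS[e].lower().split()) & set(context.lower().split()))
    return max(candidates, key=score)

print(link_mention("Michael Jordan", "the famous basketball player"))
# -> Michael_Jordan_(basketball)
```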
2. Related Work

Inserting knowledge produced by an information extraction system into an existing knowledge base inevitably requires a system to map each entity mention associated with the extracted knowledge to the corresponding entity in the knowledge base. Moreover, an entity mention can denote several different named entities. For instance, the entity mention "Sun" can refer to the star at the center of the Solar System, a multinational computer company, a fictional character named "Sun-Hwa Kwon" on the ABC television series "Lost", or many other entities which can be referred to as "Sun". An entity linking system has to disambiguate the entity mention in its textual context and identify the mapping entity for each entity mention.

S. Auer, C. Bizer [1]: DBpedia is a community effort to extract structured information from Wikipedia and to make this information available on the Web. DBpedia allows sophisticated queries against datasets derived from Wikipedia and lets other datasets on the Web be linked to Wikipedia data. The authors describe the extraction of the DBpedia datasets and how the resulting information is published on the Web for human and machine consumption, describe some emerging applications from the DBpedia community, and show how website authors can incorporate DBpedia content within their sites. Finally, they present the current status of interlinking DBpedia with other open datasets on the Web and outline how DBpedia, as a major source of open, royalty-free data, could serve as a nucleus for the emerging Web of Data. A hedged sketch of such a query appears below.
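For illustration, here is one way such a query could be issued against DBpedia's public SPARQL endpoint, using Python's requests library. The endpoint URL and the dbo:birthDate / dbr:Michael_Jordan identifiers reflect the present-day public DBpedia deployment; they are assumptions of this sketch, not details given in the paper.

```python
# Hedged sketch: ask DBpedia's public SPARQL endpoint for a birth date.
# Endpoint URL and ontology/resource identifiers are assumptions about the
# current public DBpedia setup, not taken from the paper.
import requests

QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT ?birthDate WHERE { dbr:Michael_Jordan dbo:birthDate ?birthDate . }
"""

resp = requests.get(
    "https://dbpedia.org/sparql",
    params={"query": QUERY, "format": "application/sparql-results+json"},
    timeout=30,
)
resp.raise_for_status()
for row in resp.json()["results"]["bindings"]:
    print(row["birthDate"]["value"])  # date of the NBA player, per DBpedia
```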
A. Carlson, J. Betteridge, R. C. Wang [2] consider the problem of semi-supervised learning to extract categories (e.g., academic fields, athletes) and relations (e.g., PlaysSport(athlete, sport)) from web pages, starting with a handful of labeled training examples of each category or relation plus hundreds of millions of unlabeled web documents. Semi-supervised training using only a few labeled examples is typically unreliable because the learning task is underconstrained. The paper pursues the thesis that much greater accuracy can be achieved by further constraining the learning task, by coupling the semi-supervised training of many extractors for different categories and relations. The authors characterize several ways in which the training of category and relation extractors can be coupled, and present experimental results demonstrating significantly improved accuracy as a result. They show empirically that coupling forestalls the semantic drift associated with bootstrap learning methods, and this evidence leads them to advocate large-scale coupled training as a strategy to significantly improve accuracy in semi-supervised learning.

E. Agichtein [3]: Text documents often contain valuable structured data that is hidden in regular English sentences. This data is best exploited if available as a relational table that can be used for answering precise queries or for running data mining tasks. The authors explore a technique for extracting such tables from document collections that requires only a handful of training examples from users. These examples are used to generate extraction patterns, which in turn result in new tuples being extracted from the document collection. Building on this idea, they present the Snowball system. Snowball introduces novel strategies for generating patterns and extracting tuples from plain-text documents; at each iteration of the extraction process, it evaluates the quality of these patterns and tuples without human intervention and keeps only the most reliable ones for the next iteration. The paper also develops a scalable evaluation methodology and metrics for the task, and presents a thorough experimental evaluation of Snowball and comparable techniques over a collection of more than 300,000 newspaper documents, showing that the new techniques produce high-quality tables while requiring minimal training for each new scenario.

A related line of work proposed an unsupervised method for relation discovery from large corpora. The key idea was to cluster pairs of named entities according to the similarity of the context words intervening between them. Experiments using one year's newspapers revealed not only that relations among named entities could be detected with high recall and precision, but also that appropriate labels could be automatically assigned to the relations. Future plans included discovering less frequent pairs of named entities by combining the method with bootstrapping, as well as improving the method by tuning parameters.

3. System Architecture

1. Query Processing. First, we try to correct spelling errors in the queries by using the query spelling correction supplied by Google. Second, we expand the query in three ways: expanding acronym queries from the text where the entity is located, expanding queries with the corresponding redirect pages of Wikipedia, and expanding queries by using the anchor text in pages from Wikipedia.
2. Candidate Generation. With the queries generated in the first step, the candidate generation module retrieves candidates from the knowledge base. This module also makes use of the disambiguation pages in Wikipedia: if there is a disambiguation page corresponding to the query, the linked entities listed on that page are added to the candidate set.
3. Candidate Ranking. In this module, we rank all the candidates with learning-to-rank methods.
4. Top-1 Candidate Validation. To deal with queries without an appropriate match, we finally add a validation module to judge whether the top candidate is the target entry.

A toy end-to-end sketch of these four modules follows.
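The sketch below wires the four modules together with deliberately simplified stand-ins: a typo map instead of Google spelling correction, a small redirect table instead of Wikipedia redirects and anchor text, and a popularity prior instead of a learning-to-rank model. All tables and thresholds are illustrative, not from the paper.

```python
# Toy four-module entity linking pipeline mirroring the architecture above.
# Every table below is an invented stand-in for the real data source named
# in the paper (Google spelling correction, Wikipedia redirects and
# disambiguation pages, a trained learning-to-rank model).

SPELL_FIX = {"snu": "sun"}                       # stand-in: query spelling correction
REDIRECTS = {"sun": ["sol"]}                     # stand-in: Wikipedia redirect expansion
DISAMBIG = {"sun": ["Sun", "Sun_Microsystems", "Sun-Hwa_Kwon"]}  # disambiguation page
PRIOR = {"Sun": 0.7, "Sun_Microsystems": 0.2, "Sun-Hwa_Kwon": 0.1}  # illustrative scores

def process_query(query):
    """Module 1: correct spelling, then expand the query."""
    q = SPELL_FIX.get(query.lower(), query.lower())
    return [q] + REDIRECTS.get(q, [])

def generate_candidates(queries):
    """Module 2: retrieve candidates, adding disambiguation-page entries."""
    candidates = set()
    for q in queries:
        candidates.update(DISAMBIG.get(q, []))
    return sorted(candidates)

def rank_candidates(candidates):
    """Module 3: rank candidates; the prior stands in for learning to rank."""
    return sorted(candidates, key=lambda e: PRIOR.get(e, 0.0), reverse=True)

def validate_top1(ranked, threshold=0.5):
    """Module 4: accept the top candidate only if it clears a confidence
    threshold; otherwise label the mention NIL."""
    if ranked and PRIOR.get(ranked[0], 0.0) >= threshold:
        return ranked[0]
    return None  # NIL

ranked = rank_candidates(generate_candidates(process_query("snu")))
print(validate_top1(ranked))  # -> Sun
```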
4. Methodology

After careful analysis, the system has been identified to have the following modules: 1. Entity Linking, 2. Candidate Entity Generation, 3. Candidate Entity Ranking.

1. Entity Linking

Entity linking can facilitate many different tasks such as knowledge base population, question answering, and information integration. As the world evolves, new facts are generated and digitally expressed on the Web; therefore, enriching existing knowledge bases with new facts becomes increasingly important. However, inserting newly extracted knowledge derived from an information extraction system into an existing knowledge base inevitably requires a system to map the entity mention associated with the extracted knowledge to the corresponding entity in the knowledge base. Additionally, entity linking enables powerful join and union operations that can integrate information about entities across different pages, documents, and sites. The entity linking task is challenging due to name variations and entity ambiguity.

2. Candidate Entity Generation

In the Candidate Entity Generation module, for each entity mention m ∈ M, entity linking systems attempt to include the possible entities that m may refer to in the set of candidate entities Em. Approaches to candidate entity generation are mainly based on string comparison between the surface form of the entity mention and the name of the entity existing in a knowledge base. This module is as important as the Candidate Entity Ranking module and is critical for a successful entity linking system, according to the experiments conducted by Hachey et al. In the remainder of this section, we review the main approaches that have been applied to generating the candidate entity set Em for entity mention m.

3. Candidate Entity Ranking

In most cases, the size of the candidate entity set Em is larger than one. Researchers leverage different kinds of evidence to rank the candidate entities in Em and attempt to find the entity e ∈ Em which is the most likely link for mention m, using techniques that include supervised and unsupervised ranking methods. To deal with the problem of predicting unlinkable mentions, some work leverages a validation module to check whether the top-ranked entity identified by the Candidate Entity Ranking module is the target entity for mention m; otherwise, it returns NIL for mention m. For this ranking we utilize the Vector Space Model (a sketch appears at the end of this section).

Methods Based on Search Engines: Some entity linking systems try to leverage information from the whole Web to identify candidate entities via Web search engines (such as Google). Specifically, Han and Zhao [61] submitted the entity mention together with its short context to the Google API and kept only the returned Web pages within Wikipedia as candidate entities. Dredze et al. [83] queried the Google search engine using the entity mention and identified candidate entities whose Wikipedia pages appear in the top 20 Google search results for the query. Lehmann et al. [69] and Monahan et al. [73] stated that the Google search engine is very effective at identifying some of the very difficult mappings between surface forms and entities; they performed the query using the Google API limited to the English Wikipedia site, filtered out results whose Wikipedia titles were not significantly similar to the query in terms of Dice score or acronym match, and used the top three results as candidate entities. In addition, the Wikipedia search engine, which returns a list of relevant Wikipedia entity pages when queried with keyword matching, has also been exploited to retrieve candidate entities: Zhang et al. [65] used this feature to generate infrequently mentioned candidate entities by querying the search engine with the string of the entity mention.
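As an illustration of Vector Space Model ranking, the sketch below represents the mention's context and each candidate's knowledge-base description as TF-IDF vectors and ranks candidates by cosine similarity. scikit-learn is assumed purely for convenience; the paper names the model but not an implementation, and the context and descriptions are invented for the example.

```python
# Vector Space Model candidate ranking: TF-IDF vectors + cosine similarity.
# scikit-learn usage is an assumption of this sketch; the paper specifies
# only the Vector Space Model itself.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Candidate entities for the mention "Sun", with short KB descriptions
candidates = {
    "Sun": "the star at the center of the Solar System",
    "Sun_Microsystems": "a multinational computer company",
    "Sun-Hwa_Kwon": "a fictional character on the ABC television series Lost",
}
mention_context = "the probe measured radiation from the star at the solar system's center"

# Vectorize the mention context together with the candidate descriptions
docs = [mention_context] + list(candidates.values())
tfidf = TfidfVectorizer().fit_transform(docs)

# Cosine similarity between the context (row 0) and each description
scores = cosine_similarity(tfidf[0:1], tfidf[1:]).ravel()

for entity, score in sorted(zip(candidates, scores), key=lambda p: -p[1]):
    print(f"{score:.3f}  {entity}")  # highest-scoring entity is the predicted link
```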