A knowledge-based web information system for the fusion of distributed classifiers

Copyright © 2004, Idea Group Inc. Copying or distributing in print or electronic forms without written permission of Idea Group Inc. is prohibited.

Chapter VIII

A Knowledge-Based Web Information System for the Fusion of Distributed Classifiers

Grigorios Tsoumakas, Aristotle University of Thessaloniki, Greece
Nick Bassiliades, Aristotle University of Thessaloniki, Greece
Ioannis Vlahavas, Aristotle University of Thessaloniki, Greece

ABSTRACT

This chapter presents the design and development of WebDisC, a knowledge-based web information system for the fusion of classifiers induced at geographically distributed databases. The main features of our system are: (i) a declarative rule language for classifier selection that allows the combination of syntactically heterogeneous distributed classifiers; (ii) a variety of standard methods for fusing the output of distributed classifiers; (iii) a new approach for clustering classifiers in order to deal with the semantic heterogeneity of distributed classifiers, detect their interesting similarities and differences, and enhance their fusion; and (iv) an architecture based on the Web services paradigm that utilizes the open and scalable standards of XML and SOAP.

This chapter appears in the book, Web Information Systems, edited by David Taniar and Johanna Wenny Rahayu. Idea Group Publishing, 701 E. Chocolate Avenue, Suite 200, Hershey PA 17033-1240, USA. Tel: 717/533-8845; Fax: 717/533-8661.
INTRODUCTION

Recently, the enormous technological progress in acquiring and storing data in a digital format has led to the accumulation of significant amounts of personal, business and scientific data. Advances in network technologies and the Internet have led to the availability of much of these data online. Personal text, image, audio and video files are today accessible through web pages, peer-to-peer systems, and FTP archives. Businesses have transferred their enterprise systems online, providing their customers with information and support of excellent quality in a low-cost manner. Huge volumes of scientific data from physics experiments, astronomical instruments, and DNA research are being stored today in server farms and data grids, while, at the same time, software technology for their online access and integration is being developed. Today, the grand challenge for Machine Learning, Knowledge Discovery, and Data Mining scientists is to analyze this distributed information avalanche in order to extract useful knowledge.

An important problem toward this challenge is that it is often unrealistic to collect geographically distributed data for centralized processing. The necessary central storage capacity might not be affordable, or the necessary bandwidth to efficiently transmit the data to a single place might not be available. In addition, there are privacy issues preventing sensitive data (e.g., medical, financial) from being transferred from their storage site.

Another important issue is the syntactic and semantic heterogeneity of data belonging to different information systems. The schemas of distributed databases might differ, making the fusion of distributed models a complex task. Even in a case where the schemas match, semantic differences must also be considered. Real-world, inherently distributed data have an intrinsic data skewness property.
For example, data related to a disease from hospitals around the world might have varying distributions due to different nutrition habits, climate and quality of life. The same is true for buying patterns identified in supermarkets in different regions of a country.

Finally, systems that learn and combine knowledge from distributed data must be developed using open and extensible technology standards. They must be able to communicate with clients developed in any programming language and platform. Interoperability and extensibility are of primary importance for the development of scalable software systems for distributed learning.

The main objective of this chapter is the design and development of WebDisC, a knowledge-based Web information system for the fusion of classifiers induced at geographically distributed databases. Its main features are: (i) a declarative rule language for classifier selection that allows the combination of syntactically heterogeneous distributed classifiers; (ii) a variety of standard methods for fusing the output of distributed classifiers; (iii) a new approach for clustering classifiers in order to deal with the semantic heterogeneity of distributed classifiers, detect their interesting similarities and differences, and enhance their fusion; and (iv) an architecture based on the Web services paradigm that utilizes the open and scalable standards of XML and SOAP.

In the rest of this chapter, we initially present the technologies that constitute the Web services framework and are at the core of the WebDisC system. We then give background information on classification, classifier fusion, and related work on distributed classifier systems.
Subsequently, we describe the architecture, main functionality, and user interface of the WebDisC system, along with the X-DEVICE component of the system and the proposed classifier clustering approach. Finally, we conclude this work and pose future research directions.

WEB SERVICES

A Web service is a software system, identified by a URI, whose public interfaces and bindings are defined and described using XML. Its definition can be discovered by other software systems. These systems may then interact with the Web service in a manner prescribed by its definition, using XML-based messages conveyed by Internet protocols (Champion et al., 2002).

The use of the Web services paradigm is expanding rapidly to provide a systematic and extensible framework for application-to-application (A2A) interaction, built on top of existing Web protocols and based on open XML standards. Web services aim to simplify the process of distributed computing by defining a standardized mechanism to describe, locate, and communicate with online software systems. Essentially, each application becomes an accessible Web service component that is described using open standards.

The basic architecture of Web services includes technologies capable of:
• Exchanging messages.
• Describing Web services.
• Publishing and discovering Web service descriptions.

Exchanging Messages

The standard protocol for communication among Web services is the Simple Object Access Protocol (SOAP) (Box et al., 2000). SOAP is a simple and lightweight XML-based mechanism for creating structured data packages that can be exchanged between network applications.
SOAP consists of four fundamental components: an envelope that defines a framework for describing message structure; a set of encoding rules for expressing instances of application-defined data types; a convention for representing remote procedure calls and responses; and a set of rules for using SOAP with HTTP. SOAP can be used with a variety of network protocols, such as HTTP, SMTP, FTP, RMI/IIOP, or a proprietary messaging protocol.

SOAP is currently the de facto standard for XML messaging for a number of reasons. First, SOAP is relatively simple, defining a thin layer that builds on top of existing network technologies, such as HTTP, that are already broadly implemented. Second, SOAP is flexible and extensible in that, rather than trying to solve all of the various issues developers may face when constructing Web services, it provides an extensible, composable framework that allows solutions to be incrementally applied as needed. Third, SOAP is based on XML. Finally, SOAP enjoys broad industry and developer community support.

SOAP defines four XML elements:
• env:Envelope is the root of the SOAP request. At a minimum, it defines the SOAP namespace. It may define additional namespaces.
• env:Header contains auxiliary information as SOAP blocks, such as authentication, routing information, or a transaction identifier. The header is optional.
• env:Body contains one or more SOAP blocks. An example would be a SOAP block for an RPC call. The body is mandatory and it must appear after the header.
• env:Fault is a special block that indicates protocol-level errors. If present, it must appear in the body.

SOAP is used in WebDisC for the exchange of messages between the Portal and the distributed classifiers. Examples of those messages can be found in Figures 9, 10, and 11.

Describing Web Services

The standard language for formally describing Web services is the Web Services Description Language (WSDL).
WSDL (Chinnici et al., 2002) is an XML document format for describing Web services as a set of endpoints operating on messages containing either document-oriented or procedure-oriented (RPC) messages. The operations and messages are described abstractly and then bound to a concrete network protocol and message format to define an endpoint. Related concrete endpoints may be combined into services. WSDL is sufficiently extensible to allow description of endpoints and their messages regardless of what message formats or network protocols are used to communicate. A complete WSDL definition of a service comprises a service interface definition and a service implementation definition, as depicted in Figure 1.

A service interface definition is an abstract or reusable service definition that may be instantiated and referenced by multiple service implementation definitions. A service interface definition can be thought of as an IDL (Interface Definition Language), Java interface, or Web service type. This allows common industry standard service types to be defined and implemented by multiple service implementers.

In WSDL, the service interface contains elements that comprise the reusable portion of the service description: binding, portType, message, and type elements. In the portType element, the operations of the Web service are defined. The operations define what XML messages can appear in the input, output, and fault data flows. The message element specifies which XML data types constitute various parts of a message. The message element is used to define the abstract content of messages that comprise an operation. The use of complex data types within the message is described in the type element.
The binding element describes the protocol, data format, security and other attributes for a particular service interface (portType).

The service implementation definition describes how a particular service interface is implemented by a given service provider. It also describes its

[Figure 1: WSDL Service Implementation and Interface Definitions. Diagram labels: Service Implementation Definition (Service, Port); Service Interface Definition (Binding, PortType, Message, Type).]
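The split between a service interface definition and a service implementation definition described above can be illustrated with a minimal sketch. It assumes the WSDL 1.1 namespace and element names (where the type element is spelled `types`), since the chapter does not fix a WSDL version, and every service, message, and binding name in it is hypothetical rather than taken from WebDisC. The generated skeleton is deliberately incomplete and is not a valid WSDL document.

```python
import xml.etree.ElementTree as ET

# WSDL 1.1 namespace; an assumption, since the chapter does not fix a version.
WSDL_NS = "http://schemas.xmlsoap.org/wsdl/"

# Grouping taken from the text: reusable interface parts vs. provider-specific
# parts. The chapter says "type"; WSDL 1.1 names the element "types".
INTERFACE_PARTS = ("types", "message", "portType", "binding")
IMPLEMENTATION_PARTS = ("service",)


def q(local_name):
    """Return the namespace-qualified tag for a WSDL element name."""
    return f"{{{WSDL_NS}}}{local_name}"


def build_skeleton():
    """Build a tiny WSDL-style tree; every name in it is hypothetical."""
    root = ET.Element(q("definitions"), {"name": "ClassifierService"})

    # Service interface definition: types, message, portType, binding.
    ET.SubElement(root, q("types"))
    message = ET.SubElement(root, q("message"), {"name": "ClassifyRequest"})
    ET.SubElement(message, q("part"), {"name": "instance", "type": "xsd:string"})
    port_type = ET.SubElement(root, q("portType"), {"name": "ClassifierPortType"})
    operation = ET.SubElement(port_type, q("operation"), {"name": "classify"})
    ET.SubElement(operation, q("input"), {"message": "tns:ClassifyRequest"})
    ET.SubElement(
        root,
        q("binding"),
        {"name": "ClassifierSoapBinding", "type": "tns:ClassifierPortType"},
    )

    # Service implementation definition: a service exposing one port.
    service = ET.SubElement(root, q("service"), {"name": "ClassifierService"})
    ET.SubElement(
        service,
        q("port"),
        {"name": "ClassifierPort", "binding": "tns:ClassifierSoapBinding"},
    )
    return root


def split_definition(root):
    """Group the top-level element names into interface and implementation."""
    groups = {"interface": [], "implementation": []}
    for child in root:
        local_name = child.tag.split("}", 1)[1]  # strip the "{namespace}" prefix
        if local_name in INTERFACE_PARTS:
            groups["interface"].append(local_name)
        elif local_name in IMPLEMENTATION_PARTS:
            groups["implementation"].append(local_name)
    return groups


if __name__ == "__main__":
    ET.register_namespace("wsdl", WSDL_NS)
    document = build_skeleton()
    print(split_definition(document))
    print(ET.tostring(document, encoding="unicode"))
```

Keeping the interface parts separate from the `service` element mirrors the reuse argument in the text: several providers could, in principle, publish their own `service` and `port` elements against the same interface definition.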