A knowledge-based system approach for scientific data analysis and the notion of metadata

Article · January 1995 · DOI: 10.1109/MASS.1995.528237 · Source: IEEE Xplore

Epaminondas Kapetanios
Research Center Karlsruhe-Technology and Environment
Institute of Applied Computer Science
Federal Republic of Germany

Ralf Kramer
Forschungszentrum Informatik (FZI)
Federal Republic of Germany

Abstract

Over the last few years, dramatic increases and advances in mass storage for both secondary and tertiary storage have made possible the handling of big amounts of data (for example, satellite data, complex scientific experiments, and so on). However, to make full use of these advances, metadata for data analysis and interpretation, as well as the complexity of managing and accessing large datasets through intelligent and efficient methods, are still considered to be the main challenges to the information-science community when dealing with large databases. Scientific data must be analyzed and interpreted by metadata, which has a descriptive role for the underlying data.
Metadata can be, partly, a priori definable according to the domain of discourse under consideration (for example, atmospheric chemistry) and the conceptualization of the information system to be built. It may also be extracted by using learning methods from time-series measurement and observation data. In this paper, a knowledge-based management system (KBMS) is presented for the extraction and management of metadata in order to bridge the gap between data and information. The KBMS is a component of an intelligent information system based upon a federated architecture, also including a database management system for time-series-oriented data and a visualization system.

Trying to give a rigorous definition of metadata presents some difficulties. At first glance, the term metadata denotes data that can be found at a more abstract level describing some other set of data. At this point, let us recall the distinction between data and information given by D.C. Tsichritzis and F.H. Lochovsky [1]:

"It is important to realize the distinction between data and information. Data are facts collected from observations or measurements. Information is the meaningful interpretation and correlation of data that allows one to make decisions."

According to this definition, metadata seems to be something between data and information. It can be considered a descriptive tool of the underlying data that can be used in order to reach the information level to be provided. In this sense, metadata can be a priori defined according to the already known domains of scientific discipline and experiment (information system conceptual modeling), or may be derived from the underlying data, especially in cases where a non-a priori-definable domain of discourse addressing the scientific discipline must also be considered.

Therefore, metadata is going to be used for the analysis and interpretation of scientific data aiming at the extraction of information (see Figure 1a).
On the other hand, information in terms of already known knowledge concerning the domain of discourse of the scientific application can be conceptualized and used in order to steer the learning process (extraction of knowledge-metadata) from the underlying measurement and observation data (see Figure 1b).

[Figure 1, panels (a) and (b): diagram relating data, metadata, and information.]

1051-9173/95 $4.00 © 1995 IEEE. Fourteenth IEEE Symposium on Mass Storage Systems.

At this point, two questions arise:

• What metadata should be captured and modeled as knowledge?
• How can it be represented and organized to support data analysis and interpretation as well as the extraction and modeling of new knowledge?

Based on a real application concerning a scientific experiment for the study of atmospheric (upper troposphere-lower stratosphere) chemistry and phenomena, we will try to illustrate the issues of metadata as knowledge and its representation and organization.

The remainder of this paper is organized as follows. In the next section, a short description of the scientific experiment under consideration is given, in conjunction with two scenarios concerning the validation activities of scientific data for the extraction of scientific results. Based on these scenarios, a description of metadata in terms of knowledge structures is then given. Subsequently, the last section deals with the management and usage of metadata aiming at scientific dataset analysis and/or access, and not only at data management (catalogs). Consequently, we describe a knowledge-based system that enables the organization of metadata as knowledge servers, as well as the connection of various knowledge elements (pieces of metadata), in order to construct justification structures about scientific hypotheses and/or conclusions.
According to the diversity of scientific users and their needs, the main issues of the user interface under consideration (a metadata browsing and querying combination) are then presented, wherein the main principles of object orientation, simplicity, and self-guiding have been taken into account. Finally, we briefly discuss those systems and approaches that we think bear the most direct relationship to our approach, and then we present the conclusions and future work.

The experiment and the scenarios

The MIPAS (Michelson Interferometer for Passive Atmospheric Sounding) limb sounder is a remote sensing instrument that is going to be installed on various platforms (balloon, aircraft, satellite/ENVISAT mission) to undertake atmospheric (upper troposphere/lower stratosphere) data measurements ([2],[3]). Specifically, the understanding of complicated ozone-hole processes presupposes an understanding of the reactions and interactions of stratospheric trace gases (for example, O3, ClOx, NOx, HCl, HNO3, ClONO2).

Raw data delivered by the instrument will be transformed into observation data by applying calibration and trace gas-retrieval algorithms. Trace gas-related observation data must be visualized and validated according to a given theory and corresponding background knowledge. The result of this validation could be a potential anomaly that is related to a subset of the source observation data, or the discovery of unknown facts, which may revise the existing theory. Figure 2 outlines the general process of validating observed phenomena, and consequently, scientific laws. The following scenarios will illustrate this process.

The scenarios

The scenario of ozone depletion. We consider ozone (O3) as a single concept in the domain of a scientific discipline. This concept is used to describe and address the ozone-related instances (observation data).
Based on these instances, a significant trend of ozone concentration through a certain period of time (for example, one year) can be provided. This trend must be validated by a given theory that ozone can be depleted only if the depletion parameters have been changed. Therefore, a depletion trend that cannot be validated by a certain theory is a candidate for an anomaly. Recently (1985), the explanation based on this trend's source data has led to the acceptance of ozone depletion processes and to a theory revision, through the introduction of heterogeneous chemistry.

The scenario of long-living gases. The group of trace gases N2O, CFC12, CFC11 is considered to be the taxonomy of long-living gases in the atmosphere. An insight must be given into their correlation quality. If some of them (for instance, N2O and CFC12) are not correlated, although the theory states a dynamically dependent correlation within this trace gas group, then an anomaly has occurred. If this anomaly is not related to a subset of the underlying observation data, then the theory must be modified, which means that an unknown chemical process must be inserted into the theories. In parallel, the taxonomy of long-living gases must also be modified (in that CFC12 is going to be deleted from the related taxonomy).

Generally speaking, and for a better understanding of the knowledge (metadata) needed to analyze, interpret, and validate scientific data so as to provide and justify scientific results, a flow chart depicts the general process in Figure 2. A parallelogram symbolizes the piece of knowledge or metadata that is used by the corresponding activities of learning (extracting metadata) with the help of a priori defined knowledge (see also Figure 1b), and consequently, of using this metadata for analysis and validation (see also Figure 1a).

[Figure 2: flow chart of the validation process. Recoverable labels: Processes; Taxonomy formation; Law formation; Background knowledge; Validation; Scientific revolution; Mark experimental law as validated.]

Metadata as knowledge structures

With respect to the scenarios given above, we will try to describe the metadata in terms of scientific knowledge structures. These definitions are closely related to those given in [4]. We will also distinguish between a priori defined knowledge and knowledge that may be extracted through a learning process and formed as metadata for data analysis and validation.

Measurements and observations. They are objects addressing the relevant instances of measured (or transformed into) observation data, for example, interferograms, calibrated spectra, trace gas distributions. These objects can be described by atomic or complex concepts. They enrich the semantics of the underlying scientific data and are considered to be equivalent to the database conceptual design of an information system (Figure 3). This knowledge is a priori definable and strongly related to the application domain (scientific experiment) under consideration.

Transformation processes. They are classes of objects representing atomic or complex concepts addressing the instances of transformation processes. They can take values by interactively changed parameters, versioning of specification and implementation of algorithms, and so forth, throughout the transformation processing chain of measurement data towards observations.
The concepts of transformation processes enrich the semantics of processes and implemented algorithms, and are considered to be equivalent to the design issues of the software applications (as being addressed during the development of an information system). This knowledge is a priori definable and strongly related to the application domain (scientific experiment) under consideration.

[Figure 3. Recoverable labels: Scientific experiment; Requirements analysis; Application design specification; Software; Conceptual design (data and process models); Logical design; Physical design; Implementation; Validation.]

Generation histories. They constitute the behavioral model of the information system in terms of event-condition-action triples. They are expressed by network concepts that are related to the data-derivation history of the scientific data instances [5]. Based upon these network concepts, observations or measurements from which other observations have been derived can be traced back. This knowledge is a priori definable and strongly related to the application domain (scientific experiment) under consideration.

Generation histories interrelate transformation processes with measurements and observations, in that the latter are considered to be the inputs and outputs of the transformation processes. Bringing these two knowledge structures (data and process models of the information system) into conjunction, we will be able to gain insight into the conditions under which observations have been transformed and/or generated.

Taxonomies. They are links used to organize concepts into a hierarchy or some other partial ordering [6]. We must distinguish here between taxonomies and concepts, in that the notion of a concept is primarily the notion of a data structure. Taxonomies are considered to be storing information at appropriate levels of generality and automatically making it available to more specific concepts by means of a mechanism of inheritance.

From another perspective, taxonomies provide the information to steer the extraction of new knowledge through learning methods. Considering the taxonomies that deal with the concepts of the scientific discipline (for example, trace gases), a connection point can be specified through the generation history between the concepts of the scientific experiment and discipline. This knowledge is partly a priori definable and is related to the application domain (scientific experiment and discipline) under consideration.

Experimental laws. They summarize relations among observed variables (for example, NOx, temperature, pressure), or atomic objects (for example, trend of ozone concentration). They can be in qualitative or quantitative form and must be inductively inferred according to the underlying data. This knowledge cannot be defined a priori. It can be regarded mainly as metadata to validate scientific data (extraction of information in terms of correct or falsified observations).

Theories/hypotheses. They represent scientific hypotheses about chemical processes in the atmosphere. They differ from experimental laws in making reference to unobservable objects or mechanisms. Scientific hypotheses are statements that belong to the empirical sciences and have this status if and only if they are falsifiable [7]. These statements are falsifiable if and only if there exists at least one potential falsifier. Thus, a logical relation exists between the scientific hypothesis and the class of potential falsifiers. This knowledge can be partly defined a priori.

Anomalies. They are experimental laws marked as potential falsifiers of a theory/hypothesis.
An anomaly is the output of a validation process of an experimental law and could demonstrably falsify a scientific hypothesis. This knowledge cannot be defined a priori.

Background knowledge. This is a set of beliefs or knowledge about the environment, aside from those that are specifically under study. It differs from theories/hypotheses or experimental laws in that the scientist holds background knowledge with relative certainty, rather than as the subject of active evaluation. Auxiliary datasets, like climatology or spectroscopic data (spectral lines of already known molecules) and/or data from other contemporary experi-
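
Two of the knowledge structures defined in this section can be made concrete with a small sketch: a taxonomy that passes properties down to more specific concepts by inheritance, and a generation history used to trace an observation back to the data it was derived from. This is an illustration under stated assumptions, not the KBMS described in the paper. The names `Concept`, `Derivation`, and `trace_back`, the unit `ppbv`, and the object identifiers are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Concept:
    """A taxonomy node; properties are inherited from more general concepts."""
    name: str
    parent: Optional["Concept"] = None
    properties: dict = field(default_factory=dict)

    def lookup(self, key: str):
        # Walk up the taxonomy until some ancestor defines the property.
        node = self
        while node is not None:
            if key in node.properties:
                return node.properties[key]
            node = node.parent
        raise KeyError(key)


@dataclass
class Derivation:
    """One generation-history step: a process turns inputs into one output."""
    process: str
    inputs: list
    output: str


def trace_back(target: str, history: list) -> list:
    """Return every object the target was directly or indirectly derived from."""
    by_output = {step.output: step for step in history}
    ancestors, stack = [], [target]
    while stack:
        step = by_output.get(stack.pop())
        if step is None:
            continue  # a raw measurement has nothing upstream
        for source in step.inputs:
            if source not in ancestors:
                ancestors.append(source)
                stack.append(source)
    return ancestors


# Hypothetical taxonomy, loosely following the long-living gases scenario.
trace_gas = Concept("trace gas", properties={"unit": "ppbv"})  # unit is assumed
long_living = Concept("long-living gas", trace_gas,
                      {"expected_correlation": "high"})
n2o = Concept("N2O", long_living)

# Hypothetical chain: interferogram -> calibrated spectrum -> gas distribution.
history = [
    Derivation("calibration", ["interferogram_1"], "spectrum_1"),
    Derivation("retrieval", ["spectrum_1"], "n2o_distribution_1"),
]

print(n2o.lookup("unit"))
print(trace_back("n2o_distribution_1", history))
```

Here `lookup` resolves a property at the most specific level that defines it, which is one simple reading of "storing information at appropriate levels of generality". `trace_back` treats the generation history as a network linking transformation processes to their inputs and outputs.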