A Knowledge-driven Data Warehouse Model for Analysis Evolution

Cécile FAVRE¹, Fadila BENTAYEB and Omar BOUSSAID
ERIC Laboratory, University of Lyon 2, France

Abstract. A data warehouse is built by collecting data from external sources. Changes to the contents and structures of these sources are common, and they have to be reflected in the data warehouse through schema updating or versioning. However, a data warehouse also has to evolve according to users' new analysis needs. In this case, the evolution is driven by knowledge rather than by data. In this paper, we propose a Rule-based Data Warehouse (R-DW) model, in which rules enable the integration of users' knowledge in the data warehouse. The R-DW model is composed of two parts: a fixed part that contains a fact table related to its first level dimensions, and an evolving part, defined by means of rules. These rules are used to dynamically create dimension hierarchies, allowing the analysis contexts to evolve in an automatic and concurrent way. Our proposal makes data warehouse evolution more flexible by increasing users' interaction with the decision support system.

Keywords. Data Warehouse, Schema Evolution, Knowledge, Rule

Introduction

The design of integrated concurrent engineering platforms has received much attention, because competing firms strive for shorter design delays and lower costs. Concurrent engineering allows for parallel design, and thus leads to shorter design-to-market delays. It however requires advanced coordination and integration capabilities [4]. Concurrent engineering, also known as simultaneous engineering, is a non-linear product or project design approach during which all phases operate at the same time.

Data warehouse systems must operate in a fully concurrent environment. Materialized views must be maintained within the data warehouse dynamically with the data provided from data sources.
Data sources can generate concurrent data updates and schema changes within the data warehouse. However, we think that data warehousing is a technology that is difficult to consider from a concurrent engineering point of view as far as the users' involvement in the process is concerned. Data warehouse design is a linear process in which users are only indirectly involved. First, the data sources and the users' analysis needs are studied. Second, the model of the data warehouse is built. Last, the ETL (Extract, Transform and Load) process is defined and implemented. End users then make analyses that are entirely driven by the data warehouse model. To apply a concurrent engineering approach to the users of the data warehouse, we need to cope with the evolution of the analysis possibilities defined by users.

¹ Corresponding Author: Cécile Favre, ERIC Laboratory - University of Lyon 2, 5 av. Pierre Mendès-France, 69676 Bron Cedex, France.

Figure 1. Data warehouse model for the NBI analysis.

It is impossible to take into account the evolution of all users' analysis needs, particularly when these needs are based on their knowledge. Indeed, users sometimes have knowledge which is not represented in the data warehouse but which is likely to be used for an analysis. To enable the evolution of the data warehouse, the administrator has to centralize the analysis needs and the knowledge on which they are based. This is a difficult task to achieve. Thus, we propose a new data warehouse model based on rules, named R-DW (Rule-based Data Warehouse), in which rules integrate users' knowledge, allowing real-time evolution of the analysis possibilities.

Our R-DW model is composed of two parts: a "fixed" part, defined extensionally, and an "evolving" part, defined intensionally with rules. The fixed part includes a fact table and dimensions of the first level (dimensions which have a direct link with the fact table).
The evolving part includes rules which define new analysis axes based on existing dimensions and on new knowledge. These rules create new granularity levels in dimension hierarchies. This model thus makes it easier for users to express the evolution of their analysis needs directly, without the administrator's intervention.

The remainder of this paper is organized as follows. First, we present a motivating example in Section 1. Section 2 and Section 3 are devoted to the principles of our R-DW model and its formal framework. We then present an implementation and a running example with banking data in Section 4. In Section 5, we discuss related work on schema evolution and on the flexibility offered by rule-based languages in data warehouses. We finally conclude this paper and discuss research perspectives in Section 6.

1. Motivating example

To illustrate our approach throughout this paper, we use a case study defined by the French bank Le Crédit Lyonnais (LCL²). The annual Net Banking Income (NBI) is the profit obtained from the management of customers' accounts. It is a measure that is studied according to the dimensions customer (marital status, age...), agency and year (Figure 1).

Let us take the case of the student portfolio manager of LCL. This person knows that a few agencies manage only students' accounts. But this knowledge is not visible in the model. It therefore cannot be used to carry out an analysis of student-dedicated agencies. This knowledge represents a way to aggregate data, in the form of "if-then" rules, such as the following rules, which represent the knowledge about the agency type:

(R1) if Agency_ID ∈ {‘01903’, ‘01905’, ‘02256’} then dim_agency_type = ‘student’
(R2) if Agency_ID ∉ {‘01903’, ‘01905’, ‘02256’} then dim_agency_type = ‘classical’

² Collaboration with the Management of Rhône-Alpes Auvergne Exploitation of LCL-Le Crédit Lyonnais within the framework of an Industrial Convention of Formation by Research (CIFRE).

Figure 2. Rule-based Data Warehouse conceptual model for the NBI analysis.

Figure 3. R-DW model.

The objective is then to carry out new analyses based on the user's knowledge. For example, the objective is to build aggregates by considering that the facts to aggregate concern a student agency (R1) or a classical agency (R2). Our objective is thus to integrate these rules into the model. Indeed, rules allow us to build the level AGENCY_TYPE by defining the values of the dim_agency_type attribute in the dimension hierarchy (Figure 2). To achieve this objective, we propose the R-DW model.

2. The R-DW model

The R-DW model is composed of two parts: one fixed part, defined extensionally, and one evolving part, defined intensionally with rules (Figure 3). The fixed part can be seen as a star schema because it is composed of a fact table and first level dimensions. The evolving part is composed of rules which generate new granularity levels in dimension hierarchies based on users' knowledge and existing dimensions.

Our model provides users with a way to express their own rules to define dimension hierarchies. Compared to existing models, it has several advantages: it allows users (1) to dynamically create hierarchies, (2) to carry out analyses on evolving contexts, and (3) to interact more with the information system, since they can integrate their own knowledge.

In the R-DW model, rules, namely aggregation rules, are used to create dimension hierarchies by defining the aggregation link between two granularity levels in a dimension hierarchy. These rules are defined by users who express their knowledge. The aggregation rules are "if-then" rules. These rules have the advantage of being very intelligible for users since they model information explicitly. The then-clause contains the definition of a higher granularity level. The if-clause contains conditions on the lower granularity levels.
The following rules define the granularity level AGENCY_TYPE through the dim_agency_type attribute, based on the AGENCY level (Agency_ID attribute):

(R1) if Agency_ID ∈ {‘01903’, ‘01905’, ‘02256’} then dim_agency_type = ‘student’
(R2) if Agency_ID ∉ {‘01903’, ‘01905’, ‘02256’} then dim_agency_type = ‘classical’

The rules thus make it possible to integrate knowledge to define the various granularity levels of the dimension hierarchies. The advantage of building the dimension hierarchies with rules is that we are able to take the users' knowledge into account in real time. Therefore, the data warehouse model becomes more flexible for analysis. Indeed, users can analyze data according to the new granularity levels defined by their rules.

To take into account the knowledge or the analysis needs of different users simultaneously, we have to deal with "versions" of rules when users define the same analysis need in different ways. Let us take the example of defining age groups from the ages in the CUSTOMER table. The following classes can be defined by two users:

User 1:
if Age < 60 then dim_age = ‘less than 60 years old’
if Age ≥ 60 then dim_age = ‘more than 60 years old’

User 2:
if Age < 18 then dim_age = ‘minor’
if Age ≥ 18 then dim_age = ‘major’

3. Formal framework

We represent the Rule-based Data Warehouse model R-DW by the following triplet:

R-DW $= \langle \mathcal{FP}, \mathcal{EP}, \mathcal{U} \rangle$

where $\mathcal{FP}$ is the fixed part, $\mathcal{EP}$ the evolving part and $\mathcal{U}$ the universe of the data warehouse R-DW.

Definition 1. Universe of the data warehouse
The universe of the data warehouse $\mathcal{U}$ is a set of attributes, such that:

$\mathcal{U} = \{B_1, \ldots, B_i, \ldots, B_n, G_1, \ldots, G_j, \ldots\} = \{B_i, 1 \le i \le n\} \cup \{G_j, j \ge 1\}$

where $\{B_i, 1 \le i \le n\}$ is the set of $n$ predefined attributes (in the fixed part $\mathcal{FP}$) and $\{G_j, j \ge 1\}$ is the set of generated attributes (defined in the evolving part $\mathcal{EP}$).
Definition 2. Fixed part of R-DW
The fixed part of R-DW is represented by:

$\mathcal{FP} = \langle F, E \rangle$

where $F$ is a fact table and $E = \{D_s, 1 \le s \le t\}$ is the set of $t$ first level dimensions which have a direct link with the fact table $F$. We assume that these dimensions are independent.

Example 2. In Figure 1, $\mathcal{FP} = \langle F_{NBI}, \{CUSTOMER, AGENCY, YEAR\} \rangle$ is the fixed part of the R-DW for the NBI analysis.

The expression of new analysis needs induces the definition of new granularity levels in dimension hierarchies.

Definition 3. Dimension hierarchy and granularity level
Let R-DW $= \langle \langle F, E \rangle, \mathcal{EP}, \mathcal{U} \rangle$ be a data warehouse. Let $D_s.H_k$, $k \ge 1$, be a dimension hierarchy of dimension $D_s \in E$. The dimension hierarchy $D_s.H_k$ is composed of a set of $v$ ordered granularity levels noted $L_i$:

$D_s.H_k = \{L_1, L_2, \ldots, L_i, \ldots, L_v, v \ge 1\}$ where $L_1 < L_2 < \ldots < L_i < \ldots < L_v$.

The granularity level $L_i$ of the hierarchy $H_k$ of dimension $D_s$ is noted $D_s.H_k.L_i$ or $L_i^{sk}$. The granularity levels are defined with attributes called generated attributes.

Definition 4. Generated attribute
An attribute $G_j \in \mathcal{U}$, $j \ge 1$, is called a generated attribute. $G_j$ characterizes a granularity level in a dimension hierarchy. To simplify, we suppose that each granularity level of a dimension hierarchy is represented by only one generated attribute, even if it is possible to generate more than one attribute per level. Thus, the generated attribute $G_j$ which characterizes the granularity level $L_i$ of hierarchy $H_k$ of dimension $D_s$ is noted $L_i^{sk}.G$. The values of these generated attributes are defined by using the evolving part of R-DW.

Definition 5. Evolving part of R-DW
The evolving part of R-DW is represented by:

$\mathcal{EP} = \{\langle \mathcal{R}_i^{sk}, L_i^{sk}.G \rangle\}$

where $\mathcal{R}_i^{sk} = \{R_{ij}, 1 \le i \le v, 1 \le j \le w\}$ is a set of $w$ aggregation rules defining the values of the generated attribute $L_i^{sk}.G$. $\mathcal{EP}$ represents the set of $v$ granularity levels of hierarchy $H_k$ of dimension $D_s$ and their associated rules.

Definition 6. Aggregation rule
An aggregation rule defines the aggregation link which exists between two granularity levels in a dimension hierarchy. It is based on a set $\mathcal{T}$ of $q$ rule terms noted $RT_p$, such that:

$\mathcal{T} = \{RT_p, 1 \le p \le q\}$, $RT_p = \{A \; op \; \{ens \mid val\}\}$

where $A$ is an attribute of the universe $\mathcal{U}$; $op$ is a relational operator ($=$, $<$, $>$, $\le$, $\ge$, $\ne$, ...) or a set operator ($\in$, $\notin$, ...); $ens$ is a set of values and $val$ is a given value.

Example 6a.
$RT_1$: Agency_ID ∈ {‘01903’, ‘01905’, ‘02256’}
$RT_2$: [...]
$RT_3$: Gender = ‘F’

An aggregation rule is an "if-then" rule. The conclusion of the rule ("then" clause) defines the value of the generated attribute. The premise of the rule ("if" clause) is based on a composition of conjunctions or disjunctions of these rule terms:

$R_{ij}$: if $RT_1$ AND|OR $RT_2$ ... AND|OR $RT_q$ then $L_i^{sk}.G = val$

Example 6b. The following rules define the values of the attribute dim_agency_type, which characterizes the granularity level AGENCY_TYPE:

(R1) if Agency_ID ∈ {‘01903’, ‘01905’, ‘02256’} then dim_agency_type = ‘student’
(R2) if Agency_ID ∉ {‘01903’, ‘01905’, ‘02256’} then dim_agency_type = ‘classical’

Example 6c. The following rule defines the value ‘married women’ of the attribute dim_persons_group according to the attributes Marital_Status and Gender of the CUSTOMER table:

if Marital_Status = ‘Married’ AND Gender = ‘F’ then dim_persons_group = ‘married women’
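Definition 6 can be read as a small data structure: a rule term is an (attribute, operator, operand) triple, and an aggregation rule combines terms in its if-clause and assigns a value to a generated attribute in its then-clause. The Python sketch below is one possible encoding under stated assumptions, not the implementation of the paper. The class and function names (RuleTerm, AggregationRule, apply_rules) are hypothetical, each rule is restricted to a single connective (all AND or all OR), and the first matching rule wins for a given generated attribute.

```python
import operator
from dataclasses import dataclass

# Operators allowed in a rule term: relational operators plus set membership.
OPERATORS = {
    "=": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "in": lambda value, values: value in values,
    "not in": lambda value, values: value not in values,
}


@dataclass(frozen=True)
class RuleTerm:
    attribute: str   # attribute of the universe
    op: str          # key of OPERATORS
    operand: object  # a given value (val) or a set of values (ens)

    def holds(self, row):
        return OPERATORS[self.op](row[self.attribute], self.operand)


@dataclass(frozen=True)
class AggregationRule:
    terms: tuple     # rule terms of the if-clause
    connective: str  # "and" or "or" (assumption: one connective per rule)
    generated: str   # generated attribute set by the then-clause
    value: object    # value assigned to the generated attribute

    def matches(self, row):
        results = (term.holds(row) for term in self.terms)
        return all(results) if self.connective == "and" else any(results)


def apply_rules(row, rules):
    """Return a copy of row extended with the generated attributes.

    For each generated attribute, the first matching rule wins.
    """
    out = dict(row)
    for rule in rules:
        if rule.generated not in out and rule.matches(row):
            out[rule.generated] = rule.value
    return out


# Example 6b: rules defining dim_agency_type from Agency_ID.
agencies = frozenset({"01903", "01905", "02256"})
agency_rules = [
    AggregationRule((RuleTerm("Agency_ID", "in", agencies),),
                    "and", "dim_agency_type", "student"),
    AggregationRule((RuleTerm("Agency_ID", "not in", agencies),),
                    "and", "dim_agency_type", "classical"),
]

# Example 6c: rule defining the value 'married women' of dim_persons_group.
customer_rules = [
    AggregationRule((RuleTerm("Marital_Status", "=", "Married"),
                     RuleTerm("Gender", "=", "F")),
                    "and", "dim_persons_group", "married women"),
]

# Hypothetical CUSTOMER row, for illustration only.
customer = {"Marital_Status": "Married", "Gender": "F"}
print(apply_rules({"Agency_ID": "01905"}, agency_rules))
print(apply_rules(customer, customer_rules))
```

A row that matches no rule is returned unchanged, which leaves open how unclassified members of a level should be handled; the definitions above do not settle that point.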