The Effect of Imbalanced Data Class Distribution on Fuzzy Classifiers - Experimental Study

Sofia Visa, Department of ECECS, University of Cincinnati, Cincinnati, OH 45221-0030
Anca Ralescu, Department of ECECS, University of Cincinnati, Cincinnati, OH 45221-0030

Abstract - This study evaluates the robustness of a fuzzy classifier when the class distribution of the training set varies. The analysis of the results is based on classification accuracy and ROC curves. The experimental results reported here show that fuzzy classifiers are less variant with the class distribution and less sensitive to the imbalance factor than decision trees.

I. INTRODUCTION

In order to evaluate correctly the performance of a given classification method on real data sets, information such as the error costs and the underlying class distribution is required [1], [2]. For learning with imbalanced class distributions - that is, for a two-class classification problem in which the training data for one class (the majority, or negative, class) greatly outnumber the training data for the other class (the minority, or positive, class) - such information is crucial and yet often unavailable. Since standard classification methods are driven by the minimization of the overall error, without considering (or knowing) the error costs of the two classes (minority and majority), they are not suitable for imbalanced data sets. A common practice for dealing with this problem is to rebalance the classes artificially, either by up-sampling or by down-sampling. As suggested in [2], up-sampling does not add information, while down-sampling actually removes information. Considering this fact, the best research strategy is to concentrate on how machine learning algorithms can deal most effectively with whatever data they are given. Fuzzy classifiers [3], [4], derived from class frequency distributions, have proved effective in classifying imbalanced data sets.

II. CLASS DISTRIBUTION IN THE LEARNING PROCESS

In this experiment the role of class distribution in learning a fuzzy classifier from imbalanced data is investigated. A similar experiment was published in [5] using decision trees. The performance of the fuzzy classifier for multidimensional data is evaluated on five real data sets and compared with the results published in [5]. This study emerged from the fact that there is no guarantee that the data available for training represent (capture) the distribution of the test data. Therefore, reduced variance of a classifier's output over different training class distributions is a very important feature of a classifier.

TABLE I
STATISTICS ABOUT THE REAL DATA SETS. THE SECOND COLUMN SHOWS THE NATURAL DISTRIBUTION OF EACH DATA SET AS THE MINORITY CLASS PERCENTAGE OF THE WHOLE DATA SET.

Name           Minority class   # of features   Size   Train size   Test size
letter-a
optDigits
letter-vowel
german
wisconsin

A. The Data Sets

Table I shows characteristics of the five UCI Repository domains used in this study. The second column of Table I lists the natural class distributions of the data sets, expressed in this paper as the minority class percentage of the whole data set.

The letter-a/letter-vowel data sets were obtained from the letter data set as follows: instances of the letter 'a'/of the vowels represent the minority class and the remaining letters the majority class. For the optDigits data set, the minority class is represented by one digit and the remaining digits represent the majority class. The wisconsin and german data sets are two-class domains: cancer versus non-cancer patients, and good versus bad credit history of persons asking for loans, respectively.

B.
Altering the Class Distribution

To study experimentally how the class distribution affects the fuzzy classifier in learning the real domains, the distribution of the training set is varied and the classifier is evaluated, for each distribution, on the same test data (see a similar study in [5] using C4.5).

The test data set reflects the natural distribution and is obtained by randomly selecting a fixed fraction of the examples from each class, so that the test set contains minority and majority instances in their natural proportion. The minority and majority examples left over after forming the test set serve as the pools from which training sets are drawn. In order to compare the performance of the classifiers obtained for the different class distributions, the same test data is used throughout.

The training set size is held fixed, equal to the number of minority examples left after forming the test data. The training set is altered to obtain different class distributions as follows: for each target distribution, the corresponding number of minority points is selected at random from the minority pool, and the rest of the training set is filled with majority points selected at random from the majority pool; the target minority percentage ranges over several fixed values as well as the natural distribution (listed in the second column of Table I).

0-7803-9158-6/05/$20.00 © 2005 IEEE. The 2005 IEEE International Conference on Fuzzy Systems 749

III. THE FUZZY CLASSIFIER

The main problem in designing a fuzzy classifier is to construct the fuzzy sets, more precisely their membership functions. Approaches to constructing fuzzy classifiers range from quite ad hoc to more formal ones, in which the membership function is constructed directly from data without any intervention of the designer. The current approach relies on the interpretation of a fuzzy set as a family of probability distributions; therefore, a particular membership function is the result of selecting one of the probability distributions in this family.
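The distribution-altering protocol of Section II-B (fixed training-set size, varied minority percentage, fixed natural-distribution test set) can be sketched as below. This is an illustrative sketch, not the authors' code; the function and variable names (make_train_set, min_pool, maj_pool) are ours:

```python
import random

def make_train_set(min_pool, maj_pool, train_size, min_pct, seed=0):
    """Build a training set of fixed size whose minority-class share is
    min_pct percent; the remaining slots are filled with majority examples,
    both drawn at random from the pools left after carving out the test set."""
    rng = random.Random(seed)
    n_min = round(train_size * min_pct / 100)
    n_maj = train_size - n_min
    train = rng.sample(min_pool, n_min) + rng.sample(maj_pool, n_maj)
    rng.shuffle(train)
    return train

# Toy pools: 200 minority and 2000 majority examples remain after testing
min_pool = [("m", 1)] * 200
maj_pool = [("M", 0)] * 2000

train = make_train_set(min_pool, maj_pool, train_size=200, min_pct=30)
print(sum(1 for _, y in train if y == 1))  # → 60 minority examples
```

Evaluating the classifier trained on each such set against the one fixed test set isolates the effect of the training distribution, as the paper's protocol requires.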
The mechanism for deriving a fuzzy set membership function makes use of mass assignment theory (MAT) [6] and is presented briefly next (for an in-depth presentation, please see [7], [8] and [4]).

Given a collection of data and the relative frequency distribution corresponding to it, p_1 >= p_2 >= ... >= p_n (indexed in nonincreasing order), the corresponding fuzzy set is obtained from Equation 1:

    mu_i = i * p_i + sum_{j=i+1}^{n} p_j    (1)

where mu_i denotes the i-th largest value of the membership function, corresponding to the lpd (least prejudiced distribution) selection rule [6].

Example 1 illustrates the complete mechanism of converting a simple artificial data set into a fuzzy classifier, corresponding to the lpd selection rule [9].

Example 1: Let the two classes denote respectively the majority and minority classes, given over the points x1, ..., x6. Their relative frequency distributions (in nonincreasing order) are computed for each class, and the membership values for each fuzzy set are then computed (in decreasing order of the relative frequencies) as shown in Table II.

TABLE II
THE MEMBERSHIP VALUES FOR THE MAJORITY AND MINORITY CLASSES OF EXAMPLE 1.

Fig. 1. The fuzzy sets obtained for the majority (left) and the minority (right) class using the lpd selection rule.

The obtained fuzzy sets (each class is mapped into a fuzzy set) are displayed in Figure 1. For a test data point, the membership degrees in each of these fuzzy sets are computed and compared: the point is assigned to the class in which it has the higher degree.

Example 1 illustrates, for a one-dimensional data set, the basic one-pass fuzzy classifier used in this study. In principle, for multidimensional data sets the approach outlined above can be applied as well. However, it should be noticed that as the dimensionality increases the data set becomes sparse, and there may be very few data points with frequency greater than 1.
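The construction of Example 1 can be sketched as follows, assuming the reconstruction of Equation 1 given above (mu_i = i*p_i + sum_{j>i} p_j, over nonincreasingly ordered relative frequencies p_i). The toy data and all names are ours, not the paper's:

```python
from collections import Counter

def lpd_fuzzy_set(samples):
    """Build a discrete fuzzy set from one class's data via Equation 1:
    order relative frequencies p_1 >= ... >= p_n, then
    mu_i = i*p_i + sum_{j>i} p_j (lpd selection rule)."""
    counts = Counter(samples)
    n_total = len(samples)
    # values ordered by nonincreasing relative frequency
    items = sorted(counts.items(), key=lambda kv: -kv[1])
    p = [c / n_total for _, c in items]
    mu = {}
    for i, (value, _) in enumerate(items, start=1):
        mu[value] = i * p[i - 1] + sum(p[i:])
    return mu

def classify(x, fuzzy_sets):
    """Assign x to the class whose fuzzy set gives it the highest degree."""
    return max(fuzzy_sets, key=lambda label: fuzzy_sets[label].get(x, 0.0))

maj = lpd_fuzzy_set([1, 1, 1, 2, 2, 3])   # toy majority class
mnr = lpd_fuzzy_set([3, 3, 4])            # toy minority class
print(classify(3, {"Maj": maj, "Min": mnr}))  # → Min
```

Note that the most frequent value in each class receives membership 1, and that the value 3 - rarer in absolute terms in the minority class - is nonetheless classified as Min, since its within-class relative frequency is higher there; this is the behavior the paper exploits for imbalanced data.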
Otherwise stated, in order to obtain meaningful frequencies, either the data set size must increase with each new dimension, or, for a given data set, the data must be preprocessed by collecting them into bins and applying the approach described above to the bins. The bin approach is apt to introduce errors, while increasing the data set size is not always possible (in fact, it rarely is).

In any case, regardless of the approach used, another problem that arises is that of interpolation for computing the membership degree of unlabeled data points. Having multidimensional fuzzy sets makes this step more complex. The approach currently taken in this study is to derive fuzzy sets along each dimension, in effect deriving as many classifiers as the dimension of the attribute space, and to aggregate these classifiers in order to evaluate a data point. Several aggregation operators are proposed here, but other aggregation methods, such as the ones presented in [10], can be used too. The following notations are used in defining the aggregation methods: the class label of a test point; the indicator function; and a set of weights characterizing the attributes, where each weight is the number of training data correctly classified by that attribute alone. Four aggregations, denoted D1, D2, D3 and D4, are then defined in terms of these notations, and the class label of a test point is decided by evaluating the aggregated degree for each class and choosing the larger.

But first, it is interesting to understand why one may expect good performance from the fuzzy classifier applied to imbalanced data. As can be observed from Figure 1, the point in question will be assigned to the minority class, since its degree of membership in that class is higher than its membership in the majority class.
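Since the exact definitions of D1-D4 are not reproduced here, the following is only a generic sketch of one plausible aggregation of per-attribute classifiers: a vote weighted by each attribute's count of correctly classified training points, as described above. The particular combination rule and all names are ours, not the paper's:

```python
def weighted_vote(per_attr_degrees, weights):
    """Aggregate per-attribute membership degrees into one score per class
    and return the winning class label.
    per_attr_degrees: one {class: degree} dict per attribute;
    weights: per-attribute weights, e.g. the number of training points the
    attribute classifies correctly on its own."""
    scores = {}
    for degrees, w in zip(per_attr_degrees, weights):
        for label, mu in degrees.items():
            scores[label] = scores.get(label, 0.0) + w * mu
    return max(scores, key=scores.get)

# Two attributes, two classes; attribute 0 is more reliable (weight 8 vs 3)
degrees = [{"Min": 0.9, "Maj": 0.2},   # attribute 0's fuzzy degrees
           {"Min": 0.1, "Maj": 0.7}]   # attribute 1's fuzzy degrees
print(weighted_vote(degrees, weights=[8, 3]))  # → Min
```

A weighted vote of this kind lets the more discriminative attributes dominate, which is the stated purpose of the attribute weights.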
Looking at the original data shows that the point's frequency relative to the whole data set is higher in the majority class, while its frequency relative to its own class is higher in the minority class. Any classifier that learns based on a point's contribution to a class relative to the whole data set will assign the point to the majority class. Classifiers such as the fuzzy classifier used in this study, which learn the classification based on the relative frequency within each class, will assign the point to the minority class, where its within-class relative frequency is greater. Otherwise stated, within the class-size context, the point is more representative of the minority class than of the majority class. This idea is captured by the fuzzy classifier and makes it suitable for imbalanced data sets.

IV. PERFORMANCE EVALUATION

When learning classes, even for balanced data sets in which the errors coming from the different classes have different costs, the overall accuracy is not a good measure of classifier performance. Even more, when the class distribution is highly imbalanced, the accuracy is biased to favor the majority class and does not value rare cases as much as common cases. Therefore, it is more appropriate to use ROC (Receiver Operating Characteristic) curves as the performance evaluation measure. The ROC curves provide a visual representation of the trade-off between true positives (TP) and false positives (FP), as expressed in Equations 2 and 3. The confusion matrix shown in Table III contains information about the actual and predicted classifications made by a classification system.

TABLE III
THE CONFUSION MATRIX.

                  Predicted Negative   Predicted Positive
Actual Negative   TN                   FP
Actual Positive   FN                   TP

    TP rate = TP / (TP + FN)    (2)
    FP rate = FP / (FP + TN)    (3)

However, for the purpose of comparing the results of this study with the results published in [5], accuracy is also used as a measure to evaluate a classifier, in addition to the ROC curves.
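The quantities of Equations 2 and 3 follow directly from the confusion-matrix counts, as sketched below (illustrative code, ours, with the positive/minority class coded as 1):

```python
def roc_point(actual, predicted):
    """Compute the ROC coordinates of Equations 2-3 from paired labels:
    TP rate = TP / (TP + FN), FP rate = FP / (FP + TN)."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
    return tp / (tp + fn), fp / (fp + tn)

actual    = [1, 1, 1, 0, 0, 0, 0, 0]   # imbalanced: 3 positive, 5 negative
predicted = [1, 1, 0, 0, 0, 0, 1, 0]
print(roc_point(actual, predicted))  # TP rate = 2/3, FP rate = 0.2
```

Plotting one such (FP rate, TP rate) point per training distribution yields the ROC curves of Figures 7-11; note that overall accuracy on the same toy data would be 6/8 despite a third of the minority class being missed, which is exactly the bias the paper warns about.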
The fuzzy sets obtained with the procedure indicated previously in this paper are discrete fuzzy sets. However, their evaluation is required on unseen points. The standard approach to this problem is to extend the discrete fuzzy set to a continuous version by piecewise linear interpolation. More precisely, if x denotes a data point and A is a fuzzy set with membership mu_A and support {x_1, ..., x_n}, then the membership degree of x in A is given by Equation 4:

    mu_A(x) = mu_A(x_i) + ((x - x_i) / (x_{i+1} - x_i)) * (mu_A(x_{i+1}) - mu_A(x_i)),  if x_i <= x <= x_{i+1}
    mu_A(x) = 0,  otherwise    (4)

V. RESULTS AND ANALYSIS OF THE STUDY

All the results reported in this study are averaged over 30 runs, and the test data reflect the natural distributions of the domains.

Figures 2-6 show the overall error percentage when different training class distributions are used. The fuzzy classifiers outperform decision trees in four of the five domains studied here. For the letter-vowel domain, they give less error only for the higher class distributions (Figure 4). In Figures 7-11 the ROC curves of the four fuzzy classifiers, obtained for the various class distributions, are plotted. For all five data sets one aggregation's ROC curve is dominant: it is above the other ROC curves and it is closer to the y axis.

Fig. 2. Letter-a: the error in classification over various training class distributions (curves: C4.5 (Weiss), D1, D2, D3, D4).
Fig. 3. OptDigits: the error in classification over various training class distributions (curves: C4.5 (Weiss), D1, D2, D3, D4).
Fig. 4. Letter-vowel: the error in classification over various training class distributions (curves: C4.5 (Weiss), D1, D2, D3, D4).
Fig. 5. German: the error in classification over various training class distributions (curves: C4.5 (Weiss), D1, D2, D3, D4).
Fig. 6. Wisconsin: the error in classification over various training class distributions (curves: C4.5 (Weiss), D1, D2, D3, D4).

For the german data set, the trade-off between FP and TP is obvious (Figure 10): training with more Min examples introduces more false positives. The combination of two factors contributes to this behavior:
1) several attributes have exactly the same range of values for the Min and Maj classes (complete overlap), and the remaining attributes overlap partially;
2) the natural class distribution (present in the test data) is highly imbalanced.
Therefore, when the classifier is trained with many Min examples, the recognition of the Min class improves, but at the cost of misclassifying many more Maj points, since the Maj class makes up most of the test data. The analysis of Figure 5 (where the plain error is reported) leads to the same conclusion.

The letter-a domain presents naturally more imbalance than the letter-vowel domain, though, surprisingly, letter-a is better recognized (see Figures 2 and 4). This is mainly due to the fact that the Min class for letter-a (instances of the letter a) is better defined, as a concept, than the Min class for letter-vowel (instances of a, e, i, o, u). In the same vein, there is more overlap between the classes in the letter-vowel set than in the letter-a data set: the letter-vowel domain has two attributes completely overlapped and, in several other attributes, more overlap than the letter-a data set. The ROC curves are also consistent with the previous observation: they indeed show a better (tighter) clustering of the letter-a Min class (Figure 7) than of the letter-vowel Min class (Figure 9).

Figure 3 shows that the fuzzy classifier performs well in recognizing both the Min and the Maj class for the optDigits domain. A higher error for the most Maj-heavy training class distribution is due to the fact that the Min class is not learned well, and mainly the Min class contributes to the error (a ROC point on the y axis). The increase in error for the most Min-heavy class distribution is due to the fact that the Maj class is under-represented in training, and this time the Maj class has a higher error rate; though, the number of false positives does not grow much (Figure 8).

Fig. 7. Letter-a: the ROC curves obtained for the various class distributions (curves: D1, D2, D3, D4).
Fig. 8. OptDigits: the ROC curves obtained for the various class distributions (curves: D1, D2, D3, D4).
Fig. 9. Letter-vowel: the ROC curves obtained for the various class distributions (curves: D1, D2, D3, D4).
Fig. 10. German: the ROC curves obtained for the various class distributions (curves: D1, D2, D3, D4).
Fig. 11. Wisconsin: the ROC curves obtained for the various class distributions (curves: D1, D2, D3, D4).
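The piecewise linear extension of Equation 4, used to evaluate the discrete fuzzy sets on unseen points, can be sketched as follows (illustrative code; function and variable names are ours):

```python
def interp_membership(x, support, mu):
    """Equation 4: extend a discrete fuzzy set (sorted support points with
    membership values mu) to an unseen point x by piecewise linear
    interpolation; outside the support the membership degree is 0."""
    for i in range(len(support) - 1):
        if support[i] <= x <= support[i + 1]:
            t = (x - support[i]) / (support[i + 1] - support[i])
            return mu[i] + t * (mu[i + 1] - mu[i])
    return 0.0

support = [1.0, 2.0, 3.0]   # toy discrete fuzzy set
mu      = [0.5, 1.0, 0.25]
print(interp_membership(2.5, support, mu))  # halfway between 1.0 and 0.25
```

Points falling on a support point recover their original discrete membership, and points outside [x_1, x_n] get degree 0, matching the "otherwise" branch of Equation 4.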