The Effect of Imbalanced Data Class Distribution on Fuzzy Classifiers - Experimental Study

Sofia Visa
Department of ECECS, University of Cincinnati, Cincinnati, OH 45221-0030, USA
svisa@ececs.uc.edu

Anca Ralescu
Department of ECECS, University of Cincinnati, Cincinnati, OH 45221-0030, USA
aralescu@ececs.uc.edu
Abstract—This study evaluates the robustness of a fuzzy classifier when the class distribution of the training set varies. The analysis of the results is based on classification accuracy and ROC curves. The experimental results reported here show that fuzzy classifiers vary less with the class distribution and are less sensitive to the imbalance factor than decision trees.
I. INTRODUCTION
In order to evaluate correctly the performance of a given classification method on real data sets, information such as the error costs and the underlying class distribution is required [1], [2]. For learning with imbalanced class distributions (that is, for a two-class classification problem in which the training data for one class, the majority or negative class, greatly outnumber the training data for the other class, the minority or positive class), such information is crucial and yet many times not available. Since standard methods of classification are driven by the maximization of the overall accuracy, without considering (or knowing) the error costs of the two classes (minority and majority), they are not suitable for imbalanced data sets. A common practice for dealing with this problem is to rebalance the classes artificially, either by up-sampling or by down-sampling. As suggested in [2], up-sampling does not add information, while down-sampling actually removes information. Considering this fact, the best research strategy is to concentrate on how machine learning algorithms can deal most effectively with whatever data they are given. Fuzzy classifiers [3], [4], derived from class frequency distributions, proved effective in classifying imbalanced data sets.

II. CLASS DISTRIBUTION IN THE LEARNING PROCESS
In this experiment the role of class distribution in learning a fuzzy classifier from imbalanced data is investigated. A similar experiment was published in [5] using decision trees. The performance of the fuzzy classifier for multidimensional data is evaluated on five real data sets and compared with the results published in [5]. This study emerged from the fact that there is no guarantee that the data available for training represent (capture) the distribution of the test data. Therefore, reduced variance of a classifier's output over different training class distributions is a very important feature of a classifier.
TABLE I
STATISTICS ABOUT THE REAL DATA SETS. SECOND COLUMN SHOWS THE NATURAL DISTRIBUTION OF THE DATA SETS AS THE MINORITY CLASS PERCENTAGE OF THE WHOLE DATA SET.

Name           Minority class   # of features   Size   Train size   Test size
letter-a
optDigits
letter-vowel
german
wisconsin
A. The Data Sets
Table I shows characteristics of the five UCI Repository domains used in this study. The second column of Table I lists the natural class distributions of the data sets, expressed in this paper as the minority class percentage of the whole data set. The letter-a/letter-vowel data set was obtained from the letter data set as follows: instances of letter 'a'/of vowels represent the minority class, and the remaining letters the majority class. For the optDigits data set, the minority class is represented by a single digit and the remaining digits represent the majority class. The wisconsin and german data sets are two-class domains: cancer versus non-cancer patients, and good versus bad credit history of persons asking for loans, respectively.
B. Altering the Class Distribution
To study experimentally how the class distribution affects the fuzzy classifier in learning the real domains, the distribution of the training set is varied and, for each distribution, the classifier is evaluated on the same test data (see a similar study in [5] using C4.5). The test data set reflects the natural distribution and is obtained by randomly selecting a fixed fraction of the examples from each class. The remaining minority examples and the remaining majority examples form the pools from which the training sets are drawn. In order
0-7803-9158-6/05/$20.00 © 2005 IEEE.
The 2005 IEEE International Conference on Fuzzy Systems
749
to compare the performance of the different classifiers obtained for different class distributions, the same test data is used. The training set size is kept fixed, equal to the number of minority examples left after forming the test data. The training set is altered to obtain different class distributions as follows: for a given class distribution, the corresponding number of minority points is selected at random from the remaining minority examples, and the corresponding number of majority points is selected at random from the remaining majority examples, where the minority percentage ranges over several values including the natural distribution (listed in the second column of Table I).

III. THE FUZZY CLASSIFIER
The main problem in designing a fuzzy classifier is to construct the fuzzy sets, more precisely their membership functions. Approaches to constructing fuzzy classifiers range from quite ad-hoc to more formal ones, in which the membership function is constructed directly from data without any intervention of the designer. The current approach relies on the interpretation of a fuzzy set as a family of probability distributions; a particular membership function is therefore the result of selecting one of the probability distributions in this family. The mechanism for deriving a fuzzy set membership function makes use of mass assignment theory (MAT) [6] and is presented briefly next (for an in-depth presentation, please see [7], [8] and [4]). Given a collection of data and the relative frequency distribution corresponding to it, p_1 >= p_2 >= ... >= p_n, the corresponding fuzzy set is obtained from Equation 1:

mu_i = i * p_i + p_{i+1} + ... + p_n   (1)

where mu_i denotes the i-th largest value of the membership function, corresponding to the lpd (least prejudiced distribution) selection rule [6]. Example 1 illustrates the complete mechanism of converting a simple artificial data set into a fuzzy classifier, corresponding to the lpd selection rule [9].
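As a minimal sketch, assuming the lpd-based formula of Equation 1 (the function name and the sample frequencies below are illustrative, not taken from the paper), the conversion from relative frequencies to membership degrees can be written as:

```python
def lpd_membership(freqs):
    """Convert a relative frequency distribution, sorted in
    nonincreasing order, into fuzzy-set membership degrees using
    mu_i = i * p_i + p_{i+1} + ... + p_n (Equation 1)."""
    n = len(freqs)
    assert all(freqs[i] >= freqs[i + 1] for i in range(n - 1)), \
        "frequencies must be sorted in nonincreasing order"
    return [(i + 1) * freqs[i] + sum(freqs[i + 1:]) for i in range(n)]

# Three values with relative frequencies 0.5, 0.3, 0.2;
# the most frequent value always gets membership degree 1.
print([round(m, 6) for m in lpd_membership([0.5, 0.3, 0.2])])  # [1.0, 0.8, 0.6]
```

A test point can then be classified, as in Example 1 below, by comparing its membership degree in the fuzzy set built for each class.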
Example 1: Let Maj and Min denote, respectively, the majority and the minority class of a small artificial one-dimensional data set. For each class, the relative frequency distribution (sorted in nonincreasing order) is computed, and the membership values of the corresponding fuzzy set are derived from it (in decreasing order of the relative frequencies) as shown in Table II.

TABLE II
MEMBERSHIP VALUES FOR THE Maj AND Min CLASSES OF EXAMPLE 1.

Fig. 1. The fuzzy sets obtained for the majority (left) and the minority (right) class using the lpd selection rule.

The obtained fuzzy sets (each class is mapped into a fuzzy set) are displayed in Figure 1. For a test data point, the membership degrees in each of these fuzzy sets are computed and compared: the point is assigned to the class to which it belongs with the higher degree. Example 1 illustrates, for a one-dimensional data set, the basic one-pass fuzzy classifier used in this study. In principle, for multidimensional data sets the approach outlined above can be applied as well. However, it should be noticed that as the dimensionality increases the data set becomes sparse, and there may be very few data points with frequency greater than 1. Otherwise stated, in order to obtain meaningful frequencies, either the data set size must increase with each new dimension, or, for a given data set, one must preprocess it by collecting the data into bins and apply the approach described above to the bins. The bin approach is apt to introduce errors, while increasing the data set size is not always possible (in fact,
it rarely is). In any case, regardless of the approach used, another problem that arises is that of interpolation, for computing the membership degree of unlabeled data points; having multidimensional fuzzy sets makes this step more complex. The approach currently taken in this study is to derive fuzzy sets along each dimension (in effect, deriving as many classifiers as the dimension of the attribute space) and to aggregate these classifiers in order to evaluate a data point. Several aggregation operators are proposed here, but other aggregation methods, such as the ones presented in [10], can be used too. The following notations are used in defining the aggregation methods: C(x) denotes the class label of a test point x; I is the indicator function; and w_i is a weight characterizing attribute i (w_i is the number of training data correctly classified by attribute i alone). The aggregations D1, D2, D3 and D4 are then defined in terms of C(x), I and the weights w_i, and the class label of x is decided by evaluating each aggregation over the candidate classes.

But first, it is interesting to understand why one may expect a good performance from the fuzzy classifier applied to imbalanced data. As can be observed from Figure 1, there is a point that will be assigned to the minority class because its membership degree in the minority fuzzy set is higher than its membership degree in the majority fuzzy set, even though its absolute frequency in the majority class is at least as high. Any classifier in which such a point is learned based on its contribution to a class relative to the whole data set will assign it to the majority class. Classifiers such as the fuzzy classifier used in this study, which learn the classification based on the relative frequency within each class, will assign the point to the minority class, where its relative frequency is greater than its relative frequency in the majority class. Otherwise stated, within the class-size context the point is more representative for the minority class than for the majority class. This idea is captured by the fuzzy classifier and makes it suitable for imbalanced data sets.
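The exact definitions of D1-D4 are not reproduced here; as one plausible illustration only (an assumed scheme, not necessarily any of the paper's four operators), the per-attribute classifiers can be combined by a weighted vote in which attribute i votes with weight w_i:

```python
def weighted_vote(labels, weights):
    """Hypothetical aggregation sketch, not the paper's D1-D4.
    labels: class label predicted by each per-attribute classifier
    for a test point x; weights: w_i for each attribute (number of
    training points classified correctly by attribute i alone).
    Returns the class with the largest total weight."""
    scores = {}
    for label, w in zip(labels, weights):
        scores[label] = scores.get(label, 0) + w
    return max(scores, key=scores.get)

# Three attribute-wise classifiers vote Min, Maj, Min with
# weights 40, 90 and 55: Min wins because 40 + 55 > 90.
print(weighted_vote(["Min", "Maj", "Min"], [40, 90, 55]))  # Min
```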
IV. PERFORMANCE EVALUATION
When learning classes for which the errors coming from the different classes have different costs, the overall accuracy is not a good measure of classifier performance, even for balanced data sets. Even more, when the class distribution is highly imbalanced, the accuracy is biased to favor the
TABLE III
THE CONFUSION MATRIX.

                       Predicted
                   Negative   Positive
Actual  Negative      TN         FP
        Positive      FN         TP
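Given the counts in the confusion matrix above, the rates plotted by a ROC curve can be computed as follows (a straightforward sketch; the function name and sample counts are ours, not from the paper):

```python
def roc_rates(tp, fn, fp, tn):
    """True-positive and false-positive rates from confusion-matrix
    counts, as used to plot a ROC point."""
    tp_rate = tp / (tp + fn)  # fraction of actual positives recognized
    fp_rate = fp / (fp + tn)  # fraction of actual negatives misclassified
    return tp_rate, fp_rate

# e.g. 30 of 40 minority (positive) points recognized, and 50 of
# 1000 majority (negative) points misclassified as positive:
print(roc_rates(tp=30, fn=10, fp=50, tn=950))  # (0.75, 0.05)
```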
majority class and does not value rare cases as much as common cases. Therefore, it is more appropriate to use ROC (Receiver Operating Characteristic) curves as the performance evaluation measure. The ROC curves provide a visual representation of the trade-off between true positives (TP) and false positives (FP), as expressed in Equations 2 and 3. The confusion matrix shown in Table III contains information about the actual and predicted classifications produced by a classification system.

TP_rate = TP / (TP + FN)   (2)
FP_rate = FP / (FP + TN)   (3)

However, for the purpose of comparing the results of this study with those published in [5], accuracy is also used as a measure to evaluate a classifier, in addition to the ROC curves. The fuzzy sets obtained with the procedure indicated previously in this paper are discrete fuzzy sets. However, their evaluation is required on unseen points. The standard approach to this problem is to extend the discrete fuzzy set to a continuous version by piecewise linear interpolation. More precisely, if x denotes a data point and A a fuzzy set with membership mu_A defined on the support points x_1 < x_2 < ... < x_n, then the membership degree of x in A is given by

mu_A(x) = mu_A(x_i) + ((x - x_i) / (x_{i+1} - x_i)) * (mu_A(x_{i+1}) - mu_A(x_i)),  if x_i <= x <= x_{i+1}
mu_A(x) = 0,  otherwise   (4)
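A sketch of this extension, assuming the piecewise-linear form of Equation 4 (the function name and sample values are illustrative):

```python
def interp_membership(x, support, mu):
    """Membership degree of x in a discrete fuzzy set extended by
    piecewise linear interpolation. `support` holds the support
    points in increasing order, `mu` their membership degrees;
    outside the support the degree is 0."""
    if x < support[0] or x > support[-1]:
        return 0.0
    for i in range(len(support) - 1):
        if support[i] <= x <= support[i + 1]:
            t = (x - support[i]) / (support[i + 1] - support[i])
            return mu[i] + t * (mu[i + 1] - mu[i])
    return mu[-1]  # single-point support: x coincides with it

# Halfway between support points with memberships 1.0 and 0.5:
print(interp_membership(2.5, [1, 2, 3], [0.4, 1.0, 0.5]))  # 0.75
```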
V. RESULTS AND ANALYSIS OF THE STUDY
All the results reported in this study are averaged over 30 runs, and the test data reflect the natural distributions of the domains. Figures 2 - 6 show the overall error percentage when different training class distributions are used. Two of the fuzzy classifiers outperform decision trees in four of the five domains studied here; for the letter-vowel domain, they give less error only for the higher class distributions (Figure 4). In Figures 7 - 11 the ROC curves of the four fuzzy classifiers, obtained for the various class distributions, are plotted. For all five data sets, one classifier's ROC curve is dominant: it lies above the other ROC curves and closer to the y axis.
Fig. 2. Letter-a: the error in classification over various training class distributions, for C4.5 (Weiss) and the fuzzy classifiers D1-D4 (natural distribution listed in Table I).
Fig. 3. OptDigits: the error in classification over various training class distributions, for C4.5 (Weiss) and the fuzzy classifiers D1-D4 (natural distribution listed in Table I).
Fig. 4. Letter-vowel: the error in classification over various training class distributions, for C4.5 (Weiss) and the fuzzy classifiers D1-D4 (natural distribution listed in Table I).
Fig. 5. German: the error in classification over various training class distributions, for C4.5 (Weiss) and the fuzzy classifiers D1-D4 (natural distribution listed in Table I).
Fig. 6. Wisconsin: the error in classification over various training class distributions, for C4.5 (Weiss) and the fuzzy classifiers D1-D4 (natural distribution listed in Table I).
For the german data set, the trade-off between FP and TP is obvious (Figure 10): training with more Min examples introduces more false positives. The combination of two factors contributes to this behavior:
1) several attributes have exactly the same range of values for the Min and Maj classes (complete overlap), and the remaining three attributes overlap partially;
2) the natural class distribution, which is also present in the test data, is heavily skewed toward the Maj class.
Therefore, when the classifier is trained with many Min examples, the recognition of the Min class improves, but at the cost of misclassifying many more Maj points, since the Maj class makes up most of the test set. The analysis of Figure 5 (where the plain error is reported) leads to the same conclusion.
The letter-a domain presents naturally more imbalance than the letter-vowel domain, yet, surprisingly, letter-a is better recognized (see Figures 2 and 4). This is mainly due to the fact that the Min class for letter-a (instances of letter a) is better defined, as a concept, than the Min class for
Fig. 7. Letter-a: the ROC curves of D1-D4 obtained for the various class distributions (natural distribution listed in Table I).
Fig. 8. OptDigits: the ROC curves of D1-D4 obtained for the various class distributions (natural distribution listed in Table I).
letter-vowel (instances of a, e, i, o, u). Along the same lines, there is more overlap between the classes in the letter-vowel set than in the letter-a data set: the letter-vowel domain has two attributes completely overlapped and, in several other attributes, more overlap than the letter-a data set. The ROC curves are also consistent with this observation: they indeed show a better (tighter) clustering of the letter-a Min class (Figure 7) than of the letter-vowel one (Figure 9).
Figure 3 shows that the fuzzy classifier performs well in recognizing both the Min and the Maj class for the optDigits domain; in this domain several attributes totally overlap, and the natural imbalance is listed in Table I. A higher error when the minority percentage in training is very low is due to the fact that the Min class is not learned well, so it is mainly the Min class that contributes to the error (a ROC point falls on the y axis). The increase in error when the minority percentage is very high is due to the fact that the Maj class is then under-represented in training, and this time the Maj class has the higher error rate; even so, the number of false positives does not grow much (Figure 8).
Fig. 9. Letter-vowel: the ROC curves of D1-D4 obtained for the various class distributions (natural distribution listed in Table I).
Fig. 10. German: the ROC curves of D1-D4 obtained for the various class distributions (natural distribution listed in Table I).
Fig. 11. Wisconsin: the ROC curves of D1-D4 obtained for the various class distributions (natural distribution listed in Table I).