A new medical decision making system: Least square support vector machine (LSSVM) with Fuzzy Weighting Pre-processing

Emre Çomak a,*, Kemal Polat b, Salih Güneş b, Ahmet Arslan a

a Department of Computer Engineering, Engineering and Architecture Faculty, Selcuk University, Konya 42075, Turkey
b Department of Electrical and Electronics Engineering, Engineering and Architecture Faculty, Selcuk University, Konya 42075, Turkey

Abstract

The use of machine learning tools in medical diagnosis is gradually increasing, mainly because classification and recognition systems have become considerably more effective at helping medical experts diagnose diseases. This study aims to diagnose liver disorder with a new hybrid machine learning method. By combining LSSVM with Fuzzy Weighting Pre-processing, we obtained a method that solves this diagnosis problem by classifying the Liver Disorder data. The Fuzzy Weighting Pre-processing stage was first developed in this study. The Liver Disorder dataset is very commonly used in the literature on classification systems for liver disorder diagnosis, and it was used here to compare the classification performance of the proposed method with that of other studies. We obtained a classification accuracy of 94.29%, which is the highest reached so far. Although this result is for liver disorder, it indicates that the method can also be used with confidence for other medical diagnosis problems.

© 2005 Elsevier Ltd. All rights reserved.

Keywords: SVM; LSSVM; Fuzzy Weighting Pre-processing; Liver Disorder disease diagnosis; ROC curves

1. Introduction

With improvements in medical knowledge systems in medical institutes and hospitals, determining useful knowledge is becoming more difficult. In particular, because conventional manual data analysis techniques are not effective in diagnosis, computer-based analysis is becoming inevitable in disease diagnosis.
It is therefore time to develop modern, effective, and efficient computer-based systems for decision support. There are a number of data analysis techniques: statistical, machine learning, and data abstraction. Machine learning techniques have been applied to medical analysis for the last twenty years. Using machine learning schemes in medical analysis has reduced the need for human support, lowered costs, and increased diagnostic accuracy (Cheung, 2001). A growing body of research studies new technologies that help doctors diagnose disorders, and techniques that diagnose disorders without a doctor have also been attempted. For some disorders, Artificial Neural Networks (ANNs) have been successful (Yalçın & Yıldırım, 2003).

The liver's task is to store and excrete toxic material. If the amount of toxic material exceeds the liver's storage capacity, the affected areas of the liver deteriorate, and certain substances and enzymes enter the blood. The levels of these enzymes are examined when diagnosing the disorder. Many diagnostic errors can be made, both because there are many enzymes and because different amounts of alcohol can cause different disorders in different patients (Yalçın & Yıldırım, 2003).

Expert Systems with Applications 32 (2007) 409–414
www.elsevier.com/locate/eswa
0957-4174/$ - see front matter © 2005 Elsevier Ltd. All rights reserved.
doi:10.1016/j.eswa.2005.12.001
* Corresponding author. Tel.: +90 332 2232043, +90 332 2232098, +90 332 2232082, +90 332 2232000; fax: +90 332 2410635.
E-mail addresses: ecomak@selcuk.edu.tr (E. Çomak), kpolat@selcuk.edu.tr (K. Polat), sgunes@selcuk.edu.tr (S. Güneş), ahmetarslan@selcuk.edu.tr (A. Arslan).

In this paper, we propose a new medical diagnosis system based on LSSVM and on Fuzzy Weighting Pre-processing, which we developed.
The proposed system is applied to the BUPA Liver Disorders dataset taken from the UCI Machine Learning Repository (BUPA Liver Disorders Dataset) and achieves a classification accuracy of 94.29%, the highest rate in the literature. According to this result, our system can be used for medical diagnosis.

The rest of the paper is organized as follows. Section 2 explains the method, with subsections on SVM, LSSVM, and Fuzzy Weighting Pre-processing, each of which gives detailed information. Section 3 describes the Liver Disorder data source and dataset. Section 4 gives the results obtained on the Liver Disorders dataset. Finally, Section 5 concludes the paper by summarizing the results, emphasizing the importance of the study, and mentioning some future work.

2. Material and method

In this work, LSSVM and Fuzzy Weighting Pre-processing are used as the material and method. They are explained below.

2.1. LSSVM

This section first describes the SVM classifier and then the LSSVM, which is related to it.

2.1.1. Support vector machines (SVMs)

SVM is a reliable classification technique based on statistical learning theory. It was first proposed for classification and regression tasks by Vapnik (1995).

As shown in Fig. 1, a linear SVM was developed to classify a data set containing two separable classes, labelled {+1, −1}. Let the training data consist of n samples $(x_1, y_1), \ldots, (x_n, y_n)$, with $x \in R^n$ and $y \in \{+1, -1\}$. To separate these classes, the SVM has to find the optimal (maximum-margin) separating hyperplane so that it generalizes well. All separating hyperplanes have the form

$$D(x) = (w \cdot x) + w_0 \tag{1}$$

and satisfy the following inequality for both y = +1 and y = −1:

$$y_i[(w \cdot x_i) + w_0] \ge 1, \quad i = 1, \ldots, n \tag{2}$$

The data points that satisfy formula (2) with equality are called the support vectors. Classification in SVMs is carried out using these support vectors.

The margins of the hyperplanes obey the following inequality:

$$\frac{y_k D(x_k)}{\|w\|} \ge C, \quad k = 1, \ldots, n \tag{3}$$

To maximize this margin (C), the norm of w is minimized. To reduce the number of solutions for the norm of w, the following equation is imposed:

$$C\,\|w\| = 1 \tag{4}$$

Then formula (5) is minimized subject to constraint (2):

$$\frac{1}{2}\|w\|^2 \tag{5}$$

For non-separable data, slack variables $\xi_i$ are added to formulas (2) and (5), which are replaced by:

$$y_i[(w \cdot x_i) + w_0] \ge 1 - \xi_i \tag{6}$$

$$C \sum_{i=1}^{n} \xi_i + \frac{1}{2}\|w\|^2 \tag{7}$$

Because SVMs were originally linear classifiers, they cannot perform nonlinear classification tasks directly. Kernel approaches were developed to overcome this limitation. Kernels map a nonlinear input data set into a high-dimensional linear feature space. The following kernels are most commonly used in SVMs:

• Dot product kernels: $K(x, x') = x \cdot x'$.
• Polynomial kernels: $K(x, x') = (x \cdot x' + 1)^d$, where d is the degree of the kernel and a positive integer.
• RBF kernels: $K(x, x') = \exp(-\|x - x'\|^2 / \sigma^2)$, where $\sigma$ is a positive real number.

In our experiments $\sigma = 1.7$ is selected.

Fig. 1. The structure of a simple SVM.

2.1.2. Least squares support vector machines (LSSVM)

LSSVMs were proposed by Suykens and Vandewalle (1999). The most important difference between SVMs and LSSVMs is that LSSVMs are trained by solving a set of linear equations, whereas SVMs are trained by solving a quadratic optimization problem (Tsujinishi & Abe, 2003).
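To make the "set of linear equations" concrete, here is a minimal sketch of LSSVM training with the RBF kernel. It assumes the commonly used LSSVM classifier system associated with Suykens and Vandewalle (1999); the exact system is not written out in this paper, so treat the matrix form as an assumption. The function names and the toy data are hypothetical, while $\sigma = 1.7$ and C = 0.1 are the values reported in this paper.

```python
import numpy as np


def rbf_kernel(A, B, sigma=1.7):
    """RBF kernel K(x, x') = exp(-||x - x'||^2 / sigma^2) for all row pairs."""
    sq_dist = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dist, 0.0) / sigma**2)


def lssvm_train(X, y, C=0.1, sigma=1.7):
    """Solve one linear system for the bias w0 and the multipliers alpha.

    Assumed system (standard LSSVM classifier form):
        [ 0   y^T         ] [ w0    ]   [ 0 ]
        [ y   Omega + I/C ] [ alpha ] = [ 1 ]
    with Omega_ij = y_i * y_j * K(x_i, x_j).
    """
    n = X.shape[0]
    omega = np.outer(y, y) * rbf_kernel(X, X, sigma)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = y
    A[1:, 0] = y
    A[1:, 1:] = omega + np.eye(n) / C
    rhs = np.concatenate(([0.0], np.ones(n)))
    solution = np.linalg.solve(A, rhs)
    return solution[0], solution[1:]


def lssvm_predict(X_train, y_train, w0, alpha, X_new, sigma=1.7):
    """Classify new rows by the sign of sum_i alpha_i y_i K(x, x_i) + w0."""
    scores = rbf_kernel(X_new, X_train, sigma) @ (alpha * y_train) + w0
    return np.where(scores >= 0, 1, -1)


# Hypothetical toy data: two well-separated clusters (not the BUPA data).
X_toy = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                  [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]])
y_toy = np.array([-1.0, -1.0, -1.0, 1.0, 1.0, 1.0])

w0, alpha = lssvm_train(X_toy, y_toy)
predictions = lssvm_predict(X_toy, y_toy, w0, alpha,
                            np.array([[0.5, 0.5], [5.5, 5.5]]))
print(predictions)
```

Unlike in a standard SVM, no quadratic programming solver is needed here: a single call to `np.linalg.solve` yields all multipliers, and they may take either sign.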
Whereas formula (7) is minimized subject to formula (6) in Vapnik's standard SVMs, in LSSVMs formula (9) is minimized subject to formula (8):

$$y_i[(w \cdot x_i) + w_0] = 1 - \xi_i, \quad i = 1, \ldots, n \tag{8}$$

$$\frac{1}{2}\|w\|^2 + \frac{C}{2} \sum_{i=1}^{n} \xi_i^2 \tag{9}$$

According to these formulas, the dual problem is built from the following function, in which b denotes the bias term $w_0$:

$$Q(w, b, \alpha, \xi) = \frac{1}{2}\|w\|^2 + \frac{C}{2} \sum_{i=1}^{n} \xi_i^2 - \sum_{i=1}^{n} \alpha_i \{ y_i[(w \cdot x_i) + w_0] - 1 + \xi_i \} \tag{10}$$

Another difference between SVMs and LSSVMs is that the Lagrange multipliers $\alpha_i$ can be positive or negative in LSSVMs, whereas they must be non-negative in SVMs. Detailed information can be found in Suykens and Vandewalle (1999) and Tsujinishi and Abe (2003).

2.2. Fuzzy Weighting Pre-processing

In Fuzzy Weighting Pre-processing, each feature value is replaced by a new value computed from its old value. Two membership functions, known as the input and output membership functions, are defined in this pre-processing. Both are triangular membership functions, as shown in Figs. 2 and 3, respectively.

The membership functions are formed as follows. As a first step, the mean value of each feature is calculated from the corresponding feature values of all samples:

$$m_i = \frac{1}{N} \sum_{k=1}^{N} x_{k,i} \tag{11}$$

Here, $x_{k,i}$ represents the i-th feature value of sample $x_k$, k = 1, 2, ..., N. After these sample means are calculated for each feature, the input membership function is formed from triangles as in Fig. 2. The supports of these triangles are determined by Avg/8, Avg/4, Avg/2, Avg, 2 × Avg, 4 × Avg, and 8 × Avg, as shown in Fig. 2. The lines of the input membership functions are named mf1, mf2, ..., mf8. The output membership function is likewise formed from eight parts (Fig. 3). The interval [0, 1] is divided into eight equal parts, and the corresponding lines are named mf1', mf2', ..., mf8'.
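The first construction step, Eq. (11) followed by the seven support points, can be sketched as below. This covers only what the text states: the exact triangle shapes come from Figs. 2 and 3, which are not reproduced here. The function name and the toy data are hypothetical.

```python
import numpy as np


def input_breakpoints(X):
    """Return per-feature means (Eq. 11) and the seven support points.

    X has shape (N, n_features). Row i of the second result holds
    Avg/8, Avg/4, Avg/2, Avg, 2*Avg, 4*Avg, 8*Avg for feature i.
    """
    X = np.asarray(X, dtype=float)
    means = X.mean(axis=0)  # m_i = (1/N) * sum_k x_{k,i}
    factors = np.array([1 / 8, 1 / 4, 1 / 2, 1, 2, 4, 8])
    return means, means[:, None] * factors[None, :]


# Hypothetical toy data: 4 samples, 2 features (not taken from the BUPA data).
X_toy = np.array([[80.0, 2.0], [90.0, 4.0], [85.0, 6.0], [85.0, 8.0]])
means, breakpoints = input_breakpoints(X_toy)

# Output side: the interval [0, 1] split into eight equal parts.
output_edges = np.linspace(0.0, 1.0, 9)

print(means)
print(breakpoints)
print(output_edges)
```

Because the support points scale with each feature's mean, every feature gets its own input configuration, while the output edges are the same for all features.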
It is worth noting that these input and output membership functions are formed separately for each feature, so each feature has its own input–output membership function configuration, because the sample means of the features differ.

Fig. 2. Input membership function.
Fig. 3. Output membership function.

After the input and output membership functions are determined for each feature, the weighting pre-processing takes place. A feature value $x_{k,i}$, that is, the i-th feature value of sample $x_k$, is located on the x-axis of the input membership function, and the y-values of the points at which it cuts the input membership functions are determined. For example, if the feature value is between 0 and Avg/8, it cuts both line mf1 and line mf2. The y-values at these intersection points, say $y_1$ and $y_2$, are known as membership values ($\mu$), and they are then used in a fuzzy rule base in the following manner. First, the input membership value $\mu(i)$ is determined from the above intersection points:

$$\mu(i) = \mu_{A \cap B}(x_{k,i}) = \mathrm{MIN}(\mu_A(x_{k,i}), \mu_B(x_{k,i})), \quad x \in X \tag{12}$$

Here, $\mu_A(x_{k,i})$ and $\mu_B(x_{k,i})$ are the membership values corresponding to the intersection points mentioned above. The rule base for our system is presented in Table 1. After $\mu(i)$ has been determined from Eq. (12) for the feature value $x_{k,i}$, the output weight value is determined by using the output membership functions and the rules in Table 1.
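As an illustration of Eq. (12), the sketch below evaluates two overlapping triangular membership functions at one feature value and takes their minimum. The triangle coordinates are hypothetical placeholders, since the actual shapes are defined by Fig. 2; only the MIN step follows the text directly.

```python
def triangular(x, left, peak, right):
    """Membership degree of x in a triangle with feet at left/right and apex at peak."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


def input_membership(x, tri_a, tri_b):
    """Eq. (12): mu = MIN(mu_A(x), mu_B(x)) for the two lines that x cuts."""
    return min(triangular(x, *tri_a), triangular(x, *tri_b))


# Hypothetical overlapping triangles given as (left, peak, right).
tri_a = (0.0, 1.0, 2.0)
tri_b = (1.0, 2.0, 3.0)

mu_a = triangular(1.25, *tri_a)  # falling edge of the first triangle
mu_b = triangular(1.25, *tri_b)  # rising edge of the second triangle
mu = input_membership(1.25, tri_a, tri_b)
print(mu_a, mu_b, mu)
```

Taking the minimum is the usual fuzzy intersection: the resulting value is never larger than either of the two membership degrees.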
In this last step, the input membership value $\mu(i)$ is presented to the output membership function to determine the weighted value corresponding to the original feature value. The membership value is now taken as a point on the y-axis of the output membership functions and, as with the input membership functions, the points at which it cuts the output membership functions are determined. It is apparent from the output membership functions that there will be more than one intersection point. Which of them is used is decided by the rules in Table 1. For example, if the input feature value cuts lines mf1 and mf2 in the input membership functions, then the output value for this feature is the mean of the two points at which $\mu(i)$ cuts mf1' and mf2' in the output membership functions.

3. The Liver Disorder dataset

The liver is an effective organ for neutralizing toxins and removing them from the body. If the amount of toxins exceeds the working capacity of the organ, the cells of the affected parts of the organ are destroyed. Some substances and enzymes then appear and enter the blood. The levels of these enzymes are analysed during diagnosis of the disease. Because the effects of different alcohol dosages vary from person to person, and because there are many enzymes, diagnostic errors can be frequent (Yalçın & Yıldırım, 2003).

The BUPA Liver Disorders data set, prepared by the BUPA medical research company, includes 345 samples with 6 features and 2 class labels. Each sample is the record of a single male individual.

Two hundred samples of the data set belong to the first class and the remaining 145 samples belong to the second class. The first five features of each sample are obtained from blood tests. The last feature is daily alcohol consumption.
Detailed information about this data set can be found in Yalçın and Yıldırım (2003). The features are:

• mean corpuscular volume (Mcv);
• alkaline phosphatase (Alkphos): a protein in the cell membrane of gall secretion;
• alanine aminotransferase (Sgpt): one of the aminotransferases whose blood level rises when hepatocellular necrosis sets in;
• aspartate aminotransferase (Sgot): one of the aminotransferases whose blood level rises when hepatocellular necrosis sets in;
• γ-glutamyl transpeptidase (Gammagt): a test that determines the amount of GGT enzyme in the blood;
• drinks: daily alcohol consumption (half-pints).

4. The experimental results

In our experimental study, the Liver Disorder dataset is first pre-processed by Fuzzy Weighting Pre-processing and then classified by the LSSVM classifier. As mentioned in Section 3, this data set includes 345 samples with six features and two output labels. The first class includes 200 samples and the second class 145 samples. The training set includes 310 samples (180 from the first class and 130 from the second class). The testing set includes 35 samples (20 from the first class and 15 from the second class).

Standard LSSVM obtains a classification accuracy of 60.0%, whereas LSSVM with Fuzzy Weighting Pre-processing obtains 94.29%. Sensitivity and specificity are also presented in Table 2. In addition, the accuracy rates obtained in the literature so far are listed in Table 3.

Table 1
Fuzzy rule base for our system

1. If Input_value cuts mf1 and mf2, then Output_value = (mf1'(y) + mf2'(y))/2
2. If Input_value cuts mf2 and mf3, then Output_value = (mf2'(y) + mf3'(y))/2
3. If Input_value cuts mf3 and mf4, then Output_value = (mf3'(y) + mf4'(y))/2
4. If Input_value cuts mf4 and mf5, then Output_value = (mf4'(y) + mf5'(y))/2
5. If Input_value cuts mf5 and mf6, then Output_value = (mf5'(y) + mf6'(y))/2
6. If Input_value cuts mf6 and mf7, then Output_value = (mf6'(y) + mf7'(y))/2
7. If Input_value cuts mf7 and mf8, then Output_value = (mf7'(y) + mf8'(y))/2

To compare the classification performance of standard LSSVM and LSSVM with Fuzzy Weighting Pre-processing, the receiver operating characteristic (ROC) curve method is used. ROC curves and the areas under them are computed for both classifiers, as shown in Fig. 4.

The ROC curve is a statistical comparison method that uses the true positive and false positive rates. The area under a ROC curve is denoted by Az. This value is related to the accuracy of the classifier: higher values represent higher classification accuracy, and lower values represent lower classification accuracy (Osareh, Mirmehdi, Thomas, & Markham, 2002; Centor, 1991). The ROC curves show a significant difference between the areas computed for the two classifiers (Az = 0.95 for LSSVM with fuzzy weighting but Az = 0.336 for LSSVM without it).

5. Discussion and conclusions

In this paper, a new weighting method called Fuzzy Weighting Pre-processing has been developed using fuzzy logic, and a new medical diagnosis system has been built by combining this weighting method with the LSSVM classifier.

In the application phase of this study, the developed LSSVM with Fuzzy Weighting Pre-processing method is applied to the BUPA Liver Disorders dataset and obtains a classification rate of 94.29%, the highest in the literature. In comparison, standard LSSVM (without Fuzzy Weighting Pre-processing) obtains a classification rate of 60%.
This result shows that Fuzzy Weighting Pre-processing greatly increases the classification rate of LSSVM for the current data set.

According to the application results, LSSVM with Fuzzy Weighting Pre-processing achieved considerably high classification accuracy on the BUPA Liver Disorders dataset.

ROC curves are also used in this study to test the accuracy of the proposed system statistically. The area under the ROC curve is 0.336 for standard LSSVM and 0.95 for LSSVM with Fuzzy Weighting Pre-processing. According to these results, the proposed system is very effective and reliable.

Although the developed method is built as an offline diagnosis system, it can be rebuilt as an online diagnosis system in the future.

Acknowledgement

This study is supported by the Scientific Research Projects of Selcuk University.

References

BUPA Liver Disorders Dataset. UCI Repository of Machine Learning Databases. ftp://ftp.ics.uci.edu/pub/machine-learning-databases/liver-disorders/bupa.data.
Centor, R. M. (1991). Signal detectability: The use of ROC curves and their analysis. Medical Decision Making, 11, 102–106.

Table 2
Comparison between standard LSSVM and LSSVM with Fuzzy Weighting Pre-processing

Classifier                                                            Parameters         Sensitivity (%)   Specificity (%)
Standard LSSVM (60.0% testing accuracy)                               σ = 1.7, C = 0.1   100               6.66
LSSVM with Fuzzy Weighting Pre-processing (94.29% testing accuracy)   σ = 1.7, C = 0.1   95                93.33

Fig. 4. ROC curves for LSSVM with fuzzy and LSSVM without fuzzy with (Az).

Table 3
Classification accuracy of LSSVM with Fuzzy Weighting Pre-processing for the BUPA Liver Disorders problem, with classification accuracies obtained by other methods in the literature

Author (Year)                        Method                                      Classification accuracy (%)
Pham et al. (2000)                   RULES-4 (40%–60%)                           55.90
Cheung (2001)                        C4.5 (5 × CV)                               65.59
Cheung (2001)                        Naïve Bayes (5 × CV)                        63.39
Cheung (2001)                        BNND (5 × CV)                               61.83
Cheung (2001)                        BNNF (5 × CV)                               61.42
Van Gestel et al. (2002)             SVM with GP (10 × CV)                       69.70
Lee and Mangasarian (2001a, 2001b)   SSVM (10 × CV)                              70.33
Lee and Mangasarian (2001a, 2001b)   RSVM (10 × CV)                              74.86
Yalçın and Yıldırım (2003)           MLP (3 × CV)                                73.05
Yalçın and Yıldırım (2003)           PNN (3 × CV)                                42.03
Yalçın and Yıldırım (2003)           GRNN (3 × CV)                               65.55
Yalçın and Yıldırım (2003)           RBF (3 × CV)                                58.55
Polat et al. (2005)                  AIRS (10 × CV)                              81.00
Our method (2005)                    LSSVM with Fuzzy Weighting Pre-processing   94.29
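The testing accuracies quoted in Table 2 can be cross-checked against the sensitivity and specificity values and the test split given in Section 4 (20 first-class and 15 second-class samples). The short sketch below does this arithmetic. It assumes that the first class is the "positive" class, which the paper does not state explicitly but which is the only assignment that yields whole numbers of samples. The function name is hypothetical.

```python
def accuracy_from_rates(sensitivity, specificity, n_pos, n_neg):
    """Overall accuracy (%) implied by sensitivity and specificity (both in %)."""
    # Round to whole samples, since the reported percentages are truncated.
    true_pos = round(sensitivity / 100 * n_pos)
    true_neg = round(specificity / 100 * n_neg)
    return 100 * (true_pos + true_neg) / (n_pos + n_neg)


# Test split from Section 4: 20 first-class and 15 second-class samples.
standard = accuracy_from_rates(100, 6.66, n_pos=20, n_neg=15)
fuzzy = accuracy_from_rates(95, 93.33, n_pos=20, n_neg=15)

print(round(standard, 2), round(fuzzy, 2))
```

Under this assumption, standard LSSVM classifies 20 + 1 = 21 of the 35 test samples correctly and the proposed method 19 + 14 = 33, which matches the reported 60.0% and 94.29%.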