
Breast cancer and liver disorders classification using artificial immune recognition system (AIRS) with performance evaluation by fuzzy resource allocation mechanism

Kemal Polat a,*, Seral Şahan a, Halife Kodaz b, Salih Güneş a

a Selcuk University, Engineering and Architecture Faculty, Electrical and Electronics Engineering, 42075 Konya, Turkey
b Selcuk University, Computer Engineering, Konya, Turkey

Abstract

The Artificial Immune Recognition System (AIRS) classification algorithm, which holds an important place among the classification algorithms of Artificial Immune Systems, has shown effective and interesting performance on the problems to which it has been applied. AIRS was previously applied to several medical classification problems, including Breast Cancer, Cleveland Heart Disease and Diabetes, and obtained very satisfactory results, proving to be an efficient artificial intelligence technique in the medical field. In this study, the resource allocation mechanism of AIRS was replaced with a new one based on fuzzy logic. The resulting system, named Fuzzy-AIRS, was used as a classifier in the diagnosis of Breast Cancer and Liver Disorders, which are of great importance in medicine. The Breast Cancer and BUPA Liver Disorders datasets, taken from the University of California at Irvine (UCI) Machine Learning Repository, were classified using 10-fold cross-validation. The classification accuracies reached were evaluated by comparing them with the classifiers reported on the UCI web site and with other systems applied to the same problems. The classification performance was also compared with that of AIRS in terms of classification accuracy, number of resources and classification time. Fuzzy-AIRS reached a classification accuracy of 98.51% for Breast Cancer and classified the Liver Disorders dataset with 83.36% accuracy.
For both datasets, Fuzzy-AIRS obtained the highest classification accuracy among the classifiers reported on the UCI web site. Besides this success, Fuzzy-AIRS gained an important advantage over AIRS in classification time: in the experiments, the classification time of Fuzzy-AIRS was reduced by about 70% compared with AIRS for both datasets. By reducing classification time while obtaining high classification accuracies on the applied datasets, the Fuzzy-AIRS classifier proved that it can be used as an effective classifier for medical problems.

© 2005 Elsevier Ltd. All rights reserved.

Keywords: Fuzzy resource allocation; AIRS; Breast Cancer dataset; Liver Disorders dataset; k-Fold cross-validation

Expert Systems with Applications 32 (2007) 172–183
0957-4174/$ - see front matter. doi:10.1016/j.eswa.2005.11.024
* Corresponding author. Tel.: +90 332 2232098; fax: +90 332 2410635.

1. Introduction

The use of classifier systems in medical diagnosis is gradually increasing. There is no doubt that the evaluation of data taken from the patient and the decisions of experts are the most important factors in diagnosis. However, expert systems and other artificial intelligence techniques for classification also help experts a great deal. Classification systems help minimize the errors that fatigued or inexperienced experts might make, and they allow medical data to be examined in a shorter time and in more detail.

Breast Cancer is a very common type of cancer among women. Today, innovations in cancer treatment have led to higher survival rates for cancer in general and for Breast Cancer in particular. Early diagnosis, especially, can increase these survival rates considerably. Liver Disorders are also an important disease group in medicine. In their diagnosis, the levels of enzymes released into the blood are analysed. There can be many possible errors in this diagnosis, because many enzymes are involved and because the effects of alcohol intake vary from one patient to another (Yalçın & Yıldırım, 2003).

While a new artificial intelligence field named Artificial Immune Systems (AIS) was emerging in the late 1990s, the performance of the proposed methods was not very good, especially for classification problems. However, the Artificial Immune Recognition System (AIRS), proposed in 2001, changed this situation and attracted attention among other classifiers with its performance (Watkins, 2001). The reasons for its success in classification problems can be found in the following properties (Goodman, Boggess, & Watkins, 2003):

• AIRS performs well on very different problems, such as problems with high-dimensional feature spaces and problems with many classes.
• AIRS is self-adjusting with regard to its architecture in the problem space.

In this study, the resource allocation of AIRS was replaced with an equivalent formed with fuzzy logic, with the aim of improving its performance in terms of the number of resources and classification time more than classification accuracy. The effects of this change were analysed in applications using medical datasets, and the obtained classification accuracies were compared with those of other classifiers used on the same datasets. Fuzzy-AIRS performed better than expected and obtained the highest classification accuracy among the classifiers reported on the UCI web site for the Breast Cancer and Liver Disorders datasets taken from the UCI database (, 2005). Having reached its goal, Fuzzy-AIRS proved itself to be an effective classifier in the medical field and also provided a considerable decrease in the number of resources.
In all applications, Fuzzy-AIRS required fewer than half of the resources required by AIRS, and in this way the classification time was reduced considerably.

The rest of the paper is organized as follows. Section 2 gives background information, including natural and artificial immune systems, AIRS, a brief introduction to Breast Cancer and Liver Disorders, and previous research in the literature. Section 3 explains the method in the subsections Fuzzy Resource Allocation, Data Source, Used Parameters and Measures for Performance Evaluation, each of which gives detailed information. The results obtained for the Breast Cancer and Liver Disorders datasets are given in Section 4. Section 5 discusses these results in specific and general terms. Finally, Section 6 concludes the paper by summarizing the results, emphasizing the importance of this study and mentioning some future work.

2. Background

As with other artificial intelligence techniques, AIS emerged to design high-performance problem-solving algorithms. AIRS is one of the classification algorithms proposed in this area, and it attracted great attention in a short time. It has reached high classification accuracies both on machine-learning benchmarks and on real-world problems, including medical data. This is why, after some improvements, this classification system was applied in this study to two important medical classification problems.

2.1. Natural and artificial immune systems

The natural immune system is a distributed novel-pattern detection system with several functional components positioned in strategic locations throughout the body. The immune system regulates the defence mechanism of the body by means of innate and adaptive immune responses. Of these, the adaptive immune response is much more important for us, because it contains metaphors such as recognition, memory acquisition, diversity and self-regulation. The main architects of the adaptive immune response are lymphocytes, which divide into two classes, T and B lymphocytes (cells), each having its own function. B cells are especially important because the antibodies (Abs) they secrete play critical roles in the adaptive immune response.

The simplified working procedure of our immune system is illustrated in Fig. 1. Specialized Antigen Presenting Cells (APCs) called macrophages circulate through the body and, if they encounter an antigen, ingest it and fragment it into antigenic peptides (I). Pieces of these peptides are displayed on the cell surface by Major Histocompatibility Complex (MHC) molecules existing in the digesting APC. The MHC–peptide combination presented on the cell surface is recognized by T cells, causing them to be activated (II). Activated T cells secrete chemicals as alert signals to other units in response to this recognition. B cells, one of the units that receive these signals from the T cells, become activated when their antibodies recognize the antigen at the same time (IV). When activated, B cells turn into plasma cells that secrete the antibodies bound on their surfaces (V). Secreted antibodies bind the existing antigens and neutralize them, signalling other components of the immune system to destroy the antigen–antibody complex (VI) (De Castro & Timmis, 2002). For detailed information about the immune system, refer to Abbas and Lichtman (2003).

Artificial Immune Systems emerged in the 1990s as a new computational research area. They link several emerging computational fields inspired by biological behaviour, such as Artificial Neural Networks and Artificial Life.

In the studies conducted in the field of AIS, B cell modelling is the most frequently encountered representation type, and different representation methods have been proposed for it.
Among these, shape-space representation is the most commonly used (Perelson & Oster, 1979).

The shape-space model ($S$) aims at quantitatively describing the interactions between antigens (Ags), the foreign elements such as microbes that enter the body, and antibodies (Ag–Ab). The set of features that characterizes a molecule is called its generalized shape. The Ag–Ab representation (binary or real-valued) determines the distance measure used to calculate the degree of interaction between these molecules. Mathematically, the generalized shape of a molecule $m$, either an antibody or an antigen, can be represented by a set of coordinates $m = \langle m_1, m_2, \ldots, m_L \rangle$, which can be regarded as a point in an $L$-dimensional real-valued shape-space ($m \in S^L$). In this work, we used real-valued strings to represent the molecules. Antigens and antibodies were considered to be of the same length $L$. The length and cell representation depend upon the problem (De Castro & Timmis, 2002).

2.2. AIRS classification algorithm

AIRS is a resource-limited supervised learning algorithm inspired by immune metaphors. The immune mechanisms used in this algorithm are resource competition, clonal selection, affinity maturation and memory cell formation. The feature vectors presented for training and testing are called antigens, while the system units are called B cells. Similar B cells are represented with Artificial Recognition Balls (ARBs), and these ARBs compete with each other for a fixed number of resources. This allows the ARBs with higher affinities to the training antigen to improve. The memory cells formed after all training antigens have been presented are used to classify the test antigens. The algorithm is composed of four main stages: initialization; memory cell identification and ARB generation; competition for resources and development of a candidate memory cell; and, lastly, memory cell introduction. The flow chart of the algorithm is shown in Fig. 2.

2.2.1. Initialization

The first step of the algorithm is a data pre-processing stage. First, all of the data are normalized so that the Euclidean distance between any two data vectors lies in the interval [0, 1]. The Euclidean distance is used for both affinity and stimulation value calculations (Eq. (1)):

$$\text{Euclidean distance} = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2} \tag{1}$$

where $x$ and $y$ are feature vectors and $n$ is the number of attributes in the data.

Fig. 1. General immune response to invaders (De Castro & Timmis, 2002).

2.2.2. Memory cell identification and ARB generation

In this step, the algorithm begins to iterate over each training antigen. The training antigen is presented to the memory cells, and the memory cell most stimulated by that antigen is cloned. The stimulation levels are calculated by Eq. (2). All of the clones, together with the memory cell, are added to the ARB pool. The number of clones is determined according to the affinity between the memory cell and the antigen. Affinity values are calculated as in Eq. (3), which yields higher affinities for lower Euclidean distances:

$$\text{Stimulation}(x, y) = \begin{cases} \text{affinity}(x, y), & \text{if class of } x = \text{class of } y \\ 1 - \text{affinity}(x, y), & \text{otherwise} \end{cases} \tag{2}$$

$$\text{affinity}(x, y) = 1 - \text{Euclidean distance}(x, y) = 1 - \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2} \tag{3}$$

2.2.3. Competition for resources and development of a candidate memory cell

After the above processes, the training antigen is presented to all ARBs in the ARB pool. All of the ARBs are rewarded; ARBs in the same class as the presented antigen receive rewards according to higher affinity values than ARBs in different classes. Here, the rewards are numbers of resources. The number of resources required can exceed the number allowed by the system.
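The measures in Eqs. (1)–(3) and the resource reward described above can be illustrated with a minimal Python sketch. It rests on assumptions that are not taken from the paper: each feature is min-max scaled into [0, 1/√n], which is one way (not necessarily the authors') to keep the distance of Eq. (1) within [0, 1]; each ARB's reward is assumed to be its stimulation multiplied by a hypothetical `clonal_rate`; and the function names, the `ARB` class and the parameter values are illustrative only. Excess resources are taken from the lowest-affinity ARBs first, following the removal rule given for this step.

```python
import math
from dataclasses import dataclass
from typing import List


def normalize(data: List[List[float]]) -> List[List[float]]:
    """Min-max scale every feature into [0, 1/sqrt(n)] (assumed scheme).

    With n features scaled this way, the plain Euclidean distance of
    Eq. (1) between any two normalized vectors lies in [0, 1].
    """
    n = len(data[0])
    scale = 1.0 / math.sqrt(n)
    lo = [min(row[i] for row in data) for i in range(n)]
    hi = [max(row[i] for row in data) for i in range(n)]
    return [
        [scale * (row[i] - lo[i]) / (hi[i] - lo[i]) if hi[i] > lo[i] else 0.0
         for i in range(n)]
        for row in data
    ]


def euclidean_distance(x: List[float], y: List[float]) -> float:
    """Eq. (1): square root of the summed squared feature differences."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def affinity(x: List[float], y: List[float]) -> float:
    """Eq. (3): one minus the distance, so closer vectors score higher."""
    return 1.0 - euclidean_distance(x, y)


def stimulation(x: List[float], x_class: int,
                y: List[float], y_class: int) -> float:
    """Eq. (2): affinity for same-class pairs, 1 - affinity otherwise."""
    a = affinity(x, y)
    return a if x_class == y_class else 1.0 - a


@dataclass
class ARB:
    """Hypothetical Artificial Recognition Ball record."""
    vector: List[float]
    label: int
    stim: float = 0.0
    resources: float = 0.0


def compete_for_resources(arbs: List[ARB], antigen: List[float],
                          antigen_label: int, clonal_rate: float = 10.0,
                          max_resources: float = 20.0) -> List[ARB]:
    """Reward every ARB, then remove resources above the allowed total.

    Assumed reward: stimulation * clonal_rate. Excess resources are taken
    from the lowest-affinity ARBs first; ARBs left with no resources are
    dropped from the returned pool.
    """
    for arb in arbs:
        arb.stim = stimulation(arb.vector, arb.label, antigen, antigen_label)
        arb.resources = arb.stim * clonal_rate
    excess = sum(arb.resources for arb in arbs) - max_resources
    for arb in sorted(arbs, key=lambda a: affinity(a.vector, antigen)):
        if excess <= 0:
            break
        taken = min(arb.resources, excess)
        arb.resources -= taken
        excess -= taken
    return [arb for arb in arbs if arb.resources > 0]


if __name__ == "__main__":
    pool = [ARB([0.0, 0.0], 0), ARB([0.1, 0.1], 0),
            ARB([0.5, 0.5], 1), ARB([0.6, 0.6], 0)]
    kept = compete_for_resources(pool, [0.0, 0.0], 0, max_resources=15.0)
    for arb in kept:
        print(arb.vector, round(arb.resources, 3))
```

In the usage example, the two ARBs with the lowest affinity to the antigen lose all of their resources and drop out, and only the two ARBs closest to the antigen remain in the pool. The fuzzy resource allocation of Fuzzy-AIRS changes how resources are allocated and is not modelled here.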
In this case, excess resources are removed, beginning with the lower-affinity ARBs, until the number of resources required equals the number allowed. The stimulation levels of the remaining ARBs are then tested, and the average stimulation level is determined for each class. If any of these averages is lower than a user-defined stimulation threshold, the ARBs belonging to that class are mutated and the resulting clones are added to the ARB pool. This step continues until the average stimulation of every class exceeds the stimulation threshold. Eq. (4) gives the average stimulation value for each class:

$$s_i = \frac{\sum_{j=1}^{|ARB_i|} arb_j.stim}{|ARB_i|}, \quad arb_j \in ARB_i \tag{4}$$

where $i = 1, \ldots, nc$ ($nc$ being the number of classes), $s = \{s_1, s_2, \ldots, s_{nc}\}$, $|ARB_i|$ is the number of ARBs belonging to the $i$th class and $arb_j.stim$ is the stimulation level of the $j$th ARB of the $i$th class.

2.2.4. Memory cell introduction

After the average stimulation of the ARBs in every class reaches the stimulation threshold, the best ARB in the same class as the training antigen is taken as the candidate memory cell. Here, "best" means having the highest affinity. If the stimulation value between the training antigen and this candidate memory cell is greater than that between the training antigen and the original memory cell selected for cloning in step 2, the candidate memory cell is added to the memory cell pool. These steps are repeated for each training antigen. After training, the test data are presented only to the memory cells, and the k-nearest neighbour algorithm is used to determine the classes in the test phase. For detailed information about AIRS, the reader is referred to Watkins (2001).

2.3. Used medical data: Breast Cancer and Liver Disorders

Delen, Walker, and Kadam (2005) and (2005) reported that breast cancer is the most frequent disease among women in the United States and is therefore a very important concern in the medical field there.
Cancer begins with the uncontrolled division of one cell and results in a visible mass named a tumour. A tumour can be benign or malignant. A malignant tumour grows rapidly and invades its surrounding tissues, damaging them. Breast cancer is a malignant tissue that begins to grow in the breast.

Fig. 2. Flow chart of AIRS algorithm.

Abnormalities such as a breast mass, changes in the shape and dimensions of the breast, differences in the colour of the breast skin and breast aches are symptoms of breast cancer. Cancer diagnosis is performed based on non-molecular criteria such as tissue type, pathological properties and clinical location (Kıyan & Yıldırım, 2003). As with other cancer types, early diagnosis of breast cancer can be life-saving.

The liver is an organ that is effective in neutralizing toxins and eliminating them from the body. If the amount of toxins exceeds the working capacity of the organ, the cells of the affected parts of the organ are destroyed. Some substances and enzymes then appear and mix into the blood. During diagnosis of the disease, the levels of these enzymes are analysed. Because the effects of different alcohol dosages vary from one person to another, and because many enzymes are involved, errors in diagnosis can be frequent (Yalçın & Yıldırım, 2003; , 2005).

The datasets used in this study are the Breast Cancer and Liver Disorders datasets taken from the UCI Machine Learning Repository (, 2005).

2.4. Previous research

2.4.1. Research on AIRS

As mentioned in Section 2.2, AIRS is a classifier based on the principles of resource-limited artificial immune systems, proposed by Watkins (2001). In that thesis, AIRS was applied to a series of problems, including both machine-learning benchmarks and real-world problems. AIRS performed very well on some of these, and the work was followed by several related studies.

In the study of Donald E. Goodman et al., AIRS was applied to multi-class problems and compared with Kohonen's LVQ (Goodman, Boggess, & Watkins, 2002). In most of the problems, they found that AIRS performed better than LVQ and Optimized LVQ. Gaurav Marwah and Lois Boggess made some modifications to AIRS concerning resource allocation and approaches to ARB pool organization (Marwah & Boggess, 2002). They also explored several different tie-breaking algorithms that could increase the accuracy of AIRS and other k-nearest neighbour classifiers. In another study by Donald E. Goodman et al., AIRS was examined empirically by replacing one of the two likely sources of its classification power with alternative modifications (Goodman et al., 2003). They concluded that the modifications provided fast test versions of AIRS for users to experiment with. Besides these studies, one more study has been conducted recently: Hamaker and Boggess analysed the effects of adding non-Euclidean distance measures to the basic AIRS algorithm (Hamaker & Boggess, 2004). They used the Iris, Wisconsin Breast Cancer, Cleveland Heart Disease and Credit Screening (Crx) datasets in their experiments.

2.4.2. Research on Breast Cancer and Liver Disorders classification

As with other clinical diagnosis problems, classification systems have also been used for the breast cancer diagnosis problem. The studies in the literature on this application show that a great variety of methods have been used, reaching high classification accuracies. Among these, Quinlan reached 94.74% classification accuracy using 10-fold cross-validation (10×CV) with the C4.5 method (Quinlan, 1996). Hamilton et al. obtained 96% accuracy with the RIAC method (10×CV) (Hamilton, Shan, & Cercone, 1996), while Ster and Dobnikar obtained 96.8% with Linear Discriminant Analysis (LDA) (10×CV) (Ster & Dobnikar, 1996).
Bennett and Blue obtained 97.2% accuracy with an SVM (5×CV) (Bennett & Blue, 1997), Nauck and Kruse obtained 95.06% with neuro-fuzzy techniques (10×CV) (Nauck & Kruse, 1999), and Pena-Reyes and Sipper obtained 97.51% using a Fuzzy-GA method (train: 75%, test: 25%) (Pena-Reyes & Sipper, 1999). Moreover, Setiono reached 98.1% using a neuro-rule method (train: 50%, test: 50%) (Setiono, 2000). Goodman et al. applied three different methods to the problem, with the following accuracies: Optimized-LVQ reached 96.7%, big-LVQ reached 96.8%, and AIRS, which is based on Artificial Immune Systems, obtained 97.2% (10×CV) (Goodman et al., 2002). In addition, Abonyi and Szeifert applied the Supervised Fuzzy Clustering (SFC) technique and obtained 95.57% accuracy (10×CV) (Abonyi & Szeifert, 2003).

As with Breast Cancer, there are many studies on the classification of Liver Disorders. Newton Cheung applied several methods to this problem (Cheung, 2001). He obtained 65.50% classification accuracy using C4.5 (10×CV), 63.39% using a Naive Bayes classifier (10×CV), 61.83% using a Bayesian Network with Naive Dependence (BNND) classifier (10×CV) and 61.42% using a Bayesian Network with Naive Dependence and Feature Selection (BNNF) classifier (10×CV). Tony Van Gestel et al. reached 69.20% classification accuracy with a Support Vector Machine (SVM) classifier (10×CV) (Van Gestel et al., 2002). The two methods used by Yuh-Jye Lee and Mangasarian were the Smooth Support Vector Machine (SSVM) classifier (10×CV) (Lee & Mangasarian, 2001a) and the Reduced Support Vector Machine (RSVM) classifier (10×CV) (Lee & Mangasarian, 2001b). They obtained 70.33% and 74.86% classification accuracies with these methods, respectively.
The classification accuracy obtained by Pham et al. using the RULES-4 algorithm was 55.90% (train: 40%, test: 60%) (Pham, Dimov, & Salem, 2000). Besides these studies, Yalçın and Yıldırım used several neural network architectures for this problem (Yalçın & Yıldırım, 2003). The classification accuracy obtained was 73.05% with a Multilayer Perceptron (MLP), 42.03% with Probabilistic Neural Networks (PNN), 65.55% with