Automated identification of diseases related to lymph system from lymphography data using artificial immune recognition system with fuzzy resource allocation mechanism (fuzzy-AIRS)

Kemal Polat*, Salih Güneş

Selcuk University, Electrical & Electronics Engineering Department, 42035 Konya, Turkey

Received 27 July 2006; received in revised form 16 November 2006; accepted 17 November 2006
Available online 19 December 2006

Abstract

The artificial immune recognition system (AIRS) classification algorithm, which has an important place among classification algorithms in the field of artificial immune systems, has shown effective and intriguing performance on the problems to which it has been applied. AIRS was previously applied to several medical classification problems, including breast cancer, Cleveland heart disease and diabetes, and obtained very satisfactory results. AIRS has thus proved to be an efficient artificial intelligence technique in the medical field. In this study, the resource allocation mechanism of AIRS was replaced with a new one based on fuzzy logic. The resulting system, named fuzzy-AIRS, was used as a classifier in the diagnosis of lymph diseases, which is of great importance in medicine. The lymph diseases dataset, taken from the University of California at Irvine (UCI) Machine Learning Repository, was classified using the 10-fold cross-validation method. The classification accuracies obtained were evaluated by comparing them with those of the classifiers reported on the UCI web site and of other systems applied to related problems. The classification performance was also compared with that of AIRS with regard to classification accuracy, number of resources and classification time. While the AIRS algorithm alone obtained 83.138% classification accuracy, fuzzy-AIRS classified the lymph diseases dataset with 90.00% accuracy. For the lymph diseases dataset, fuzzy-AIRS obtained the highest classification accuracy among those reported on the UCI web site.
In addition to this success, fuzzy-AIRS gained an important advantage over AIRS in terms of classification time. By reducing classification time as well as obtaining high classification accuracy on the applied datasets, the fuzzy-AIRS classifier proved that it can be used as an effective classifier for medical problems.

© 2006 Elsevier Ltd. All rights reserved.

Keywords: Fuzzy resource allocation; AIRS; Lymph diseases; k-Fold cross-validation; Expert system

1. Introduction

Like the venous system, the lymphatic system transports fluids throughout the body. The lymphatic system consists of thin-walled lymphatic vessels, lymph nodes, and two collecting ducts. Lymphatic vessels, located throughout the body, are larger than capillaries, and most are smaller than the smallest veins. Most of the lymphatic vessels have valves like those in veins to keep the lymph, which can clot, flowing in one direction. Lymphatic vessels drain fluids that have diffused through the very thin walls of capillaries. The fluids contain proteins, minerals, nutrients, and other substances, which provide nourishment to tissues. However, most of the fluid is reabsorbed into the capillaries. The rest of the fluid (lymph) is drained from the spaces surrounding the cells into the lymphatic vessels, which eventually return it to the veins. Lymphatic vessels also collect and transport damaged cells, cancer cells, and foreign particles (such as bacteria and viruses) that may have entered the tissue fluids [1].

All lymph passes through strategically placed lymph nodes, which filter damaged cells, cancer cells, and foreign particles out of the lymph. Lymph nodes also produce specialized blood cells designed to engulf and destroy damaged cells, cancer cells, and foreign particles.
Thus, important functions of the lymphatic system are to remove damaged cells from the body and to provide protection against the spread of infection and cancer [1].

The many factors that must be analyzed to diagnose a patient's lymph disease make the physician's job difficult. A physician usually makes decisions by evaluating the current test results of a patient and by referring to the previous decisions she/he made on other patients with the same condition. The former depends strongly on the physician's knowledge, whereas the latter depends on the physician's experience in comparing her/his patient with her/his earlier patients. This job is not easy considering the number of factors she/he has to evaluate. At this crucial step, she/he may need an accurate tool that lists her/his previous decisions on patients having the same (or nearly the same) factors.

* Corresponding author. Tel.: +90 332 223 2056; fax: +90 332 241 0635.
E-mail addresses: kemal_polat2003@yahoo.com (K. Polat), sgunes@selcuk.edu.tr (S. Güneş).
Biomedical Signal Processing and Control 1 (2006) 253–260, www.elsevier.com/locate/bspc
1746-8094/$, see front matter © 2006 Elsevier Ltd. All rights reserved. doi:10.1016/j.bspc.2006.11.001

While a new artificial intelligence field named artificial immune systems (AIS) was emerging in the late 1990s, the performance of the proposed methods was not very good, especially for classification problems. However, the artificial immune recognition system (AIRS), proposed in 2001, changed this situation by attracting attention among other classifiers with its performance [2]. The reasons for its success in classification problems can be found in its properties [3]: AIRS performs well on very different problems, such as problems with high-dimensional feature spaces and problems with many classes, and AIRS is self-adjusting with regard to its architecture in the problem space.

Much other work has also been done in AIS [4,5]. Yanfei Zhong et al.
proposed a novel unsupervised machine-learning algorithm, the unsupervised artificial immune classifier (UAIC), to perform remote sensing image classification. In addition to its non-linear classification properties, UAIC possesses biological properties such as clonal selection, immune network, and immune memory. The implementation of UAIC comprises two steps: first, the initial clustering centers are acquired by random selection from the input remote sensing image; then, the classification task is carried out [4]. Kemal Polat et al. proposed a hybrid system based on AIRS and a fuzzy weighted pre-processing method to diagnose heart disease [5]. Artificial intelligence systems such as ANN (artificial neural network), SVM (support vector machine), ANFIS (adaptive neuro-fuzzy inference system), and the C4.5 decision tree classifier have been used to diagnose various diseases.

In this study, the resource allocation of AIRS was replaced with an equivalent formed with fuzzy logic in order to improve its classification performance in terms of resource number and classification time more than classification accuracy. The effects of this change were analysed in applications using medical datasets, and the classification accuracies obtained were compared with those of other classifiers used on the same datasets. Fuzzy-AIRS showed better performance than expected and obtained the highest classification accuracy among the classifiers reported on the UCI web site for this medical dataset of lymph diseases taken from the UCI database [6].

The remainder of the paper is organized as follows. We present the background in the next section. We present the proposed method in Section 3. In Section 4, we give the experimental data to show the effectiveness of our method. Finally, we conclude this paper in Section 5 with future directions.

2. Background

2.1. Lymph diseases diagnosis from lymphography data

This lymphography database was obtained from the University Medical Centre, Institute of Oncology, Ljubljana, Yugoslavia.
There are 148 instances in total and there are no missing attribute values. There are 18 numeric-valued attributes, which are listed as follows [6]:

- Lymphatic: a test for the overall lymphatic system; value 1 for normal, value 2 for arched, value 3 for deformed and value 4 for displaced.
- Block of afferent: value 1 for no and value 2 for yes.
- Block of lymph c: value 1 for no and value 2 for yes.
- Block of lymph s: value 1 for no and value 2 for yes.
- By pass: value 1 for no and value 2 for yes.
- Extravasates: expulsion from a vessel; represented by 1 and 2.
- Regeneration: value 1 for no and value 2 for yes.
- Early uptake: value 1 for no and value 2 for yes.
- Lymph nodes dimension: ranges from 0 to 3.
- Lymph nodes enlarge: ranges from 1 to 4.
- Changes in lymph: value 1 for bean, value 2 for oval and value 3 for round.
- Defect in node: value 1 for no, value 2 for lacunar, value 3 for lacunar marginal and value 4 for lacunar central.
- Changes in node: value 1 for no, value 2 for lacunar, value 3 for lacunar marginal and value 4 for lacunar central.
- Changes in structure: the structure of the lymphatic system.
- Special forms: value 1 for no, value 2 for chalices and value 3 for vesicles.
- Dislocation of node: value 1 for no and value 2 for yes.
- Exclusion of node: value 1 for no and value 2 for yes.
- Number of nodes: ranges from 0 to 80.

There are four classes in the class variable: normal, metastases, malign lymph and fibrosis, which are represented by the integers 1, 2, 3 and 4, respectively.

2.2. Natural and artificial immune systems

The natural immune system is a distributed novel-pattern detection system with several functional components positioned in strategic locations throughout the body. The immune system regulates the defense mechanism of the body by means of innate and adaptive immune responses.
Between these, the adaptive immune response is much more important for us because it contains metaphors such as recognition, memory acquisition, diversity and self-regulation. The main architects of the adaptive immune response are lymphocytes, which divide into two classes, T and B lymphocytes (cells), each having its own function. B cells in particular are of great importance because of their secreted antibodies (Abs), which play very critical roles in the adaptive immune response.

The simplified working procedure of our immune system is illustrated in Fig. 1. Specialized antigen presenting cells (APCs) called macrophages circulate through the body and, if they encounter an antigen, ingest it and fragment it into antigenic peptides (I). The pieces of these peptides are displayed on the cell surface by major histocompatibility complex (MHC) molecules existing in the digesting APC. The presented MHC-peptide combination on the cell surface is recognized by the T cells, causing them to be activated (II). Activated T cells secrete chemicals as alert signals to other units in response to this recognition (III). B cells, one of the units that receive these signals from the T cells, become activated with the simultaneous recognition of the antigen by their antibodies (IV). When activated, B cells turn into plasma cells that secrete the antibodies bound on their surfaces (V). Secreted antibodies bind the existing antigens and neutralize them, signaling other components of the immune system to destroy the antigen–antibody complex (VI) [7]. For detailed information about the immune system, refer to [7–9].

Artificial immune systems emerged in the 1990s as a new computational research area. Artificial immune systems link several emerging computational fields inspired by biological behavior, such as artificial neural networks and artificial life.

In the studies conducted in the field of AIS, B cell modeling is the most encountered representation type.
Different representation methods have been proposed in that modeling. Among these, shape-space representation is the most commonly used one [8].

The shape-space model (S) aims at quantitatively describing the interactions among antigens (Ags), the foreign elements that enter the body such as microbes, and antibodies (Ag–Ab). The set of features that characterize a molecule is called its generalized shape. The Ag–Ab representation (binary or real-valued) determines a distance measure to be used to calculate the degree of interaction between these molecules. Mathematically, the generalized shape of a molecule (m), either an antibody or an antigen, can be represented by a set of coordinates $m = \langle m_1, m_2, \ldots, m_L \rangle$, which can be regarded as a point in an L-dimensional real-valued shape-space ($m \in S^L$). In this work, we used real-valued strings to represent the molecules. Antigens and antibodies were considered to be of the same length L. The length and cell representation depend upon the problem [10,11].

3. Artificial immune recognition system (AIRS) algorithm

3.1. The parameters in AIRS classifier

One of the important advantages of AIRS is that it is not necessary to know the appropriate settings for the classifier in advance. The most important feature of the classifier is its self-determination ability [2]. The explanations of each parameter used in AIRS are given below. Table 1 summarizes these parameters.

- Antigen: an antigen has the same representation as an antibody; however, the feature vector-class combination is referred to as an antigen when it is being presented to the ARBs for stimulation and/or response.
- Mutation rate: a parameter between 0 and 1 that indicates the probability that any given feature (or the output) of an ARB will be mutated.
- Affinity threshold: the affinity threshold (AT) is the average affinity value among all of the antigens in the training set or among a selected subset of these training antigens.
- Affinity threshold scalar (ATS): a value between 0 and 1 that provides a cut-off value for memory cell replacement in the AIRS training routine when multiplied by the affinity threshold.
- Stimulation threshold: a parameter between 0 and 1 used as a stopping criterion for the training on a specific antigen.
- Clonal rate: an integer value used to determine the number of mutated clones that a given ARB is allowed to attempt to produce.
- Number of resources: a parameter that limits the number of ARBs allowed in the system. Each ARB is allocated a number of resources based on its stimulation value and the clonal rate.
- k nearest neighbor (KNN): a classification scheme in which the response of the classifier to a previously unseen item is determined by a majority vote among the k closest data points.
- k value: the parameter that indicates how many memory cells should be used to determine the classification of a given test item.

Fig. 1. General immune response to invaders.

Table 1
Parameters used in the AIRS and fuzzy-AIRS classifiers for the lymph disease dataset

Parameters                          AIRS    Fuzzy-AIRS
Mutation rate                       0.1     0.1
ATS (affinity threshold scalar)     0.2     0.2
Stimulation threshold               0.98    0.98
Clonal rate                         10      10
Hyper clonal rate                   2.0     2.0
Number of resources                 200     75
k value for k-nearest neighbour     1       1

3.2. AIRS classification algorithm

AIRS is a resource-limited supervised learning algorithm inspired by immune metaphors. In this algorithm, the immune mechanisms used are resource competition, clonal selection, affinity maturation and memory cell formation. The feature vectors presented for training and testing are called antigens, while the system units are called B cells. Similar B cells are represented by artificial recognition balls (ARBs), and these ARBs compete with each other for a fixed number of resources.
This allows the ARBs that have higher affinities to the training antigen to improve. The memory cells formed after all training antigens have been presented are used to classify test antigens. The algorithm is composed of four main stages: initialization, memory cell identification and ARB generation, competition for resources and development of a candidate memory cell, and memory cell introduction. Table 2 summarizes the mapping between the immune system and AIRS. The flow chart of the AIRS algorithm is shown in Fig. 2. We give the details of our algorithm below.

1. Initialization: create a set of cells called the memory pool (M) and the ARB pool (P) from randomly selected training data.
2. Antigenic presentation: for each antigenic pattern do:
   (a) Clonal expansion: for each element of M, determine its affinity to the antigenic pattern, which resides in the same class. Select the highest affinity memory cell (mc) and clone mc in proportion to its antigenic affinity to add to the set of ARBs (P).
   (b) Affinity maturation: mutate each ARB descendant of the highest affinity mc. Place each mutated ARB into P.
   (c) Metadynamics of ARBs: process each ARB using the resource allocation mechanism. This process will result in some ARB death, and ultimately controls the population. Calculate the average stimulation for each ARB, and check for the termination condition.
   (d) Clonal expansion and affinity maturation: clone and mutate a randomly selected subset of the ARBs left in P based on their stimulation level.
   (e) Cycle: while the average stimulation value of each ARB class group is less than a given stimulation threshold, go to step 2c.
   (f) Metadynamics of memory cells: select the highest affinity ARB of the same class as the antigen from the last antigenic interaction. If the affinity of this ARB with the antigenic pattern is better than that of the previously identified best memory cell mc, then add the candidate (mc-candidate) to the memory set M.
       If the affinity between mc and mc-candidate is below the affinity threshold, remove mc from M.
3. Classify: classify data items using the memory set M. Classification is performed in a k-nearest neighbor fashion, with a vote being made among the k closest memory cells to the given data item being classified.

We can characterize AIRS as follows:

- Memory: the memory of the AIRS algorithm is in the pool of memory cells developed through exposure to the training data (experiences).
- Adaptation: the adaptation occurs primarily in the ARB pool. With each new experience, AIRS evolves a candidate memory cell in reaction to this experience. If this memory cell is of sufficient quality, then the memory structure is adapted to include it.
- Decision-making: the initial decision is which memory cell is the most similar to the incoming training antigen. This cell is used as a progenitor for a pool of evolving cells. During classification, the primary classification decision is made based on the k most similar memory cells to the data item being classified.

Table 2
Mapping between the immune system and AIRS

Immune system         AIRS
Antibody              Feature vector
Recognition ball      Combination of feature vector and vector class
Shape-space           Type and possible values of the data vector
Clonal expansion      Reproduction of ARBs that are well matched with antigens
Antigens              Training data
Affinity maturation   Random mutation of ARB and removal of the least stimulated ARBs
Immune memory         Memory set of mutated ARBs
Metadynamics          Continual removal and creation of ARBs and memory cells

Fig. 2. The flow chart of AIRS.

We explain each step of our algorithm in detail in the following paragraphs.

3.2.1. Initialization

The first step of the algorithm is the data pre-processing stage. In this step, the given data are normalized to ensure that the Euclidean distance between any two data items lies in the interval [0, 1].

3.2.2. Memory cell identification and ARB generation

In this step, the algorithm iterates over each training antigen. The training antigen is presented to the memory cells, and the memory cell most stimulated by that antigen is cloned. The stimulation levels are calculated by Eq. (1). All of the clones, together with the memory cell, are added to the ARB pool. The number of clones is determined based on the affinity between the memory cell and the antigen. The affinity values are calculated using Eq. (2), which yields higher affinities for lower Euclidean distances:

$$\mathrm{stimulation}(x, y) = \begin{cases} \mathrm{affinity}(x, y), & \text{if class of } x = \text{class of } y \\ 1 - \mathrm{affinity}(x, y), & \text{otherwise} \end{cases} \tag{1}$$

$$\mathrm{affinity}(x, y) = 1 - \text{Euclidean distance}(x, y) = 1 - \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2} \tag{2}$$

3.2.3. Competition for resources and development of a candidate memory cell

In this step, the training antigen is presented to all ARBs in the ARB pool. Then, all ARBs are awarded based on their resource numbers. An ARB class with a higher number of resources gets higher affinity values. In other words, the assigned affinity values are proportional to the number of resources for each ARB. The required number of resources may exceed the number of resources allowed by the system. In this case, the excess resources are removed, beginning with the lowest-affinity ARB, until the number of resources is equal to the allowed number of resources. The stimulation levels of the remaining ARBs are tested and the average value of these levels is determined for each class. If any of these average values is lower than a stimulation threshold determined by the user, the ARBs belonging to that class are mutated and the resulting clones are added to the ARB pool. Fig. 3 shows the mutation routine in AIRS. This step proceeds until the average stimulation of every class is greater than the stimulation threshold.
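The affinity and stimulation measures of Eqs. (1) and (2) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names and argument order are hypothetical, and the sketch assumes the feature vectors have already been normalized (Section 3.2.1) so that the Euclidean distance between any two vectors lies in [0, 1].

```python
import math


def affinity(x, y):
    """Eq. (2): one minus the Euclidean distance between two feature vectors.

    Assumes x and y were normalized beforehand so the distance is in [0, 1].
    """
    return 1.0 - math.dist(x, y)


def stimulation(x, y, x_class, y_class):
    """Eq. (1): affinity for same-class pairs, one minus affinity otherwise."""
    a = affinity(x, y)
    return a if x_class == y_class else 1.0 - a


# Hypothetical two-feature example: an antigen and a memory cell.
antigen = (0.0, 0.0)
memory_cell = (0.3, 0.4)
same = stimulation(antigen, memory_cell, x_class=1, y_class=1)
different = stimulation(antigen, memory_cell, x_class=1, y_class=2)
```

Under these definitions, a nearby cell of the same class is strongly stimulated, while a nearby cell of a different class is weakly stimulated, which is what lets same-class ARBs win the competition for resources.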
To calculate the average stimulation value for each class, we use the following expression:

$$s_i = \frac{\sum_{j=1}^{|ARB_i|} arb_j.\mathrm{stim}}{|ARB_i|}, \quad arb_j \in ARB_i \tag{3}$$

where $i = 1, \ldots, nc$, $s = \{s_1, s_2, \ldots, s_{nc}\}$, $|ARB_i|$ represents the number of ARBs belonging to the ith class, and $arb_j.\mathrm{stim}$ represents the stimulation level of the jth ARB of the ith class.

3.2.4. Memory cell introduction

After the average stimulation value of the ARBs in every class reaches the stimulation threshold, the best ARB (i.e., the ARB having the highest affinity) in the same class as the training antigen is taken as a candidate memory cell. If the stimulation value between the training antigen and the candidate memory cell is greater than the stimulation value between the training antigen and the original memory cell selected for cloning in step 2, the candidate memory cell is added to the memory cell pool. These steps are repeated for each training antigen. After training, test data are presented only to the memory cells. The k-NN algorithm is used to determine the classes in the test phase. For more detailed information about AIRS, the reader is referred to [2,7,14].

3.3. Fuzzy resource allocation mechanism

The competition for resources in AIRS allows high-affinity ARBs to improve. According to this resource allocation mechanism, half of the resources are allocated to the ARBs in the class of the antigen, while the remaining half is distributed to the other classes. The distribution of resources is done according to a number that is found by multiplying the stimulation rate by the clonal rate. In the study of Gaurav Marwah and Lois Boggess, a different resource allocation mechanism was tried [12]. In their mechanism, the Ag classes occurring more frequently get more

Fig. 3. The mutation routine.
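The baseline (non-fuzzy) resource allocation described at the start of Section 3.3, together with the per-class average stimulation of Eq. (3), can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' code: the `ARB` class and both function names are hypothetical, the half of the resources reserved for the non-antigen classes is assumed to be split equally among them, and the last trimmed ARB is assumed to keep whatever resources remain after the excess is removed.

```python
from dataclasses import dataclass


@dataclass
class ARB:
    label: int              # class of the ARB
    stim: float             # stimulation level in [0, 1]
    resources: float = 0.0


def allocate_resources(arbs, antigen_class, class_labels, total_resources, clonal_rate):
    """Give half the resources to the antigen's class and trim the weakest ARBs."""
    others = [c for c in class_labels if c != antigen_class]
    budget = {antigen_class: total_resources / 2.0}
    for c in others:
        # Assumption: the remaining half is shared equally by the other classes.
        budget[c] = (total_resources / 2.0) / len(others)

    for arb in arbs:
        # Each ARB claims stimulation x clonal rate resources.
        arb.resources = arb.stim * clonal_rate

    survivors = []
    for c in class_labels:
        group = sorted((a for a in arbs if a.label == c), key=lambda a: a.stim)
        excess = sum(a.resources for a in group) - budget[c]
        for arb in group:  # weakest (least stimulated) ARB first
            if excess <= 0:
                break
            removed = min(arb.resources, excess)
            arb.resources -= removed
            excess -= removed
        # ARBs left without resources die and are dropped from the pool.
        survivors.extend(a for a in group if a.resources > 0)
    return survivors


def average_stimulation(arbs, class_labels):
    """Eq. (3): mean stimulation of the ARBs in each class (0.0 if a class is empty)."""
    result = {}
    for c in class_labels:
        stims = [a.stim for a in arbs if a.label == c]
        result[c] = sum(stims) / len(stims) if stims else 0.0
    return result


pool = [ARB(1, 0.9), ARB(1, 0.5), ARB(2, 0.8)]
alive = allocate_resources(pool, antigen_class=1, class_labels=[1, 2],
                           total_resources=10, clonal_rate=10)
averages = average_stimulation(alive, [1, 2])
```

In a training loop, the per-class averages would be compared against the stimulation threshold of Table 1 to decide whether further mutation rounds are needed; the fuzzy mechanism proposed in this paper replaces the fixed stimulation-times-clonal-rate rule used in this sketch.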