Self Improvement

21 pages

Frailty effects in networks: comparison and identification of individual heterogeneity versus preferential attachment in evolving networks

of 21
All materials on our website are shared by users. If you have any questions about copyright issues, please report us to resolve them. We are always happy to assist you.
Frailty effects in networks: comparison and identification of individual heterogeneity versus preferential attachment in evolving networks
  Journal compilation  ©  2011 Royal Statistical Society 0035–9254/11/60239 Appl.Statist. (2011) 60 ,  Part   2 ,  pp. 239–259 Frailty effects in networks: comparison andidentification of individual heterogeneity  versus  preferential attachment in evolving networks Birgitte Freiesleben de Blasio, Taral Guldahl Seierstad and Odd O.Aalen University of Oslo, Norway  [Received April 2010.Final revision August 2010] Summary.  Preferential attachment is a proportionate growth process in networks, where nodesreceive new links in proportion to their current degree.Preferential attachment is a popular gen-erativemechanismtoexplainthewidespreadobservationofpower-law-distributednetworks.Analternative explanation for the phenomenon is a randomly grown network with large individualvariation in growth rates among the nodes (frailty). We derive analytically the distribution ofindividual rates, which will reproduce the connectivity distribution that is obtained from a gen-eral preferential attachment process (Yule process), and the structural differences between thetwo types of graphs are examined by simulations. We present a statistical test to distinguishthe two generative mechanisms from each other and we apply the test to both simulated dataand two real data sets of scientific citation and sexual partner networks.The findings from thelatter analyses argue for frailty effects as an important mechanism underlying the dynamics ofcomplex networks. Keywords  : Citation network; Frailty; Network model; Power law network; Preferentialattachment; Sexual partner network;Yule distribution 1. Introduction Our aim in this paper is to study the mechanisms behind the growth of large networks. Themodels that we consider are random-graph processes where nodes and links are added to thegraph, but without removal of either nodes or links.In recent years, empirical data on large-scale networks have become available and haveprovided valuable insights into the structure of real world connected systems. In many complexnetworks it is found that the degrees of the nodes follow a power law distribution. In an earlyreferenceinthisfielddeSollaPrice(1976)foundthatthenumberofcitationsofscientificarticlesfollows a power law, and he showed that this power law can be explained if one assumes thatthe probability that a new publication cites a given article is proportional to the number of citations. This was taken up by Barabási and Albert (1999), who proposed a similar model forthe World Wide Web, and who coined the name ‘preferential attachment’ for this mechanism.Today,mostexistingmodelsthatareaimedatreproducingpowerlawgraphsincorporateapref-erentialattachmentmechanismofsomekind;numerousexamplescanbefoundinDorogovtsevand Mendes (2003). However, already in 1925, Yule had published a closely related stochastic Address for correspondence : Birgitte Freiesleben de Blasio, Department of Biostatistics, Institute of BasicMedical Sciences, University of Oslo, PO Box 1122 Blindern, N-0317 Oslo, Norway. E-mail: Reuse of this article is permitted in accordance with the terms and conditions set out at .  240 Blasio, T.G.Seierstad and O.O.Aalen  branching model. The Yule process is general and has been adapted to networks; see Krapivsky et al.  (2000), Newman (2005) and Dorogovtsev  et al.  (2000).Jeong  et al.  (2003) proposed a method to identify a preferential attachment process. The ideais to estimate, for every  k  , the intensity  Π .k/  by which nodes of degree  k   acquire new linksduring a small time interval ∆ t  . The preferential attachment hypothesis predicts that the rate Π .k/  is a monotonically increasing function of   k  , and Jeong  et al.  (2003) found that this isindeed so for several real world networks. In other words there is a correlation between a node’sattractiveness and its popularity, where we interpret the  attractiveness  of a node as its rate of acquiring new links, and the  popularity  of a node as the number of links that it has alreadyacquired.However,thepreferentialattachmenthypothesisactuallymakesastrongerprediction,namelythat a node is attractive  because  it is popular, i.e. a node with high degree attracts new links ata higher rate than other nodes  because  it has high degree. Despite the popularity of the prefer-ential attachment hypothesis in explaining the structure of complex networks, it seems naturalalso to investigate the opposite relationship: a node may have some intrinsic quality causingit to acquire links at a higher rate than other nodes. Such nodes will in general end up withhigher degree than less attractive nodes; thus, in this setting we shall also find an increasing Π .k/ , although the growth mechanism is different from preferential attachment.In some networks a frailty mechanism provides a seemingly reasonable explanation of theconnectivity distribution. For example in a sexual network the degree of a node (i.e. the numberof previous sexual partners) is not displayed and cannot be used directly as a selection criterion.Likewise, research papers with a high citation count are likely to have some merit, and are, onewould hope, cited again because of their quality, and not merely because of their previous pop-ularity. However, papers with many citations are more likely to be found in a literature searchby a potential citer and are also for this reason more likely to be cited. It seems reasonable thatin many real networks there is a mixed mechanism, such that a node’s attractiveness increaseswith its popularity, but such that there still is heterogeneity between the nodes, which is notdirectly related to their degrees.In this paper we shall study a random-graph model which evolves by a mechanism whichmimics this behaviour, and we shall investigate statistical methods by which we can distinguishsuch a process from a pure preferential attachment process. The random-graph model that wepropose we call a  frailty graph . At birth every node  v  is assigned a  frailty  Z v , which is distrib-uted according to some probability distribution  Z  . Frailty is a term that is used in event historyanalysis to describe unobserved heterogeneity in data. The random graph then grows in such away that existing nodes acquire new links with a rate that is proportional to its assigned frailtyvariable. We shall assume that we cannot observe the frailty Z v  for a given node v ; however, it ispossible to infer properties about the distribution of   Z   from the behaviour of the random-graphprocess.We can consider a preferential attachment process and the frailty graph process as two ex-tremes: in the preferential attachment process nodes are equal at the outset, and their rate of acquiring new links depends solely on their degree. Nodes which enter the graph at a latertime, or nodes that are unlucky in the beginning and acquire few links, will have little chanceof overtaking nodes with an early success. In the frailty graph process, however, there is aninherent unevenness between the nodes from the beginning and, even if a node with unfavour-able frailty is lucky and receives several links in one time period, this will not affect its abilityto attract new links in another time period. Furthermore, entering the graph at a later timeis not an obstacle to becoming a successful node, as long as the node has a favourable frailtyvariable.  Frailty Effects in Networks   241 The method to identify preferential attachment, which was proposed by Jeong  et al.  (2003),considers the degree distribution of the network at two relatively close observation timesand estimates the link acquiring rate of a node as a function of its degree. As explainedabove, this method does not distinguish a network evolving by the preferential attachmentmechanism from a frailty network. As an alternative we propose methods which make use of three observation times,  t  1 ,  t  2  and  t  3 , or two observation times combined with information onage.A neat property of the preferential attachment mechanism is that it automatically leads tographs where the degrees follow a power law distribution, provided that the rate with which anode acquires new links is growing linearly with its degree. A power law distribution has beenobserved in many real world complex networks; for this reason the preferential attachmenthypothesis is an appealing explanation. In the frailty model we obtain a power law distributiononly if the frailty variable itself is power law distributed; thus, a power law may be more difficultto rationalize if we assume that a frailty process is the underlying mechanism of the network.However,powerlawsareobservedinmanysituationsintherealworld,e.g.insocialinteractionsas pointed out by Zipf (1949), and may be caused by several mechanisms (Newman, 2005), so itis not so far fetched to assume that such a frailty variable may have a power law distribution. Inthis paper we derive a distribution for the frailty variable which ensures that the graph processasymptotically has the same degree distribution as the Yule process and thus follows a powerlaw.We shall also consider two real world networks, namely citation data for a certain set of mathematical research papers, and the sexual network of a subset of the Norwegian popula-tion. Much work has been done on the study of sexual networks, since this is relevant withregard to the spread of sexually transmitted infections. In a study of homosexual men attendinga sexually transmitted infections clinic, Colgate  et al.  (1989) observed that the sexual contactsfollow a power law degree distribution. This finding was supported by Liljeros  et al.  (2001), onthebasisofanalysesofdatafromapopulation-basedsexualsurveyinSweden,andlaterLiljeros et al.  (2003) suggested that preferential attachment could be of relevance for sexual networkgrowth. However, these issues have been the subject of some controversy, and existing datasamples are too small to verify scaling behaviour for more than 1–2 decades. In an attempt tofind growth models for sexual networks, Jones and Handcock (2003) and Handcock and Jones(2004) noted that heterogeneity between the nodes for forming new links is the mechanism thatis best supported by data, and they concluded that ‘a unitary behavioural process, such as preferential attachment, is unlikely to underlie empirical sexualnetwork degree distributions’. Moreover, they found that, although the sexual network in some portions of some societies arewell fitted by a power law, a better fit is generally obtained by a negative binomial distributionand variants thereof. Similarly, Hamilton  et al.  (2008) found that, when compared with alter-native models, the power law hypothesis fails to have consistent support. In a different setting,Stephen and Toubia (2009) compared preferential attachment with other possible mechanismsto find an explanation for the process of link addition in a certain social commerce network.They found that the mechanism which fits the data best is a mechanism that is based on vertexattributes, akin to a frailty effect.Real networks are governed by complex social, behavioural and evolutionary dynamics andthe frailty and preferential attachment graph models that are considered here are idealizationsthat involve severe reduction of that complexity. It is probable that empirical networks mayevolve by a combination of the two mechanisms in addition to time inhomogeneous growth. In  242 Blasio, T.G.Seierstad and O.O.Aalen  the first part of the paper we focus on the two ‘pure’ graph models and investigate the theoreti-cal differences between them; then we shall consider some data examples and discuss how theyrelate to the two models. 2. Frailty power law distribution Ourobjectiveistocomparefrailtynetworkswithnetworksevolvingbypreferentialattachment.For this, we need to find the probability distribution of the frailty variable  Z  , which ensuresthat therandomfrailtygraphhasthesameasymptoticbehaviouras thepreferentialattachmentgraph. As already mentioned, networks growing by the preferential attachment mechanismautomatically have a degree distribution which is asymptotically scale free, provided that the‘preference function’  Π .k/  is linear in  k  . The example that we shall consider here is a generalpreferential attachment process which was proposed by Yule (1925).In this section, we start by deriving analytically the probability distribution of the frailtyvariable  Z  , which ensures that the frailty graph is asymptotically similar to the Yule graph, andweshallconsiderfinitesizeeffectsofthescalingfunction.Inthesubsequentsectionsweprovidemethods for testing for preferential attachment in networks, and we compare the two types of graphs by using simulations. Lastly, we shall apply the test to real data in terms of citation andsexual partner networks. 2.1. Graph models  The implementation of the graph processes in the simulations is as follows: we start with somefixed graph  G 0  and let  G m  be the graph after  m  steps in the process. To every node  v  in  G m , weassociate a probability  p v , m , such that, for every  m  0, Σ v ∈ G m p v , m = 1. In the Yule process theprobability that is associated with a node is proportional to its degree; in the frailty process, itis proportional to an unobserved frailty variable that is assigned to the node at birth.At every step in the graph process, one of the following two actions is performed. Withprobability  p node , a new node is added to the graph. It is then attached to one of the existingnodes, in such a way that every node  v  has probability  p v , m  of being chosen. With probability p link = 1 − p node  a new link is added between existing nodes. The srcin is chosen uniformly atrandom; the target node is chosen according to the probabilities  p v , m .The difference between the Yule process and the frailty process is the way in which the prob-abilities p v , m  are calculated. Let d  v , m  be the degree of  v in G m , and let s m  be the total number of links in the graph. Then, in the Yule process,  p v , m = d  v , m = 2 s m . In the frailty process, whenevera node arrives it is assigned a  frailty  value  z v  which is chosen according to the probability dis-tribution of a given positive random variable  Z  . The probabilities  p v , m  are then given by theequation  p v , m = z v = Σ w ∈ G m z w .Thus,intheYuleprocess,anodeischosenwithprobabilityproportionaltoitsdegree,whereas,in the frailty process, a node is chosen with probability proportional to its frailty value.We let  p link = h=.h + 1 /  and  p node = 1 =.h + 1 / , where  h  is a parameter indicating the expectednumber of links added between the addition of two consecutive nodes. 2.2. Yule distribution  TheYuledistributionhasitsorigininamutationalevolutionarymodel;seeSimon(1955),Willis(1922) and Yule (1925). It has been adapted to networks in different ways; here we follow thederivation by Newman (2005). Consider a growing network where new nodes arrive consecu-  Frailty Effects in Networks   243 tivelyateachtimestep.Atthepointofarrival,newnodesmake j  0  linkstoexistingnodes,where j  0  may take the values j  0 = 0,1,2,.... If  j  0 = 0, the model requires an additional ‘attractiveness’parameter a> 0 for the new nodes to engage in the preferential attachment process. Between thearrival of new nodes, a constant of   m  links are formed between the existing nodes;  m  may takethevalues m = 0,1,2,...,thougheither j  0  or m mustbeapositiveintegerforlinkstoform.Herethe target ends are chosen according to their present degree  j  i  with  p i = j  i = Σ k j  k . In principle,at time  t  = 0 the system starts with two nodes connected by  m  links, so at time  t  = t  ′ there are m + .m + j  0 /t  ′ links in a system of size N  = 2 + t  ′ nodes. Newman (2005) has shown with the useof a master equation technique that the asymptotic distribution of the degree  J   of a randomlychosen node is given bylim N  →∞ { p Y  .J  = j/ } = B.j  + a , q/B.j  0 + a , q − 1 / = .q − 1 / Γ .j  + a/ Γ .j  0 + a + q − 1 / Γ .j  0 + a/ Γ .j  + a + q/ ,  . 1 / where  q = 2 + .j  0 + a/=m  is the scaling constantlim N  →∞ { p Y  .J  = j/ } = C Γ .j  + a/ Γ .j  + a + q/ ∝ j  − q ,  j  →∞ : . 2 / The Barabási–Albert preferential attachment model where new nodes enter and link to  m  exist-ing nodes chosen with linear preference arises as a special case for  m = a ; j  0 = 0 and has thescaling property  p.j/ ∝ j  − 3 . 2.3. Frailty graph  We aim to find the frailty probability density function (PDF) of a randomly grown graph thathas a probability distribution function that is similar to the Yule graph equation (2). For thiswe note that the degree of any single node is a random variable which is asymptotically Poissondistributed. We first seek an expression for the probability-generating function of the degreesof the random graph. Then we can recover the frailty rate distribution from the inverse Laplacetransform of the generating function.We consider a random graph with  N   nodes, where  N  →∞ . We assume that the  i  th node isassigned a frailty variable  Z i , where  Z 1 , Z 2 ,..., Z N   are independent random variables whichhave the same distribution as a given positive random variable  Z  . The expected degree of agiven node  i   is proportional to its frailty variable  Z i ; however, it also depends on the age of the node, as older nodes will generally have a larger degree than younger nodes. Below we shalldefine a random variable  Y   which is a ‘time-adjusted’ frailty variable, taking account of boththe inherent frailty variable of a node and its age.After  m  steps of the process, the sum of the frailty variables of all the nodes in the graph willbe roughly  m µ p node , where  µ = E .Z/ . When we consider the asymptotic case, the error in thisapproximation is of lower order  O.m/  and can be ignored. If node  i   was added at step  m i , thenthe degree of   i   after  m  steps will be approximately Poisson distributed with mean   Z i µ p node + p link  ln   mm i  : Let  ˜ Z = Z= µ p node + p link  and  γ  m .m i / = ln .m=m i / , and let  ˜ f.z/  be the PDF of   ˜ Z .
Related Documents
View more...
We Need Your Support
Thank you for visiting our website and your interest in our free products and services. We are nonprofit website to share and download documents. To the running of this website, we need your help to support us.

Thanks to everyone for your continued support.

No, Thanks