iDocSlide.Com

Free Online Documents. Like!

All materials on our website are shared by users. If you have any questions about copyright issues, please report us to resolve them. We are always happy to assist you.

Share

Description

We study a gossip-based algorithm for searching data objects in a multipeer communication network. All of the nodes in the network are able to communicate with each other. There exists an initiator node that starts a round of searches by randomly

Tags

Transcript

a r X i v : 0 9 0 7 . 2 5 6 3 v 1 [ c s . N I ] 1 5 J u l 2 0 0 9
Gossip-based Search in Multipeer Communication Networks
Eva Jaho
†
, Ioannis Koukoutsidis
*
, Siyu Tang
‡
,Ioannis Stavrakakis
†
, and Piet Van Mieghem
‡†
National & Kapodistrian University of Athens, Dept. Informatics andTelecommunications, Ilissia, 157 84 Athens, Greece, E-mail:
{
ejaho,ioannis
}
@di.uoa.gr
*
University of Peloponnese, Dept. Telecommunications Science and Technology, Endof Karaiskaki St., 22100, Tripolis, Greece, E-mail: gkouk@uop.gr
‡
Delft University of Technology, 2600 GA Delft, The Netherlands, E-mail:
{
S.Tang,P.F.A.VanMieghem
}
@tudelft.nlJune 21, 2009
Abstract
We study a gossip-based algorithm for searching data objects in a multipeer communicationnetwork. All of the nodes in the network are able to communicate with each other. Thereexists an initiator node that starts a round of searches by randomly querying one or more of its neighbours for a desired object. The queried nodes can also be activated and look for theobject. We examine several behavioural patterns of nodes with respect to their willingness tocooperate in the search. We derive mathematical models for the search process based on theballs and bins model, as well as known approximations for the rumour-spreading problem. Allmodels are validated with simulations. We also evaluate the performance of the algorithm andexamine the impact of search parameters.
1 Introduction
The term ‘gossiping algorithm’ encompasses any communication algorithm where messages be-tween two nodes are exchanged opportunistically, with the intervention of other nodes that act asbetweeners or forwarders of the message. It is inspired from the social sciences, in the same way asepidemic protocols where inspired from the spreading of infectuous diseases [1]. These two commu-nication paradigms are very similar, with diﬀerences focusing on the diﬀerent ways that gossipingnodes and infected nodes could behave: gossiping nodes adopt human-like characteristics, whilethe behaviour of infected nodes is governed by the dynamics of the virus or disease.Gossiping algorithms are suitable for communication in distributed systems, such as ad-hocnetworks and generally systems with peer-to-peer or peer-to-multipeer communication. The lattercommunication paradigm is followed here, where a node can communicate with multiple peers,usually maintaining a short-time connection with each one. Attractive characteristics of gossiping1
algorithms include simplicity, scalability and robustness to failures, as well as a speed of dissemi-nation that is easily conﬁgurable. Gossiping can be identiﬁed with the spreading of rumours in anetwork, the dymanics of which are investigated in [6, 5]. Additionally, gossiping protocols havebeen used for the computation of aggregate network quantities, such as sums, averages, or quantilesof certain node values [3].In all of the above referenced works, gossiping is typically used for the dissemination of in-formation, and performance metrics are oriented towards measuring the eﬃciency of informationdissemination. In this paper, we model a speciﬁc gossip-based algorithm that aims at ﬁnding ﬁles(or data objects, in general) in nodes in a distributed network. The algorithm employs sequentially-generated parallel search procedures, in the following manner: We assume there exists a ﬁle in thenetwork that may be located in diﬀerent nodes. An
initiator
node is interested in this ﬁle andstarts a round of searches to ﬁnd it, by randomly querying one or more of its peers (neighbours).The queried nodes can also be
activated
and look for the object. The search is considered successfulwhen at least one copy of the ﬁle is found.Apart from the initiator, the other nodes that assist the search can have diﬀerent behaviouralpatterns. We distinguish between
cooperative
and
non-cooperative
nodes. Nodes in the ﬁrst cate-gory always become active when queried, and generate themselves query messages in subsequentrounds. Non-cooperative nodes on the other hand are unwilling to participate in the search pro-cess themselves. We also consider
stiﬂer
nodes; the term is borrowed from [5] and signiﬁes nodesthat were previously active, but from a certain point on lose interest in the dissemination of thequery, and thus cease to participate in the search. Hence, it is a special case of cooperation. Toavoid confusion in the paper, non-stiﬂer nodes that are non-cooperative are also referred to as
plain non-cooperative
nodes. In all the above cases cooperation is considered only with respect toparticipating in the search; if a node has the ﬁle it always returns it.We also derive diﬀerent versions of the algorithm based on the level of knowledge that eachnode has about the progress of search. We consider two extremes: at the one, each node has noknowledge whatsoever about the number or identities of nodes that have been previously queriedin the network. At the other extreme, each node has complete knowledge about these facts andavoids sending messages to previously queried nodes at subsequent rounds. We call these cases
blind search
and
smart search
, respectively.We mathematically model the blind search process based on a known approximation for therumour spreading problem [6], which we extend here. The smart search process is modeled usinga combinatorial approach, based on a generalization of the balls and bins model [2]. Based onthese models, we are able to evaluate the performance of the search, as well as the impact of searchparameters. The latter are the number of queried neighbours by a node and the number of copiesof the ﬁle in the network. By changing these parameters, one can easily conﬁgure the speed andeﬃciency of the search, as will be shown later.Both the blind and smart versions of the gossiping algorithm have been modeled exactly in[8]. Both information dissemination and search are investigated in that paper, while here we arefocusing on the search process. The main algorithmic diﬀerence in [8] is that even the nodes thathave the ﬁle can be non-cooperative. In this paper, we want to focus only on the eﬀect thatcooperation has in the forwarding of the message: only the intermediate nodes forwarding a querycan be non-cooperative. In addition, the approximative model presented here for the blind searchprocess is shown to be computationally simpler, while maintaining good accuracy. Finally, in thispaper we present the stiﬂing behavioral pattern, which is not included in [8].2
The paper is structured as follows. In Section 2, the gossip-based search algorithm is describedin more detail, and application scenarios that justify the study of the blind and smart searchalgorithms are discussed. In Sections 3 and 4 we present the mathematical modeling of the blind
and smart search algorithms, respectively. The modeling in these sections covers cooperative andplain non-cooperative nodes. The stiﬂing behavioural pattern is analysed in Section 5. In Section 6,
we present results for the performance of the blind search algorithm for very large numbers of nodes,and derive useful scaling laws. The major conclusions from this work and issues for future researchare presented in Section 7.
2 Gossip-based search algorithm
A model of the network in the form of a complete graph is considered. There is an initiator node
I
,and
N
−
1 other nodes in the graph. There is a ﬁle
f
located in
m
of the other nodes of the graph(
m
≤
N
−
1) that the initiator wants to ﬁnd. The initiator starts a search by randomly queryinga subset of its neighbors of size
k
(
k
≤
N
−
1), with equal probabilities.If a queried node has the object then it returns it, and the query is successful. Otherwisethe queried nodes – depending on being cooperative or not – may begin to search themselves byforwarding the query to their neighbours. Cooperative nodes which are queried become “active”and participate in the search. The process of the search is modeled in steps or rounds, where ateach round all active nodes simultaneously query their neighbors, hence activating new nodes. Thealgorithm continues for several rounds where at each step, active nodes randomly query some of their neighboring nodes, until the ﬁle
f
is found.We consider two search scenarios:-
Blind search
: An active node searches “blindly” at each round, possibly querying nodesthat have been queried before. This approach can model devices with small computationalcapabilities, that cannot keep a log of queried nodes, or cases where the identities of thedevices are not known. It is equally appropriate to model situations with random encountersbetween nodes. For instance, a number of mobility models have exponential meeting timesbetween mobile nodes (such as the Random Walk, Random Waypoint and Random Directionmodels, as well as more realistic, synthetic models based on these [7]). In our model, the timeuntil a node is queried approaches a geometric distribution, which is the discrete time analogto an exponential distribution.-
Smart search
: An active node searches “smartly” at each round, by avoiding nodes that havebeen queried before either by itself or by other nodes. This demands the knowledge of theidentities of all queried nodes, and has a larger overhead compared to the blind search case.We do not deﬁne the exact algorithm by which the identities of all queried nodes are madeknown to an active node. We only assume that this knowledge can be obtained at a costthat is small compared to the cost of searching, and use this case mainly as a reference forthe eﬃciency of the blind search algorithm. It is evident that smart search corresponds tothe fastest version of the algorithm. Although it can be hard and costly to implement, therecan exist schemes that can approximate its performance. For example, a low-cost algorithmthat could approximate smart search can be based on the routine that, at each peer-to-peercommunication, nodes exchange the lists of peers they have queried.3
3 Approximate blind search model with cooperative or non-cooperative nodes
Each node that receives a search query will cooperate to forward the query with probability
c
(0
≤
c
≤
1). If several active nodes query the same node, the latter node decides whether to becooperative or not by a single Bernoulli trial. We do not consider that the node performs severalindependent trials, one for each query.We consider a sequence of steps (or rounds)
r
= 1
,
2
,...
until the ﬁle is found. If at step
r
thereare ˆ
A
(
r
) active nodes then, provided the ﬁle is not yet found, the probability of ﬁnding it at the
r
th step,
S
(
r
), is:
S
(
r
) = 1
−
(1
−
p
s
)
ˆ
A
(
r
)
,
(1)where
p
s
is the probability that a single search (consisting of
k
diﬀerent random queries) succeeds.To ﬁnd
p
s
, notice that the problem is equivalent to the one where, in a set of
N
−
1 nodes, thereare
m
marked nodes and we randomly select a group of
k
nodes. We want to ﬁnd the probabilitythat at least one marked node is selected. The probability that our selection returns exactly
u
marked nodes (
u
≤
min(
m,k
)) is
p
u
=
mu
N
−
1
−
mk
−
u
N
−
1
k
.
Indeed, the marked nodes can be chosen in
mu
diﬀerent ways, the unmarked ones in
N
−
1
−
mk
−
u
ways, and the total number of ways to select
k
nodes is
N
−
1
k
. Further,
p
s
= 1
−
p
0
, therefore
p
s
= 1
−
N
−
1
−
mk
N
−
1
k
.
(2)The probability of ﬁnding the ﬁle
f
at the
r
th step is,
p
(
r
) =
S
(
r
)
r
−
1
i
=1
(1
−
S
(
i
))
.
(3)This formula is an approximation because it implicitly assumes that each round is independent of the other.A deterministic approximation ˆ
A
(
r
) for the number of active nodes in each round can be foundusing the method presented in [6], which is extended to
k
neighbours that are cooperative withprobability
c
. Consider the process
{
I
(
r
)
, r
≥
1
}
of the number of inactive nodes in each round.Given that at round
r
there are
i
(
r
) inactive nodes and
A
(
r
) active ones, the mean number of inactive nodes at round
r
+ 1 will be
E
[
I
(
r
+ 1)] =
i
(
r
)
1
−
kN
−
1
A
(
r
)
+
1
−
1
−
kN
−
1
A
(
r
)
(1
−
c
)
=
i
(
r
)
(1
−
c
) +
c
1
−
kN
−
1
A
(
r
)
.
4
For ﬁxed
k
and large
N
, we use a second order expansion of (1
−
kN
−
1
)
A
(
r
)
, so that (1
−
kN
−
1
)
A
(
r
)
≈
e
−
A
(
r
)(
kN
−
1
+
k
22(
N
−
1)2
)
, and
E
[
I
(
r
+ 1)] =
i
(
r
)
(1
−
c
) +
ce
−
A
(
r
)(
kN
−
1
+
k
22(
N
−
1)2
)
.
From this, by assuming
I
(
r
) =
i
(
r
)
∀
r
, we derive the deterministic approximation
I
(
r
+ 1) =
I
(
r
)
(1
−
c
) +
ce
−
A
(
r
)(
kN
−
1
+
k
22(
N
−
1)2
)
.
(4)Using that
A
(
r
) =
N
−
I
(
r
), we ﬁnally obtain the recursion:
A
(
r
+ 1) =
Nc
+
A
(
r
)(1
−
c
)
−
(
N
−
A
(
r
))
ce
−
A
(
r
)(
kN
−
1
+
k
22(
N
−
1)2
)
,
(5)with
A
(1) = 1. Since
A
(
r
) is not an integer in general, we round it to the nearest integer, whichwe denote by ˆ
A
(
r
) = [
A
(
r
)].Following a similar approach as in [6], it can be shown that the distribution of
I
(
r
+ 1), given
i
(
r
) is indeed concentrated sharply around
i
(
r
)[(1
−
c
) +
c
exp(
−
A
(
r
)(
kN
−
1
+
k
2
2(
N
−
1)
2
))], and thatthe approximation becomes more accurate as
k/N
→
0.From (3), we have constructed an approximate distribution for the number of steps until theﬁle is found. We can then derive the mean number of steps until the ﬁle is found and the meannumber of nodes
A
involved in the search (activated nodes):
E
[
r
] =
∞
r
=1
rp
(
r
)
,
(6)
E
[
A
] =
∞
r
=1
ˆ
A
(
r
+ 1)
p
(
r
)
.
(7)(Notice that ˆ
A
(
r
+ 1) nodes will be activated approximately at round
r
.)For numerical calculations, as an upper bound on the support of
r
we take
r
max
= min
{
r
:
r
−
1
i
=1
(1
−
S
(
i
))
< ǫ
}
,
(8)where
ǫ
is a number close to zero.
Remark
1
.
In [8], we have derived an exact model for a slightly diﬀerent version of the blind searchalgorithm. Generally, the exact approach for modeling the search algorithm requires the calculationof the
N
×
N
transition matrix
Q
[
ij
]
, where the (
i,j
)-th value is the probability of going from
i
to
j
active nodes in one round. Then
Q
r
(1
,i
) denotes the probability that there are
i
active nodes in
r
rounds.The probability of ﬁnding the ﬁle in
r
rounds, denoted here by
B
(
r
), can be calculated as
B
(
r
) =
N
i
=1
1
−
N
−
im
N
−
1
m
Q
r
(1
,i
)
.
(9)5

We Need Your Support

Thank you for visiting our website and your interest in our free products and services. We are nonprofit website to share and download documents. To the running of this website, we need your help to support us.

Thanks to everyone for your continued support.

No, Thanks

SAVE OUR EARTH

We need your sign to support Project to invent "SMART AND CONTROLLABLE REFLECTIVE BALLOONS" to cover the Sun and Save Our Earth.

More details...Sign Now!

We are very appreciated for your Prompt Action!

x