ISCoDe
: a framework for interest similaritybasedcommunity detection in social networks
Eva Jaho, Merkouris Karaliopoulos and Ioannis Stavrakakis
Department of Informatics and TelecommunicationsNational & Kapodistrian University of AthensIlissia, 157 84 Athens, GreeceEmail: {ejaho, mkaralio, ioannis}@di.uoa.gr
Abstract
—This paper proposes a framework for node clustering in computerized social networks according to commoninterests. Communities in such networks are mainly formed byuser selection, which may be based on various factors such asacquaintance, social status, educational background. However,such selection may result in groups that have a low degree of similarity. The proposed framework could improve the effectiveness of these social networks by constructing clusters of nodeswith higher interest similarity, and thus maximize the beneﬁt thatusers extract from their participation. The framework is based onmethods for detecting communities over weighted graphs, wheregraph edge weights are deﬁned based on measures of similaritybetween nodes’ interests in certain thematic areas. The capacityof these measures to enhance the sensitivity and resolutionof community detection is evaluated with concrete benchmarkscenarios over synthetic networks. We also use the frameworkto assess the level of common interests among sample users of a popular online social application. Our results conﬁrm thatclusters formed by user selection have low degrees of similarity;our framework could, hence, be valuable in forming communitieswith higher coherence of interests.
I. I
NTRODUCTION
The term
community
denotes a social group of people thathave one or more things in common. Whether this is residence,geographical neighborhood, traditions, or interests and ideals,communities have been long attracting the interest of sociologists and psychologists thanks to their potential to motivate andshape human behavior. On the contrary,
virtual communities
have emerged more recently and, almost always, transcenddistance barriers. Empowered by the Internet, these onlinecommunities socialize in virtual spaces provided by socialnetworking sites. A major research question is then how couldthe dynamics of these virtual worlds be exploited for more efﬁcient design of networked communication protocols and whichfactors may shape the enduser (the network communicationsubject) behavior. It has been reported, for example, that highersimilarity in the interests/preferences of online social groupmembers favors collaborative, and even altruistic, behavior incontent replication [10] and content dissemination [1] scenarios. But is such similarity present in social networks, whereusers tend to select their friends/followers with very differentcriteria, including acquaintance, social status, educational andfamily background? To answer this question, we need to devisemechanisms and tools that can assess the similarity of interestsamong social group members and leverage the structure thissimilarity embeds in their social network.Our work in this paper addresses this requirement by poringover the interestbased community detection problem. Wepropose a framework, which we call “
ISCoDe
”, for assessingthe similarity in online social communities (Fig. 1). Input to
ISCoDe
are the interests of the communities’ member nodes incertain thematic areas, hereafter called “interest classes”, suchas music, sports, art. Each interest class could further be splitinto subcategories (
tags
). In section IV, we give an example of how the enduser interests can be inferred out of a real socialnetwork application.
ISCoDe
then proceeds in two steps. First,it quantiﬁes the interest similarity between node pairs throughthe use of interest similarity metrics. Outcome of this step isa weighted graph representation of the social network, withedge weights corresponding to the similarity metric values.In a second step,
ISCoDe
can invoke standard communitydetection algorithms for weighted graphs (for example, [14][4]) to group nodes into disjoint clusters, connected internallyby highweight edges and to other subsets’ nodes with smallor zeroweight edges. These algorithms also assess, in thesame time, the quality of this grouping through the modularitymetric [16].
SimilaritymetricsEdgeweightsWeightedcommunitydetectionClusters of
Input Output
User interestdistributions users withsimilar interest
Figure 1. The
ISCoDe
framework
We call
ISCoDe
a framework since there are more thanone options for its two main processing steps (namely, thederivation of the weighted graph edges and the communitydetection algorithm). Part of our work, hence, is devoted to theanalysis and assessment of these options. For the derivation of graph edge weights, we consider two metrics: the ProportionalSimilarity [20] and the inverse of the symmetrized KullbackLeibler divergence [12]. Effectively, each metric could be seenas a different
transformation
from one data set (distributionof user interests over interest classes) to another (graph edgeweights). Comparing the outcomes of
ISCoDe
under syntheticuser interest distributions, we show that the choice of thesimilarity metric affects both the sensitivity and the resolutionproperties of our framework. Note that the similarity metricswe consider are different from the similarity indices that only
This paper was presented as part of the Workshop on Network Science for Communication Networks (NetSciCom)
9781424499212/11/$26.00 ©2011 IEEE876
capture structural equivalence,
i.e.
, same proﬁle of relations toall other nodes in the network, such as the Pearson correlationand the Jaccard coefﬁcient [5].
Related work
: In the literature, algorithms for detectingcommunity structure have largely been applied to a givennetwork structure, usually modeled as a graph. The mostprominent algorithm thereof is that of Girvan and Newman[16], which is highly efﬁcient and overcomes many shortcomings of previously proposed algorithms, such as graphpartitioning (
e.g.
, spectral bisection [17], KernighanLin algorithm [11]) and hierarchical methods (
e.g.
, Euclidean distancesingle linkage clustering) [8]. These methods are not ideal foranalyzing general network data since usually it is not knownin advance in how many communities the network should besplit into and which is the best division. Newman furtherproposed a simple mapping from a weighted network to anunweighted multigraph and proposed an algorithm for detecting communities in weighted networks [14]. The graph edgeweights introduce another set of variables in the communitydetection process and it is shown in [7] that they can have biginﬂuence on the resulting community structure, especially ondense networks.
Contribution of this paper
: As in Newman’s approach,current practice in community detection consists in applying modularitymaximizing clustering algorithms over
given
(weighted) graphs. Our work has a different starting point.In our paper the network structure,
e.g.
, edge weight set, isnot given beforehand. It is rather generated by
ISCoDe
outof the distributions of user interests over different thematicareas, the only information we assume known and given tous. Since our framework uses the interest distributions as itsinput for community detection, in the same time it becomesa means of assessing the effectiveness of similarity metricsthat carry out the
mapping
of interest distribution differences.This paper, hence, explores how effectively different
mappings
facilitate the detection of the underlying interest similaritystructure when the “commodity” community detection algorithms are applied on their
images
,
i.e.
, the weighted edgesets they generate. Through the application of the framework and the presented results, the effectiveness of the proposedframework that advocates projecting distributional differencesto a weighted graph and using commodity approaches foridentifying communities thereof, is assessed and established.The paper proceeds as follows: Section II describes brieﬂythe scope and processing steps of the
ISCoDe
framework forinterestbased clustering. The evaluation methodology and experimental results are presented in Section III. An applicationto a real network is described in Section IV. Finally, wesummarize the major conclusions of the paper in Section Vand point to interesting problems for future work.II. T
HE
ISCoDe
FRAMEWORK FOR DETECTINGCOMMUNITIES OF NODES WITH SIMILAR INTERESTS
In general, we want
ISCoDe
to satisfy three main requirements:
Correctness:
The framework should be able to distinguishcorrectly existing community structure. Whereas it may notalways be possible to conclude whether such structure reallyexists, the outcome of the framework should at least agreewith our intuition in certain benchmark scenarios, where thisstructure is evident.
Sensitivity:
The framework should be able to adapt tochanges of the user interest distributions and reﬂect thestrength of the community structure.
Resolution:
The framework should be able to identifyimportant community structure irrespective of its scale andthe overall network size.We evaluate
ISCoDe
along these lines in section III. In therest of this section, we detail the two processing steps of theframework and present baseline choices for populating them.
A. From interest distributions to the weighted graph
Let
N
=
{
1
,
2
,...,N
}
be the set of the network nodes(online social network users) and
M
=
{
1
,
2
,...,M
}
the setof interest areas (classes). We assume that for each node
n
we can have an estimate of
F
n
, the probability distribution of its preferences over the
M
interest areas, which takes discretevalues
F
n
1
,F
n
2
,...,F
nM
with
∑
m
∈M
F
nm
= 1
. Practically,
F
nm
could be measured through the normalized request rate of node
n
for data objects (content) of type
m
or some other form of interest expression in a certain area (
e.g.
, subscription to thiscategory’s tags). In section IV, we describe this process for aparticular online social application.From the node interest distributions, we can then computethe pairwise similarity in the interests of two nodes drawingon measures of distributional similarity. Hereafter, we describeand focus on two of the possible choices: a) the ProportionalSimilarity (PS) metric, which is shown in [20] to satisfy
11
criteria suggested as suitable for a measure of similaritybetween distributions; and b) the inverse of KullbackLeiblersymmetrized divergence (InvKL) [12]. InvKL projects thedifference between two interest distributions to a signiﬁcantlybroader range of values compared to the PS metric,
i.e.
,
(0
,
+
∞
)
vs.
[0
,
1]
, thus shaping the resolution properties of the framework, as we will see later in Section III.
1) Proportional Similarity (PS) metric:
With the PS metric,the interest similarity
PS
F
i
,F
j
between two nodes
i
and
j
,with interest distributions
F
i
and
F
j
, equals [20]:
PS
F
i
,F
j
= 1
−
12
M
m
=1
F
im
−
F
jm
.
(1)
2) Inverse KL (InvKL) symmetrized divergence:
Our secondmetric is the inverse of the KullbackLeibler (KL) symmetrized divergence, a metric capturing the distance betweentwo distributions
InvKL
F
i
,F
j
=
M
m
=1
F
im
logF
im
F
jm
+
F
jm
logF
jm
F
im
−
1
.
(2)The InvKL metric takes values in
(0
,
+
∞
)
. The KL divergence goes to inﬁnity in cases where there is no interest
877
in one interest category from one node, whereas there isnonzero interest in it from another. In order to avoid suchproblems, smoothing methods (
e.g.
, interpolation and backingoff schemes) can be used. These have been studied in statisticallanguage modeling in order to estimate the distribution of natural language as accurately as possible [13]. In our casenonzero request rates for interest classes can be discountedwith different discounting methods (see [13]), whereas interestclasses for which there is no interest can be given a small
ϵ >
0
probability.
B. From weighted graphs to communities
Out of the full population of clustering algorithms, relevantto our objectives are those carrying out densitybased graphclustering [3]. Namely, they take as input a graph and partitionit in a way that some notion of density (in our case: the weightsof intracluster edges) is signiﬁcantly higher within a partitionthan across different partitions (intercluster edges). Withinthe complex networking community the defacto criterion forassessing the quality of the partitioning is modularity [14],[16]. Modularity sums across all partition clusters the fractionof withincluster edges minus the expected fraction of edgesthat would fall within the same cluster were they selected atrandom. For a given partition of a weighted graph
G
(
V,E
)
,where
V
is the set of network nodes and
E
the edge setcapturing pairwise interest similarities, modularity
Q
equals[14]
Q
=
C
c
=1
l
c
L
−
d
c
2
L
2
,
(3)where the sum is over the
C
communities of the partition,
L
isthe sum of the weights of all edges in the graph,
l
c
is the sumof weights over edges lying fully within community
c
, and
d
c
the respective sum over the full set of edges incident to nodesin
c
. Modularity takes values in the interval
[
−
1
/
2
,
1]
[2]. Itbecomes zero for community structures that do not differ thanwhat one would get by random chance, whereas values above
0
.
3
−
0
.
4
suggest strong community structure.Our framework lends to the use of different modularitymaximization algorithms. One example is the divisive clustering algorithm Newman proposed in [14] for weighted graphs.The algorithm iteratively removes from the srcinal graph theedge with the highest “edge betweenness” (deﬁned as thenumber of shortest paths between pairs of nodes traversingthe edge) and recalculates modularity and edge betweennessvalues till modularity does not increase any further. Thecomplexity of the algorithm is
O
(

E

2

V

)
, which for densegraphs yields
O
(

V

5
)
.More generally, the problem of ﬁnding a partition thatmaximizes modularity in general graphs has been formulated as an Integer Linear Program (ILP) and shown to beNPhard [2]. Proposed heuristic algorithms for modularitymaximization draw on simulated annealing [19] or extremaloptimization [6]. More commonly used and computationallyfriendlier, however, is the greedy agglomerative clusteringalgorithm of Clauset
et al.
[4], [15]. We simply extend it toweighted graphs by directly relating it with the deﬁnition of modularity in weighted graphs in (3). Initially each vertexis viewed as a discrete cluster of size one. The algorithmthen iteratively merges the two clusters that yield the largestmodularity increase. The algorithm completes in at most

V
−
1
steps and has an implementation cost of
O
(

V

2
log

V

)
[2] permitting scalability for large graph sizes. We retain thegreedy algorithm as the baseline for the assessment of
ISCoDe
in Section IIIA.III.
ISCoDe
EVALUATION
We work with synthetic networks of
N
member nodeswith
controllably
similar interests in order to evaluate thecorrectness, sensitivity, and resolution properties of the framework. With modularity as the ﬁtness metric of the detectedcommunity structure, structures featuring tighter communitieswith cleaner separation from each other should see higher
Q
values than equinumerous yet “looser" structures. Moreover,with respect to
ISCoDe
’s resolution, we recall the remarks byFortunato and Barthèlemy in [9] that algorithms seeking tomaximize modularity may fail to identify important structuressmaller than a scale. In concluding whether the identiﬁcationof further distinct communities within a single one is meaningful, we adopt the weak “community” condition by Radicchi[18],
i.e.
, a community
c
is correctly identiﬁed as one if
l
c
L
−
d
c
2
L
2
>
0
.
(4)Note that in
ISCoDe
the resulting modularity values aresigniﬁcantly affected by the choice of the similarity metric.Contrary to other studies in literature, where communitydetection algorithms maximizing modularity are studied ongiven complex weighted graphs,
ISCoDe
adds the additionaltransformation step of interests to graph edge weights. Therefore, another requirement from the evaluation process is toshow how the two interest similarity metrics affect the threeframework requirements.In the general setting, the network population is organizedinto
k
groups. Each group is interested in
M
, generallydifferent, interest classes, which are the same for all membernodes of a given group. We form
k
equalsize groups of
N/k
users: nodes
1
..N/k
are assigned to group
1
, nodes
N/k
+ 1
..
2
N/k
to group
2
, and so on (for the sake of theexample, we take
N/k
to be an integer). We then control thesimilarity within and across the
k
groups as follows:
Interest similarity
across
groups.
This is controlled intwo ways. Firstly, through the number of common interestareas between groups, which may take any value
r
in
[0
,M
]
.Secondly, and this relates to the way the similarity
within
a single group is controlled, through the way the interestsoverlap. We consider two scenarios for the distribution of common interests between two groups: a) the
r
commoninterest areas are simultaneously the
r
least interesting forgroup
g
and the
r
most interesting for group
g
+1
,
0
< g < k
(
L(ast)F(irst)
, Table I(a)); b) the
r
common interest areas arethe
r
most interesting for the users of all
k
groups (
F(irst)
878
F(irst)
, Table I(b)). These scenarios present two extreme casesregarding the interest similarity across groups. Given that thenumber of common interest areas and the distributions areheld ﬁxed, the LF(FF) scenario yields the smallest(highest)similarity.
Table IE
XAMPLE OF THE TWO INTEREST

OVERLAP SCENARIOS FOR
k
= 3
,
M
= 5
. T
HE ORDER OF INTEREST CLASSES MARKS ALSO THE ORDER OFNODES
’
INTERESTS WITHIN A GROUP
.(a) LF with a single overlap interest class (
r
= 1
)
Group
1
Group
2
Group
31 5 92 6 103 7 114 8 125 9 13
(b) FF with two overlap interestclasses (
r
= 2
)
Group
1
Group
2
Group
31 1 12 2 23 6 94 7 105 8 11
Interest similarity
within
groups.
The interests of nodeswithin a group are spread over the ordered
M
interest classesinline with the Zipf distribution
1
. The skewness parameter
s
of the distribution differs for each group node. The interest of the ﬁrst node of each group are uniformly distributed (
s
1
= 0
)and
s
increases with constant step
p
so that for node
n
,
s
n
=
p
(
n
−
1)
,p
∈R
. Higher
p
values increase the skewnessin the interest distribution and concentrate the node interests’mass in the higherorder interest classes. Interestingly, changesof
p
also affect the similarity of interests between nodes belonging to different groups depending on the overlap scenario(Table I): higher
p
values result in weaker (stronger) intergroup similarity in the
L
−
F
(
F
−
F
) overlap scenario.In summary, by calibrating
p
, the overlap scenario and thenumber of common interest classes, we can produce networkswith community structures of variable discernibility.
A. Experimental results and discussion
We show and discuss representative results from our experimentation with
ISCoDe
on synthetic networks that outline themain behavior of the framework. All experiments are carriedout with the greedy agglomeration algorithm in [4] since ityields signiﬁcantly faster run times than its competitors
2
.
1) Correctness and sensitivity experiments:
In this set of experiments,
N
= 80
and
k
= 4
. The impact of
M
wasfound to be minimal, thus we present herein results only for
M
= 20
. We vary the interest overlap scenarios (as in TableI), the number of common interest classes, one (Tables II(a),II(c)) or half of them (Tables II(b), II(d)), and the skewness of the interest distributions, smaller
p
values representing moreuniform distributions within a single group nodes.The ﬁrst remark is that both metrics produce the sameintuitive community partitions in the presence of strong community structure, as in Tables II(a) and II(c) for low
p
values.
1
Zipf distributions have been used in the recent past to capture preferencesfor web objects. Furthermore, they exhibit high modelling simplicity andﬂexibility, in that proper manipulation of their single parameter
s
, gives riseto a wide set of distributions ranging from uniform (
s
= 0
) to highly skewedones with powerlaw characteristics (
s >>
0
).
2
We run experiments also with the divisive clustering algorithm in [14] butwe had to restrict to small group sizes. In these cases, we obtained similarresults with respect to community structure and modularity values.
On the contrary, when such structure is not evident, the twometrics result in considerably different partitions.The second remark has to do with the higher sensitivity of the framework when the PS metric is used in its ﬁrst processing step. The modularity of the resulting partitions underthe PS metric decreases when the interests of nodes are morerandomly diffused over the different interest classes and goesdown to zero when the similarity structure tends to disappear,as in Table II(d). On the contrary, the resulting modularityvalues under the InvKL metric are almost insensitive to thechanges in the input interest distributions. With InvKL themodularity values are dominated by the edge weights betweenindividual node pairs; these tend to be very high (
≫
1
) forhighly similar nodes and very low for highly dissimilar nodes.Finally, as
p
increases, the interest distributions of most nodestend to be more concentrated on the ﬁrst group objects, andthe interest distributions become less uniform. For cases shownin Tables II(a) and II(b), this results in increasing modularityunder the PS metric, thanks to the decreasing weights of intergroup edges,
i.e.
, nodes initially assigned to different groups.It has the opposite effect for cases shown in Tables II(c) andII(d), where increasing
p
leads to stronger ties between nodesin different groups. On the contrary, InvKL does not adapt toany of these changes.
2) Resolution experiments:
We run two additional experiments focusing on the impact of the two similarity metricsupon the overall framework resolution. The ﬁrst experimentinvolves nodes with
highly similar interests
. All nodes areinterested in the same
M
objects, in the same order. Theydifferentiate only slightly in how their interests are spread overthe
M
interest classes, modelled by Zipf(s) distributions with
s
varying from 0 to its maximum value in steps of
p
= 0
.
01
.The second experiment involves nodes with
highly dissimilar interests
; there is a single common interest class betweensuccessively ordered nodes. The experiment resembles the LFoverlap scenario shown in Table I(a), if each group containedonly one node. The results from the two experiments arereported in Table III and clearly demonstrate the capacityof the two metrics to illuminate better different parts of theinterest similarity range.Mapping highly similar interest distributions to a muchbroader edge weight value range (Figure 2(a)), InvKL canresolve more communities than PS in the ﬁrst experiment,all of which satisfy Radicchi’s weak community condition of (4). On the contrary, PS tends to group small communitiestogether. Notably, the communities produced by both metricsdo not satisfy the inequality
l
c
<
√
2
L,
(5)which according to [9] suggests that community
c
may bethe combination of two or more smaller communities thatcannot simply be detected when pursuing the optimization of modularity due to their small size.The situation is reversed in the second experiment: it isnow PS that can recognize smaller communities, as shownin Table III(b). Moreover, (5) is satisﬁed, implying that there
879
Table IIC
ORRECTNESS AND SENSITIVITY EXPERIMENTS
: M
ODULARITY AND COMMUNITIES FORMED FOR DIFFERENT VALUES OF
p
,
N
= 80
,
M
= 20
(a) LF,
1
common objectPS InvKL
Q C
partition
Q C
partition
p
= 0
.
02 0
.
6849 4
{1..20}...{61..80}
0
.
7498 4
{1..20}...{61..80}
p
= 0
.
04 0
.
6925 4
{1..20}...{61..80}
0
.
7493 4
{1..20}...{61..80}
p
= 0
.
06 0
.
6992 4
{1..20}...{61..80}
0
.
7483 4
{1..20}...{61..80}
p
= 0
.
08 0
.
7048 4
{1..20}...{61..80}
0
.
7698 8
{1..10}...{71..80}
p
= 0
.
10 0
.
7095 4
{1..20}...{61..80}
0
.
7745 8
{1..10}...{71..80}(b) LF,
M/
2
common objectsPS InvKL
Q C
partition
Q C
partition
p
= 0
.
02 0
.
3594 2
{1..40} {41..80}
0
.
7667 4
{1..20}...{61..80}
p
= 0
.
04 0
.
3669 2
{1..40} {41..80}
0
.
7490 4
{1..20}...{61..80}
p
= 0
.
06 0
.
3756 2
{1..40} {41..80}
0
.
7475 4
{1..20}...{61..80}
p
= 0
.
08 0
.
3938 3
{1..20} {21..40} {41..80}
0
.
7687 8
{1..10}...{71..80}
p
= 0
.
10 0
.
4142 3
{1..20} {21..40} {41..80}
0
.
7730 8
{1..10}...{71..80}(c) FF,
1
common objectPS InvKL
Q C
partition
Q C
partition
p
= 0
.
02 0
.
5711 4
{1..20}...{61..80}
0
.
7498 4
{1..20}...{61..80}
p
= 0
.
04 0
.
5146 4
{1..20}...{61..80}
0
.
7492 4
{1..20}...{61..80}
p
= 0
.
06 0
.
4465 4
{1..20}...{61..80}
0
.
7480 4
{1..20}...{61..80}
p
= 0
.
08 0
.
3734 4
{1..20}...{61..80}
0
.
7692 8
{1..10}...{71..80}
p
= 0
.
10 0
.
3038 4
{1..20}...{61..80}
0
.
7730 8
{1..10}...{71..80}(d) FF,
M/
2
common objectsPS InvKL
Q C
partition
Q C
partition
p
= 0
.
02 0
.
1103 4
{1..20}...{61..80}
0
.
7496 4
{1..20}...{61..80}
p
= 0
.
04 0
.
0841 4
{1..20}...{61..80}
0
.
7481 4
{1..20}...{61..80}
p
= 0
.
06 0
.
0610 4
{1..20}...{61..80}
0
.
7444 4
{1..20}...{61..80}
p
= 0
.
08 0
.
0422 4
{1..20}...{61..80}
0
.
7611 8
{1..10}...{71..80}
p
= 0
.
10 0
.
0485 5
{1..15}...{61..75} {16..20,36..40,56..60,76..80}
0
.
7549 8
{1..10}...{71..80}Table IIIR
ESOLUTION EXPERIMENTS
: M
ODULARITY AND COMMUNITIES FORMED
,
N
= 80
,
M
= 20
(a) Similar nodesPS InvKL
Q C
partition
Q C
partition
0
.
0215 2
{1..38} {39..80}
0
.
6740 5
{1..14} {15..28} {29..44} {45..61} {62..80}(b) Dissimilar nodesPS InvKL
Q C
partition
Q C
partition
0
.
7860 10
{1..8}..{73..80}
0 1
{1..80}
0.97 0.975 0.98 0.985 0.99 0.995 1051015
weight
w e i g h t d i s t r i b u t i o n
PS
1020304050607080
10
4
10
5
10
6
10
7
020406080
InvKLweight
w e i g h t d i s t r i b u t i o n
1020304050607080
(a) Similar nodes
0 0.01 0.02 0.03 0.04 0.05020406080
weight
w e i g h t d i s t r i b u t i o n
PS
1020304050607080
0.0245 0.025 0.0255 0.026 0.0265 0.027020406080
weight
w e i g h t d i s t r i b u t i o n
InvKL
1020304050607080
(b) Dissimilar nodesFigure 2. Resolution experiments: Edge weight distributions
are more nondetected communities. InvKL, on the contrary,
cannot
since it squeezes all edge weight values that resultfrom the ﬁrst processing step within an interval of
0
.
012
width(Figure 2(b)).However, an important question is regarding the level of resolution,
i.e.
, in which cases communities should be moreresolved. Intuitively, it seems more important to identify ﬁnercommunity structure in a network with more similar nodes,than in case of dissimilar ones. Hence, the resolution advantage of InvKL at high similarity scenarios may overweigh itsdisadvantage at low similarity ones.IV. A
PPLICATION TO A REAL NETWORK
We apply
ISCoDe
to data traces extracted from the Deliciouswebsite (www.delicious.com). Delicious is a social bookmarking application where users can save all their web bookmarks(annotated with tags) online, share them with other users,and track what other users are bookmarking themselves. Each
880