[Networkit] [NetworKit] Forest-Fire-Method

Michael Hamann michael.hamann at kit.edu
Tue Oct 28 15:16:35 CET 2014


Hello everybody,

I've just looked at the Forest-Fire generator. From what I've seen our 
implementation follows the algorithm from the paper very closely, even closer 
than the implementation in SNAP. However, unfortunately the backwards ratio 
was ignored which is why the backwards probability was always 0 which in turn 
explains why so few edges were generated. SNAP with backbwards probability 0 
generates similarily few edges.

I've changed this and also fixed the off-by-one error for the number of nodes. 
Now the version in the Dev branch produces graphs with similar numbers of 
edges as the SNAP implementation.

I'm not completely sure that the probability is right but it is definitely not 
completely wrong as the number of edges is now in the same range as the output 
from SNAP.

However there are two differences that I've found so far:
- we use a DFS-like approach for exploring nodes which is also what I 
interpret from the description in the paper. SNAP uses a BFS-like approach
- we use a backbwards probability ratio, this means the backwards probability 
is r*p, this is partially as described in the text in the paper. SNAP uses 
directly the backwards probability as parameter. The paper mentions a 
probability that is used for the number forward and backwards edges together 
instead.

I think the performance of our implementation could be further optimized by 
not using a std::set for storing the nodes that shall be burnt but directly 
burning them (as far as I can see by the construction of the directed graph, 
the set of forward and backward edges is always disjoint anyway).

I don't know if we shall adapt our implementation to the reference 
implementation or leave it as it is paper. Also the parameters are not very 
intuitive currently.

@Christian: Where is this non-dynamic version? I can't see any non-dynamic 
Forest-Fire algorithm.

Michael

Am Montag, 6. Oktober 2014, 18:42:30 schrieb Christian Staudt:
> Dear Wolfgang Schlauch,
> 
> > I tried the last weekend to send to the mailing list of NetworKit, but it
> > was automatically rejected by the system. Since you are one of the
> > responsible developers, I am sure you can give me advise on my problem or
> > can forward this mail to the correct correspondent.
> Sorry for the inconvenience. The list automatically rejects messages by
> non-subscribers due to spam, but you are at the moment properly subscribed.
> You should be able to post to the list. If the problem persists please let
> me know.
> > I tested last week a few things, including the Forest-Fire-Algorithm
> > (DynamicForestFireGenerator). From the name I guessed it is an
> > implementation of the ForestFire-algorithm of Leskovec, Kleinberg, and
> > Faloutsos.
> > 
> > I tried to use it via the Python-interface and got results that did not
> > agree with the expectations I had. For the Forest-Fire-Method, I usually
> > use SNAP (stanford network analysis platform) and the results are quite
> > different.
> The ForestFire generators (there is also a non-dynamic one named
> ForestFireGenerator) have not seen much use so far, so it could simply be
> an implementation error. Let us look into the issue and get back to you
> soon with a solution. Which version of NetworKit are you using?
> > If I read the code correctly, then I assume that C++ is not that hard to
> > learn and I will start implementing my stuff most likely with NetworKit
> > as well.
> You are very welcome to do so, and you can expect our developer team to
> provide some support. We try to build our API so that it is easy to express
> graph algorithms in C++. Many tasks for which performance is not critical
> can also be implemented in the Python layer. What are you currently working
> on?
> 
> Best regards,
> Christian Staudt
> 
> christian.staudt at kit.edu
> http://parco.iti.kit.edu/staudt/index-en.shtml
> Institute of Theoretical Informatics - Parallel Computing Group
> Building 50.34 Room 034
> Karlsruhe Institute of Technology (KIT)
> 
> Am 06.10.2014 um 11:53 schrieb Wolfgang Schlauch <schlauch at cs.uni-kl.de>:
> > Dear Mr. Staudt,
> > 
> > I tried the last weekend to send to the mailing list of NetworKit, but it
> > was automatically rejected by the system. Since you are one of the
> > responsible developers, I am sure you can give me advise on my problem or
> > can forward this mail to the correct correspondent.
> > 
> > I tested last week a few things, including the Forest-Fire-Algorithm
> > (DynamicForestFireGenerator). From the name I guessed it is an
> > implementation of the ForestFire-algorithm of Leskovec, Kleinberg, and
> > Faloutsos.
> > 
> > I tried to use it via the Python-interface and got results that did not
> > agree with the expectations I had. For the Forest-Fire-Method, I usually
> > use SNAP (stanford network analysis platform) and the results are quite
> > different.
> > 
> > Here is what I did in NetworKit:
> > ############################
> > import networkit
> > forestGenerator = networkit.generators.DynamicForestFireGenerator(0.38,
> > True, 0.34) # supposed to generate a directed graph with forward-burning
> > probability of 0.38, backward burning probability of 0.34
> > 
> > forestGraph = forestGenerator.generate(4039)
> > # generate a graph with 4039 nodes
> > # got one with 4040, but that is not the problem
> > 
> > graph = networkit.dynamic.graphFromStream(forestGraph)
> > # either I did not see it or this step was not documented
> > 
> > print graph.numberOfNodes(), graph.numberOfEdges()
> > # 4040, 6813
> > #############################
> > 
> > Now, the problem I have with this is the following: From the paper of
> > Leskovec, Kleinberg, and Faloustsos I know that the ForestFire-algorithm
> > is supposed to yield heavy-tailed degree sequences, communities, and
> > degree distributions following a power law. Looking into the
> > properties.overview it ... kind of looks ok-ish (i.e., degree power law
> > exponent = 1.49, degree distribution looked power-law-like). It did not
> > match the description of the shrinking diameter (19, 20) and somehow it
> > turned to something undirected (directed? 0).
> > 
> > 
> > But, using the SNAP-API to generate a ForestFire-Graph with the same
> > attributes (nNodes, pf, pb) yields something quite different:
> > 
> > #############################
> > import snap
> > ffGraph = snap.GenForestFire(4039, 0.38, 0.34)
> > print ffGraph.GetNodes(), ffGraph.GetEdges()
> > # 4039, 83979
> > #############################
> > 
> > Since the ForestFire-algorithm got implemented by one of the authors of
> > the paper (or it was under his supervision), I am going to assume that
> > the approach taken by SNAP is valid.
> > 
> > Then I started reading the implementation of both codes in C++ - I have to
> > admit that I never wrote serious C++ code, but I am quite willing to
> > learn it to implement my algorithms for NetworKit, I like the speed.
> > 
> > Now, as far as I understood the NetworKit-implementation:
> > - the auto-functions are somewhat anonymous functions with a trailing
> > return and the stuff behind the = is the data that it is given in
> > brackets but I still am not sure what the [&] is - I know that & is a
> > reference, but I have never seen the [] before. Please explain :-) - the
> > outer-most auto function does not have any return - since I am new to C++
> > that is okay for me, apparently the return type will be evaluated to
> > void? - the forward/backward neighbors give all not visited (in-)
> > neighbors of a node u - selectEdges samples edges randomly by selecting
> > neighbors as long as a random number generator draws a number smaller
> > than p, resp. r*p, or until the possible validEdges set is empty
> > 
> > This is what I got from reading only. I suppose that the last thing is
> > also where something goes amiss, since it should be (1-p)^(-1). resp.
> > (1-r*p)^(-1) (or p / (1-p) ...? unsure, I read it in two different
> > versions, but I believe the first one is correct).
> > 
> > Over the weekend I used my spare time to test this. Instead of supplying
> > the function with the actual probabilities, I provided the (1-p)^(-1),
> > and (1-r*p)^(-1),
> > 
> > A first test gave me more reasonable results with respect to my
> > expectations: same number of nodes, about 16000 edges.
> > 
> > The second try to generate a ForestFire-network failed already due to a
> > memory-error, which was confusing, but it might be I mistyped somewhere.
> > I did not write a script, which was in hindsight stupid.
> > 
> > A third try, since everything has to be verified, gave me than again a
> > quite different result, namely my 4040 nodes and over 8 million edges
> > (parameters: 1/(1-0.38), True, 1/(1-0.38*0.34)). Verification of the last
> > result in a fourth try took over 20 minutes until I just shrugged and
> > aborted.
> > 
> > After figuring out that I just have to follow through on the paper, I
> > think it might be quite helpful to write this in the documentation.
> > Moreover, with the changed parameters it took quite long. For comparison,
> > SNAP needed 0.04 seconds on average to generate a network with 4039 nodes
> > while NetworKit took between 1 minute (4040 nodes, about 6000 edges) and
> > 18 minutes (4040, 8 million) or I aborted the process since I did not
> > want to wait so long. With around 16k edges it took about 4 minutes. The
> > general problem that the properties.overview yielded an undirected
> > network is still there.
> > 
> > If I read the code correctly, then I assume that C++ is not that hard to
> > learn and I will start implementing my stuff most likely with NetworKit
> > as well.
> > 
> > If anyone can comment on this, I would be very glad.
> > 
> > Regards,
> > 
> > Wolfgang Schlauch




More information about the NetworKit mailing list