[Networkit] [NetworKit] Forest-Fire-Method

Christian Staudt christian.staudt at kit.edu
Mon Oct 6 18:42:30 CEST 2014


Dear Wolfgang Schlauch,

> I tried the last weekend to send to the mailing list of NetworKit, but it was automatically rejected by the system. Since you are one of the responsible developers, I am sure you can give me advise on my problem or can forward this mail to the correct correspondent.

Sorry for the inconvenience. The list automatically rejects messages by non-subscribers due to spam, but you are at the moment properly subscribed. You should be able to post to the list. If the problem persists please let me know.

> I tested last week a few things, including the Forest-Fire-Algorithm (DynamicForestFireGenerator). From the name I guessed it is an implementation of the ForestFire-algorithm of Leskovec, Kleinberg, and Faloutsos. 
> 
> I tried to use it via the Python-interface and got results that did not agree with the expectations I had. For the Forest-Fire-Method, I usually use SNAP (stanford network analysis platform) and the results are quite different.

The ForestFire generators (there is also a non-dynamic one named ForestFireGenerator) have not seen much use so far, so it could simply be an implementation error. Let us look into the issue and get back to you soon with a solution. Which version of NetworKit are you using?

> If I read the code correctly, then I assume that C++ is not that hard to learn and I will start implementing my stuff most likely with NetworKit as well. 

You are very welcome to do so, and you can expect our developer team to provide some support. We try to build our API so that it is easy to express graph algorithms in C++. Many tasks for which performance is not critical can also be implemented in the Python layer. What are you currently working on?

Best regards,
Christian Staudt

christian.staudt at kit.edu
http://parco.iti.kit.edu/staudt/index-en.shtml
Institute of Theoretical Informatics - Parallel Computing Group 
Building 50.34 Room 034
Karlsruhe Institute of Technology (KIT)








Am 06.10.2014 um 11:53 schrieb Wolfgang Schlauch <schlauch at cs.uni-kl.de>:

> Dear Mr. Staudt,
> 
> I tried the last weekend to send to the mailing list of NetworKit, but it was automatically rejected by the system. Since you are one of the responsible developers, I am sure you can give me advise on my problem or can forward this mail to the correct correspondent.
> 
> I tested last week a few things, including the Forest-Fire-Algorithm (DynamicForestFireGenerator). From the name I guessed it is an implementation of the ForestFire-algorithm of Leskovec, Kleinberg, and Faloutsos. 
> 
> I tried to use it via the Python-interface and got results that did not agree with the expectations I had. For the Forest-Fire-Method, I usually use SNAP (stanford network analysis platform) and the results are quite different.
> 
> Here is what I did in NetworKit:
> ############################
> import networkit
> forestGenerator = networkit.generators.DynamicForestFireGenerator(0.38, True, 0.34)
> # supposed to generate a directed graph with forward-burning probability of 0.38, backward burning probability of 0.34
> 
> forestGraph = forestGenerator.generate(4039)
> # generate a graph with 4039 nodes
> # got one with 4040, but that is not the problem
> 
> graph = networkit.dynamic.graphFromStream(forestGraph)
> # either I did not see it or this step was not documented
> 
> print graph.numberOfNodes(), graph.numberOfEdges()
> # 4040, 6813
> #############################
> 
> Now, the problem I have with this is the following: From the paper of Leskovec, Kleinberg, and Faloustsos I know that the ForestFire-algorithm is supposed to yield heavy-tailed degree sequences, communities, and degree distributions following a power law. Looking into the properties.overview it ... kind of looks ok-ish (i.e., degree power law exponent = 1.49, degree distribution looked power-law-like). It did not match the description of the shrinking diameter (19, 20) and somehow it turned to something undirected (directed? 0).
> 
> 
> But, using the SNAP-API to generate a ForestFire-Graph with the same attributes (nNodes, pf, pb) yields something quite different:
> 
> #############################
> import snap
> ffGraph = snap.GenForestFire(4039, 0.38, 0.34)
> print ffGraph.GetNodes(), ffGraph.GetEdges()
> # 4039, 83979
> #############################
> 
> Since the ForestFire-algorithm got implemented by one of the authors of the paper (or it was under his supervision), I am going to assume that the approach taken by SNAP is valid.
> 
> Then I started reading the implementation of both codes in C++ - I have to admit that I never wrote serious C++ code, but I am quite willing to learn it to implement my algorithms for NetworKit, I like the speed.
> 
> Now, as far as I understood the NetworKit-implementation:
> - the auto-functions are somewhat anonymous functions with a trailing return and the stuff behind the = is the data that it is given in brackets but I still am not sure what the [&] is - I know that & is a reference, but I have never seen the [] before. Please explain :-)
> - the outer-most auto function does not have any return - since I am new to C++ that is okay for me, apparently the return type will be evaluated to void?
> - the forward/backward neighbors give all not visited (in-) neighbors of a node u
> - selectEdges samples edges randomly by selecting neighbors as long as a random number generator draws a number smaller than p, resp. r*p, or until the possible validEdges set is empty
> 
> This is what I got from reading only. I suppose that the last thing is also where something goes amiss, since it should be (1-p)^(-1). resp. (1-r*p)^(-1) (or p / (1-p) ...? unsure, I read it in two different versions, but I believe the first one is correct).
> 
> Over the weekend I used my spare time to test this. Instead of supplying the function with the actual probabilities, I provided the (1-p)^(-1), and (1-r*p)^(-1),
> 
> A first test gave me more reasonable results with respect to my expectations: same number of nodes, about 16000 edges.
> 
> The second try to generate a ForestFire-network failed already due to a memory-error, which was confusing, but it might be I mistyped somewhere. I did not write a script, which was in hindsight stupid. 
> 
> A third try, since everything has to be verified, gave me than again a quite different result, namely my 4040 nodes and over 8 million edges (parameters: 1/(1-0.38), True, 1/(1-0.38*0.34)). Verification of the last result in a fourth try took over 20 minutes until I just shrugged and aborted. 
> 
> After figuring out that I just have to follow through on the paper, I think it might be quite helpful to write this in the documentation. Moreover, with the changed parameters it took quite long. For comparison, SNAP needed 0.04 seconds on average to generate a network with 4039 nodes while NetworKit took between 1 minute (4040 nodes, about 6000 edges) and 18 minutes (4040, 8 million) or I aborted the process since I did not want to wait so long. With around 16k edges it took about 4 minutes. The general problem that the properties.overview yielded an undirected network is still there.
> 
> If I read the code correctly, then I assume that C++ is not that hard to learn and I will start implementing my stuff most likely with NetworKit as well. 
> 
> If anyone can comment on this, I would be very glad.
> 
> Regards,
> 
> Wolfgang Schlauch

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.ira.uni-karlsruhe.de/mailman/private/networkit/attachments/20141006/31e6616b/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 495 bytes
Desc: Message signed with OpenPGP using GPGMail
URL: <https://lists.ira.uni-karlsruhe.de/mailman/private/networkit/attachments/20141006/31e6616b/attachment.sig>


More information about the NetworKit mailing list