[Networkit] [NetworKit] Forest-Fire-Method
Christian Staudt
christian.staudt at kit.edu
Tue Oct 28 15:30:58 CET 2014
@Michael: Thanks for fixing this. The DynamicForestFireGenerator was a student project and so far not thoroughly tested.
@Wolfgang: Can you verify that the generator produces the results you expect now?
> @Christian: Where is this non-dynamic version? I can't see any non-dynamic
> Forest-Fire algorithm.
My mistake: There was an early implementation attempt, but it has been replaced by the dynamic generator.
Greetings from the IEEE BigData conference in Washington DC, where I gave a talk including NetworKit yesterday
Christian
Am 28.10.2014 um 10:16 schrieb Michael Hamann <michael.hamann at kit.edu>:
> Hello everybody,
>
> I've just looked at the Forest-Fire generator. From what I've seen our
> implementation follows the algorithm from the paper very closely, even closer
> than the implementation in SNAP. However, unfortunately the backwards ratio
> was ignored which is why the backwards probability was always 0 which in turn
> explains why so few edges were generated. SNAP with backbwards probability 0
> generates similarily few edges.
>
> I've changed this and also fixed the off-by-one error for the number of nodes.
> Now the version in the Dev branch produces graphs with similar numbers of
> edges as the SNAP implementation.
>
> I'm not completely sure that the probability is right but it is definitely not
> completely wrong as the number of edges is now in the same range as the output
> from SNAP.
>
> However there are two differences that I've found so far:
> - we use a DFS-like approach for exploring nodes which is also what I
> interpret from the description in the paper. SNAP uses a BFS-like approach
> - we use a backbwards probability ratio, this means the backwards probability
> is r*p, this is partially as described in the text in the paper. SNAP uses
> directly the backwards probability as parameter. The paper mentions a
> probability that is used for the number forward and backwards edges together
> instead.
>
> I think the performance of our implementation could be further optimized by
> not using a std::set for storing the nodes that shall be burnt but directly
> burning them (as far as I can see by the construction of the directed graph,
> the set of forward and backward edges is always disjoint anyway).
>
> I don't know if we shall adapt our implementation to the reference
> implementation or leave it as it is paper. Also the parameters are not very
> intuitive currently.
>
> @Christian: Where is this non-dynamic version? I can't see any non-dynamic
> Forest-Fire algorithm.
>
> Michael
>
> Am Montag, 6. Oktober 2014, 18:42:30 schrieb Christian Staudt:
>> Dear Wolfgang Schlauch,
>>
>>> I tried the last weekend to send to the mailing list of NetworKit, but it
>>> was automatically rejected by the system. Since you are one of the
>>> responsible developers, I am sure you can give me advise on my problem or
>>> can forward this mail to the correct correspondent.
>> Sorry for the inconvenience. The list automatically rejects messages by
>> non-subscribers due to spam, but you are at the moment properly subscribed.
>> You should be able to post to the list. If the problem persists please let
>> me know.
>>> I tested last week a few things, including the Forest-Fire-Algorithm
>>> (DynamicForestFireGenerator). From the name I guessed it is an
>>> implementation of the ForestFire-algorithm of Leskovec, Kleinberg, and
>>> Faloutsos.
>>>
>>> I tried to use it via the Python-interface and got results that did not
>>> agree with the expectations I had. For the Forest-Fire-Method, I usually
>>> use SNAP (stanford network analysis platform) and the results are quite
>>> different.
>> The ForestFire generators (there is also a non-dynamic one named
>> ForestFireGenerator) have not seen much use so far, so it could simply be
>> an implementation error. Let us look into the issue and get back to you
>> soon with a solution. Which version of NetworKit are you using?
>>> If I read the code correctly, then I assume that C++ is not that hard to
>>> learn and I will start implementing my stuff most likely with NetworKit
>>> as well.
>> You are very welcome to do so, and you can expect our developer team to
>> provide some support. We try to build our API so that it is easy to express
>> graph algorithms in C++. Many tasks for which performance is not critical
>> can also be implemented in the Python layer. What are you currently working
>> on?
>>
>> Best regards,
>> Christian Staudt
>>
>> christian.staudt at kit.edu
>> http://parco.iti.kit.edu/staudt/index-en.shtml
>> Institute of Theoretical Informatics - Parallel Computing Group
>> Building 50.34 Room 034
>> Karlsruhe Institute of Technology (KIT)
>>
>> Am 06.10.2014 um 11:53 schrieb Wolfgang Schlauch <schlauch at cs.uni-kl.de>:
>>> Dear Mr. Staudt,
>>>
>>> I tried the last weekend to send to the mailing list of NetworKit, but it
>>> was automatically rejected by the system. Since you are one of the
>>> responsible developers, I am sure you can give me advise on my problem or
>>> can forward this mail to the correct correspondent.
>>>
>>> I tested last week a few things, including the Forest-Fire-Algorithm
>>> (DynamicForestFireGenerator). From the name I guessed it is an
>>> implementation of the ForestFire-algorithm of Leskovec, Kleinberg, and
>>> Faloutsos.
>>>
>>> I tried to use it via the Python-interface and got results that did not
>>> agree with the expectations I had. For the Forest-Fire-Method, I usually
>>> use SNAP (stanford network analysis platform) and the results are quite
>>> different.
>>>
>>> Here is what I did in NetworKit:
>>> ############################
>>> import networkit
>>> forestGenerator = networkit.generators.DynamicForestFireGenerator(0.38,
>>> True, 0.34) # supposed to generate a directed graph with forward-burning
>>> probability of 0.38, backward burning probability of 0.34
>>>
>>> forestGraph = forestGenerator.generate(4039)
>>> # generate a graph with 4039 nodes
>>> # got one with 4040, but that is not the problem
>>>
>>> graph = networkit.dynamic.graphFromStream(forestGraph)
>>> # either I did not see it or this step was not documented
>>>
>>> print graph.numberOfNodes(), graph.numberOfEdges()
>>> # 4040, 6813
>>> #############################
>>>
>>> Now, the problem I have with this is the following: From the paper of
>>> Leskovec, Kleinberg, and Faloustsos I know that the ForestFire-algorithm
>>> is supposed to yield heavy-tailed degree sequences, communities, and
>>> degree distributions following a power law. Looking into the
>>> properties.overview it ... kind of looks ok-ish (i.e., degree power law
>>> exponent = 1.49, degree distribution looked power-law-like). It did not
>>> match the description of the shrinking diameter (19, 20) and somehow it
>>> turned to something undirected (directed? 0).
>>>
>>>
>>> But, using the SNAP-API to generate a ForestFire-Graph with the same
>>> attributes (nNodes, pf, pb) yields something quite different:
>>>
>>> #############################
>>> import snap
>>> ffGraph = snap.GenForestFire(4039, 0.38, 0.34)
>>> print ffGraph.GetNodes(), ffGraph.GetEdges()
>>> # 4039, 83979
>>> #############################
>>>
>>> Since the ForestFire-algorithm got implemented by one of the authors of
>>> the paper (or it was under his supervision), I am going to assume that
>>> the approach taken by SNAP is valid.
>>>
>>> Then I started reading the implementation of both codes in C++ - I have to
>>> admit that I never wrote serious C++ code, but I am quite willing to
>>> learn it to implement my algorithms for NetworKit, I like the speed.
>>>
>>> Now, as far as I understood the NetworKit-implementation:
>>> - the auto-functions are somewhat anonymous functions with a trailing
>>> return and the stuff behind the = is the data that it is given in
>>> brackets but I still am not sure what the [&] is - I know that & is a
>>> reference, but I have never seen the [] before. Please explain :-) - the
>>> outer-most auto function does not have any return - since I am new to C++
>>> that is okay for me, apparently the return type will be evaluated to
>>> void? - the forward/backward neighbors give all not visited (in-)
>>> neighbors of a node u - selectEdges samples edges randomly by selecting
>>> neighbors as long as a random number generator draws a number smaller
>>> than p, resp. r*p, or until the possible validEdges set is empty
>>>
>>> This is what I got from reading only. I suppose that the last thing is
>>> also where something goes amiss, since it should be (1-p)^(-1). resp.
>>> (1-r*p)^(-1) (or p / (1-p) ...? unsure, I read it in two different
>>> versions, but I believe the first one is correct).
>>>
>>> Over the weekend I used my spare time to test this. Instead of supplying
>>> the function with the actual probabilities, I provided the (1-p)^(-1),
>>> and (1-r*p)^(-1),
>>>
>>> A first test gave me more reasonable results with respect to my
>>> expectations: same number of nodes, about 16000 edges.
>>>
>>> The second try to generate a ForestFire-network failed already due to a
>>> memory-error, which was confusing, but it might be I mistyped somewhere.
>>> I did not write a script, which was in hindsight stupid.
>>>
>>> A third try, since everything has to be verified, gave me than again a
>>> quite different result, namely my 4040 nodes and over 8 million edges
>>> (parameters: 1/(1-0.38), True, 1/(1-0.38*0.34)). Verification of the last
>>> result in a fourth try took over 20 minutes until I just shrugged and
>>> aborted.
>>>
>>> After figuring out that I just have to follow through on the paper, I
>>> think it might be quite helpful to write this in the documentation.
>>> Moreover, with the changed parameters it took quite long. For comparison,
>>> SNAP needed 0.04 seconds on average to generate a network with 4039 nodes
>>> while NetworKit took between 1 minute (4040 nodes, about 6000 edges) and
>>> 18 minutes (4040, 8 million) or I aborted the process since I did not
>>> want to wait so long. With around 16k edges it took about 4 minutes. The
>>> general problem that the properties.overview yielded an undirected
>>> network is still there.
>>>
>>> If I read the code correctly, then I assume that C++ is not that hard to
>>> learn and I will start implementing my stuff most likely with NetworKit
>>> as well.
>>>
>>> If anyone can comment on this, I would be very glad.
>>>
>>> Regards,
>>>
>>> Wolfgang Schlauch
>
>
> _______________________________________________
> NetworKit mailing list
> NetworKit at ira.uni-karlsruhe.de
> https://lists.ira.uni-karlsruhe.de/mailman/listinfo/networkit
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 495 bytes
Desc: Message signed with OpenPGP using GPGMail
URL: <https://lists.ira.uni-karlsruhe.de/mailman/private/networkit/attachments/20141028/b00ccabe/attachment.sig>
More information about the NetworKit
mailing list