[Networkit] Problem computing ApproxBetweeness

Lars Beckmann lars.beckmann at gmail.com
Wed May 7 16:07:35 CEST 2014


Hi Henning,

I did what you suggested (see output attached). I let it run for a couple
of hours on my admittedly relatively weak machine. I have a VM running on
my laptop (which has a 3 years old i7-620M).

To me the output looks ok and if I understand that correctly it chewed
through about 20k out of the 60k samples when I aborted the process.

Does that give you any info that I don't see? Is there anything else we can
do?

Lars



On Wed, May 7, 2014 at 11:03 AM, Henning Meyerhenke <meyerhenke at kit.edu>wrote:

> Hi Kevin,
>
> I have created a unit test tryApproxBetweennessOnRealGraph which operates
> on your graph.
>
> On my MacBook with 4 cores the operation took about 800 seconds, which is
> in the expected range and far from the behavior you describe. This timing
> is even with --loglevel=DEBUG, fewer output would result in a somewhat
> faster run.
>
> Could you please update to the latest Dev branch revision and run the test
> yourself? You need to put your graph into the input/ directory for that.
> Please produce DEBUG output as well and have it written to a file if
> possible.
>
> Thanks,
> Henning
>
>
>
>
> Am 07.05.14 10:46, schrieb Kevin Tierney:
>
>  Hi,
>>
>> We used the defaults.
>>
>> Kevin
>>
>> On 07.05.2014 10:34, Henning Meyerhenke wrote:
>>
>>> Hi Kevin,
>>>
>>> What were the approximation and confidence parameters you used? Or did
>>> you go along with the default epsilon=0.01 and delta=0.1?
>>>
>>> Best,
>>> Henning
>>>
>>>
>>> Am 07.05.14 09:34, schrieb Kevin Tierney:
>>>
>>>> Hi,
>>>>
>>>> We have come across another graph where there seems to be an issue with
>>>> ApproxBetweenness. This graph is not trivially small, nor is it
>>>> completely connected, meaning we'd actually be rather interested in
>>>> finding out the ApproxBetweenness value for it. I let it run overnight
>>>> (~14 hours with 8 cores) and did not get an answer, so I assume it was
>>>> never going to finish.
>>>>
>>>> The graph is a bit large, so I put it here:
>>>> https://dl.dropboxusercontent.com/u/16348505/ns894786.mps.
>>>> gz.variable.graph.bz2
>>>>
>>>>
>>>> Any help you can provide would be greatly appreciated.
>>>>
>>>> By the way, we tried to compute the number of samples ourselves and it
>>>> seemed like it would only be around ~63000, so perhaps there is another
>>>> issue?
>>>>
>>>> Thanks for your help.
>>>>
>>>> Kevin
>>>>
>>>> On 06.05.2014 16:32, Christian Staudt wrote:
>>>>
>>>>> The important thing is that you get sensible results on the real
>>>>> networks that are of interest to you. If that’s the case, I give a
>>>>> low priority to this bug. I agree that betweenness is uninteresting
>>>>> on near-complete graphs.
>>>>>
>>>>> Am 06.05.2014 um 16:21 schrieb Kevin Tierney <kevin.tierney at upb.de>:
>>>>>
>>>>>  Hi Christian,
>>>>>>
>>>>>> Thanks for looking into the issue and for your update.
>>>>>>
>>>>>> We've noticed similar behavior on complete or nearly-complete graphs.
>>>>>> I
>>>>>> suppose computing a betweenness measure on such graphs is pointless
>>>>>> anyway (at least for what we're doing), but at least that seems like a
>>>>>> possible cause of the weird behavior.
>>>>>>
>>>>>> Kevin
>>>>>>
>>>>>>
>>>>>> On 06.05.2014 15:30, Christian Staudt wrote:
>>>>>>
>>>>>>> Hi Kevin,
>>>>>>> running the program with the log level set to INFO reveals the
>>>>>>> problem:
>>>>>>>
>>>>>>> [INFO ][“src/cpp/centrality/ApproxBetweenness.cpp”, 38: virtual
>>>>>>> void NetworKit::ApproxBetweenness::run()]: estimating vertex
>>>>>>> diameter
>>>>>>> [INFO ][“src/cpp/centrality/ApproxBetweenness.cpp”, 41: virtual
>>>>>>> void NetworKit::ApproxBetweenness::run()]: estimated diameter: 2
>>>>>>> [INFO ][“src/cpp/centrality/ApproxBetweenness.cpp”, 47: virtual
>>>>>>> void NetworKit::ApproxBetweenness::run()]: taking
>>>>>>> 9223372036854775808 path samples
>>>>>>>
>>>>>>> A ridiculously high number of samples is taken. Where does it come
>>>>>>> from? The number of samples is calculated as:
>>>>>>>
>>>>>>>      count r = ceil((c / (epsilon * epsilon)) * (floor(log(vd - 2))
>>>>>>> + 1 + log(1 / delta)));
>>>>>>>
>>>>>>> So if the estimated diameter (vd) is 2, then the formula contains
>>>>>>> the logarithm of 0, which looks like trouble. I am currently not
>>>>>>> sure how to fix this, because this is how the formula was
>>>>>>> published. We will investigate.
>>>>>>>
>>>>>>> At the same time I discovered and fixed an error in the formula,
>>>>>>> but I believe this was unrelated to the problem you describe.
>>>>>>> Please also update to the latest Dev-Revision for your experiments.
>>>>>>>
>>>>>>> Kind regards
>>>>>>> Christian
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Am 06.05.2014 um 14:57 schrieb Kevin Tierney
>>>>>>> <kevin.tierney at uni-paderborn.de>:
>>>>>>>
>>>>>>>  Dear NetworKit Gurus,
>>>>>>>>
>>>>>>>> We are trying to use the approximate betweenness function and
>>>>>>>> notice an
>>>>>>>> odd thing happening on small graphs. I have attached a graph where
>>>>>>>> calling:
>>>>>>>>
>>>>>>>> NetworKit.centrality.ApproxBetweenness(G)
>>>>>>>>
>>>>>>>> seems to never finish. At least the CPU time required seems to be
>>>>>>>> way
>>>>>>>> more than should be for such a small graph.
>>>>>>>>
>>>>>>>> You might say "Why would you compute approxbetweenness for such a
>>>>>>>> small
>>>>>>>> graph?" -- the reason is this was just mixed in with a bunch of
>>>>>>>> larger
>>>>>>>> graphs where betweenness would take too long.
>>>>>>>>
>>>>>>>> Do you have any ideas on what is happening?
>>>>>>>>
>>>>>>>> By the way, could this perhaps be related to the sample size of 42
>>>>>>>> in
>>>>>>>> the .cpp file for approxbetweenness? We aren't getting any issues on
>>>>>>>> large graphs; in fact they are being solved relatively quickly.
>>>>>>>>
>>>>>>>> Thanks for your help!
>>>>>>>>
>>>>>>>> Sincerely,
>>>>>>>>
>>>>>>>> Kevin
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Jun.-Prof Dr. Kevin Tierney
>>>>>>>> University of Paderborn
>>>>>>>> Faculty of Business Administration and Economics
>>>>>>>> Department 3: Business Information Systems
>>>>>>>> Room: Q2.472
>>>>>>>> Tel.: +49 (0)5251 60-5244
>>>>>>>> <markshare_5_0.mps.gz.constraint.graph><ATT00001.c>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>> --
>>>>>> Jun.-Prof Dr. Kevin Tierney
>>>>>> University of Paderborn
>>>>>> Faculty of Business Administration and Economics
>>>>>> Department 3: Business Information Systems
>>>>>> Room: Q2.472
>>>>>> Tel.: +49 (0)5251 60-5244
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
> --
>
> =======================================================
> Karlsruhe Institute of Technology (KIT)
> Institute of Theoretical Informatics (ITI)
>
> Juniorprof. Dr. Henning Meyerhenke
> Theoret. Informatics / Parallel Computing
>
> Phone: +49-721-608-41876
> Web: http://parco.iti.kit.edu/henningm/
>
> KIT - University of the State of Baden-Wuerttemberg and
> National Research Center of the Helmholtz Association
> =======================================================
>
>
> _______________________________________________
> NetworKit mailing list
> NetworKit at ira.uni-karlsruhe.de
> https://lists.ira.uni-karlsruhe.de/mailman/listinfo/networkit
>
>


-- 
Lars Beckmann

Uhlenstraße 10
33098 Paderborn
Germany

Cell (D2): +49 (0) 178 59 69 150
Phone: +49 (0) 89 380 123 90
Fax: +49 (0) 180 102 113 4628
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.ira.uni-karlsruhe.de/mailman/private/networkit/attachments/20140507/d4f7a32f/attachment-0001.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: outputRealGraphTest.zip
Type: application/zip
Size: 356110 bytes
Desc: not available
URL: <https://lists.ira.uni-karlsruhe.de/mailman/private/networkit/attachments/20140507/d4f7a32f/attachment-0001.zip>


More information about the NetworKit mailing list