[Networkit] Problem computing ApproxBetweeness

Christian Staudt christian.staudt at kit.edu
Fri May 16 13:47:17 CEST 2014


Hi,
I believe the problem is still unsolved, and we cannot reproduce it on our machines. I have two more suggestions:

- Could it be that it is a subtle parallel bug? What happens if you reduce the number of threads, possibly running it in single-threaded mode? This can be controlled with the NetworKit.setNumberOfThreads(1) function in Python or with the —threads=1 flag for the unit tests.
- Would it be acceptable to reduce the accuracy of the approximation so that you get a solution in time?

Christian

Am 07.05.2014 um 16:07 schrieb Lars Beckmann <lars.beckmann at gmail.com>:

> Hi Henning,
> 
> I did what you suggested (see output attached). I let it run for a couple of hours on my admittedly relatively weak machine. I have a VM running on my laptop (which has a 3 years old i7-620M).
> 
> To me the output looks ok and if I understand that correctly it chewed through about 20k out of the 60k samples when I aborted the process.
> 
> Does that give you any info that I don't see? Is there anything else we can do?
> 
> Lars
> 
> 
> 
> On Wed, May 7, 2014 at 11:03 AM, Henning Meyerhenke <meyerhenke at kit.edu> wrote:
> Hi Kevin,
> 
> I have created a unit test tryApproxBetweennessOnRealGraph which operates on your graph.
> 
> On my MacBook with 4 cores the operation took about 800 seconds, which is in the expected range and far from the behavior you describe. This timing is even with --loglevel=DEBUG, fewer output would result in a somewhat faster run.
> 
> Could you please update to the latest Dev branch revision and run the test yourself? You need to put your graph into the input/ directory for that. Please produce DEBUG output as well and have it written to a file if possible.
> 
> Thanks,
> Henning
> 
> 
> 
> 
> Am 07.05.14 10:46, schrieb Kevin Tierney:
> 
> Hi,
> 
> We used the defaults.
> 
> Kevin
> 
> On 07.05.2014 10:34, Henning Meyerhenke wrote:
> Hi Kevin,
> 
> What were the approximation and confidence parameters you used? Or did
> you go along with the default epsilon=0.01 and delta=0.1?
> 
> Best,
> Henning
> 
> 
> Am 07.05.14 09:34, schrieb Kevin Tierney:
> Hi,
> 
> We have come across another graph where there seems to be an issue with
> ApproxBetweenness. This graph is not trivially small, nor is it
> completely connected, meaning we'd actually be rather interested in
> finding out the ApproxBetweenness value for it. I let it run overnight
> (~14 hours with 8 cores) and did not get an answer, so I assume it was
> never going to finish.
> 
> The graph is a bit large, so I put it here:
> https://dl.dropboxusercontent.com/u/16348505/ns894786.mps.gz.variable.graph.bz2
> 
> 
> Any help you can provide would be greatly appreciated.
> 
> By the way, we tried to compute the number of samples ourselves and it
> seemed like it would only be around ~63000, so perhaps there is another
> issue?
> 
> Thanks for your help.
> 
> Kevin
> 
> On 06.05.2014 16:32, Christian Staudt wrote:
> The important thing is that you get sensible results on the real
> networks that are of interest to you. If that’s the case, I give a
> low priority to this bug. I agree that betweenness is uninteresting
> on near-complete graphs.
> 
> Am 06.05.2014 um 16:21 schrieb Kevin Tierney <kevin.tierney at upb.de>:
> 
> Hi Christian,
> 
> Thanks for looking into the issue and for your update.
> 
> We've noticed similar behavior on complete or nearly-complete graphs. I
> suppose computing a betweenness measure on such graphs is pointless
> anyway (at least for what we're doing), but at least that seems like a
> possible cause of the weird behavior.
> 
> Kevin
> 
> 
> On 06.05.2014 15:30, Christian Staudt wrote:
> Hi Kevin,
> running the program with the log level set to INFO reveals the
> problem:
> 
> [INFO ][“src/cpp/centrality/ApproxBetweenness.cpp”, 38: virtual
> void NetworKit::ApproxBetweenness::run()]: estimating vertex diameter
> [INFO ][“src/cpp/centrality/ApproxBetweenness.cpp”, 41: virtual
> void NetworKit::ApproxBetweenness::run()]: estimated diameter: 2
> [INFO ][“src/cpp/centrality/ApproxBetweenness.cpp”, 47: virtual
> void NetworKit::ApproxBetweenness::run()]: taking
> 9223372036854775808 path samples
> 
> A ridiculously high number of samples is taken. Where does it come
> from? The number of samples is calculated as:
> 
>      count r = ceil((c / (epsilon * epsilon)) * (floor(log(vd - 2))
> + 1 + log(1 / delta)));
> 
> So if the estimated diameter (vd) is 2, then the formula contains
> the logarithm of 0, which looks like trouble. I am currently not
> sure how to fix this, because this is how the formula was
> published. We will investigate.
> 
> At the same time I discovered and fixed an error in the formula,
> but I believe this was unrelated to the problem you describe.
> Please also update to the latest Dev-Revision for your experiments.
> 
> Kind regards
> Christian
> 
> 
> 
> Am 06.05.2014 um 14:57 schrieb Kevin Tierney
> <kevin.tierney at uni-paderborn.de>:
> 
> Dear NetworKit Gurus,
> 
> We are trying to use the approximate betweenness function and
> notice an
> odd thing happening on small graphs. I have attached a graph where
> calling:
> 
> NetworKit.centrality.ApproxBetweenness(G)
> 
> seems to never finish. At least the CPU time required seems to be way
> more than should be for such a small graph.
> 
> You might say "Why would you compute approxbetweenness for such a
> small
> graph?" -- the reason is this was just mixed in with a bunch of
> larger
> graphs where betweenness would take too long.
> 
> Do you have any ideas on what is happening?
> 
> By the way, could this perhaps be related to the sample size of 42 in
> the .cpp file for approxbetweenness? We aren't getting any issues on
> large graphs; in fact they are being solved relatively quickly.
> 
> Thanks for your help!
> 
> Sincerely,
> 
> Kevin
> 
> 
> --
> Jun.-Prof Dr. Kevin Tierney
> University of Paderborn
> Faculty of Business Administration and Economics
> Department 3: Business Information Systems
> Room: Q2.472
> Tel.: +49 (0)5251 60-5244
> <markshare_5_0.mps.gz.constraint.graph><ATT00001.c>
> 
> 
> --
> Jun.-Prof Dr. Kevin Tierney
> University of Paderborn
> Faculty of Business Administration and Economics
> Department 3: Business Information Systems
> Room: Q2.472
> Tel.: +49 (0)5251 60-5244
> 
> 
> 
> 
> 
> -- 
> 
> =======================================================
> Karlsruhe Institute of Technology (KIT)
> Institute of Theoretical Informatics (ITI)
> 
> Juniorprof. Dr. Henning Meyerhenke
> Theoret. Informatics / Parallel Computing
> 
> Phone: +49-721-608-41876
> Web: http://parco.iti.kit.edu/henningm/
> 
> KIT - University of the State of Baden-Wuerttemberg and
> National Research Center of the Helmholtz Association
> =======================================================
> 
> 
> _______________________________________________
> NetworKit mailing list
> NetworKit at ira.uni-karlsruhe.de
> https://lists.ira.uni-karlsruhe.de/mailman/listinfo/networkit
> 
> 
> 
> 
> -- 
> Lars Beckmann
> 
> Uhlenstraße 10
> 33098 Paderborn
> Germany
> 
> Cell (D2): +49 (0) 178 59 69 150
> Phone: +49 (0) 89 380 123 90
> Fax: +49 (0) 180 102 113 4628
> <outputRealGraphTest.zip>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.ira.uni-karlsruhe.de/mailman/private/networkit/attachments/20140516/a0f8dbbc/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 495 bytes
Desc: Message signed with OpenPGP using GPGMail
URL: <https://lists.ira.uni-karlsruhe.de/mailman/private/networkit/attachments/20140516/a0f8dbbc/attachment.sig>


More information about the NetworKit mailing list