[Networkit] Algorithms based on shortest path aggregation are *embarrassingly* parallel

Christian Staudt christian.staudt at kit.edu
Thu Oct 8 20:15:31 CEST 2015


Hi NetworKit developers,
I have just parallelized ApproxBetweenness2 in the Dev branch. Tests are okay, but I am not yet confident that it is correct. Would be thankful for peer review.

https://algohub.iti.kit.edu/parco/NetworKit/NetworKit/changeset/243be4e0315c48cbd025a92992151db930c1ed45


Should it be parallel? Max has raised the point that the memory footprint grows with the number of threads:

>> There is a very practical reason as to why the BFS searches are not parallel: The dependency vector needs space linear in z as does the BFS (at least) and the stack. scoreData is also linear in z, so you end ab with at least z + 3z * t space where t is the number of threads

> I forgot that the BFS stores more information with the number of paths and the distances, so it's more like 5z plus BFS internal data structures.


I think it is important to keep in mind that the faster algorithm is not necessarily the better tool when parallelism is paid with memory. With our institute workstation at KIT we are spoiled by 256 GB RAM so we don’t really notice, users with less RAM may run into memory problems earlier. For a release, we need to decide whether maximum parallelism at the expense of memory should be the default or optional. Opinions?

Chris


On 09 Sep 2015, at 16:28, Arie Slobbe <aslobbe at macalester.edu> wrote:

> for centrality.ApproxCloseness, the largest gains from parallelization would come from replacing line 36 ("for (node s : sampledNodes") { ) with a parallel equivalent. But I'm no expert on parallel computing and I would sure appreciate some help with the implementation!
> 
> On Wed, Sep 9, 2015 at 7:54 AM, Christian Staudt <christian.staudt at kit.edu> wrote:
> p.s. There is one more way to cut the running time here: Max and I have been working on implementing a modified BFS algorithm which can be 2-4 times faster on complex networks. To be announced soon.
> 
> On 09 Sep 2015, at 14:46, Christian Staudt <christian.staudt at kit.edu> wrote:
> 
> > Hi developers,
> > centrality measures based on shortest paths (like betweenness, closeness etc.) are relevant, but also expensive to compute. Doing many traversals of a large graph takes long, but parallelizing them and aggregating the results is probably easy.
> >
> > The following algorithms are currently not parallel:
> >
> > centrality.ApproxBetweenness2
> > centrality.ApproxCloseness
> >
> >
> > So for those working on the subject, can you please parallelize these and future similar algorithms?
> >
> > Best,
> > Chris
> >
> >
> >
> > _______________________________________________
> > NetworKit mailing list
> > NetworKit at ira.uni-karlsruhe.de
> > https://lists.ira.uni-karlsruhe.de/mailman/listinfo/networkit
> 
> 
> _______________________________________________
> NetworKit mailing list
> NetworKit at ira.uni-karlsruhe.de
> https://lists.ira.uni-karlsruhe.de/mailman/listinfo/networkit
> 
> 
> _______________________________________________
> NetworKit mailing list
> NetworKit at ira.uni-karlsruhe.de
> https://lists.ira.uni-karlsruhe.de/mailman/listinfo/networkit

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.ira.uni-karlsruhe.de/mailman/private/networkit/attachments/20151008/ff1c20b3/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 495 bytes
Desc: Message signed with OpenPGP using GPGMail
URL: <https://lists.ira.uni-karlsruhe.de/mailman/private/networkit/attachments/20151008/ff1c20b3/attachment.sig>


More information about the NetworKit mailing list