[Networkit] BFS: lambda function, timestamps

Christian Staudt christian.staudt at kit.edu
Tue Jun 10 14:12:55 CEST 2014


Hi Marcel,


> First of all, I would like to add a second run function to the BFS
> class with the following header:
> 	template<typename N, typename E, typename T>
> 	void BFS::run(Graph &g
> 		, node source
> 		, N &&handle_node
> 		, E &&handle_edge
> 		, T &&traverse_edge);

In general, I like the idea of passing node and edge handlers to the BFS. The traverse_edge function is meant for your special
case of merging clusters by BFS, but I can imagine other uses, so it’s general enough. Should be optional, maybe.

What I don’t like:
- Having a Graph and source node as parameter of run is inconsistent with the current usage of the BFS class and the SSSP base class. See for example ApproxBetweenness.cpp lines 66-74. Any reason why we should stop using the class like this? How about passing all the parameters by constructor?

Open questions:
- Have you thought about Cython compatibility?

	
> Further, in each BFS run a new distance array is allocated. Therefore,
> the BFS would still use linear time in the the size of the graph. A
> simple way to deal with this is to use timestamps to decide whether a
> node was visited before or not. A complete scan of the array is only
> all INT_MAX times necessary.

I am not sure that I get the idea, can you give more details?

> This can be implemented within the BFS
> class or more general in a new class within the auxiliary directory(?)
> to enable other algorithms to profit from this datastructure (e.g.
> Dijkstra). If requested, I could post a more thorough specification of
> the interface.

Please do.

> As a last remark, why are there two BFS implementations? One within
> the graph datastructure (only used by the RegionGrowingOverlapper
> class) and one as a standalone class? I think the breadthFirst*
> functions adds unnecessary complexity to the graph datastructure.

The BFS iterator in Graph was first, then there was the need for having BFS and Dijkstra as subclasses of a common SSSP algorithm class. It seems to me that if your proposal works, then the BFS iterator is indeed a bit redundant and we should remove it maybe.

Best regards,
Christian



Am 09.06.2014 um 14:27 schrieb Marcel Radermacher <marcel.radermacher at student.kit.edu>:

> Dear NetworKit-Developers,
> 
> I would like to suggest two rather simple enhancements to the BFS
> implementation. I am currently working an image segmentation algorithm
> which relies on an efficient merging of two clusters. The partition
> datastructure implemented in NetworKit is not sufficient as it needs
> linear time in the size of the graph to merge two clusters. To avoid
> redundancy it is not a favoured option to implement a secondary
> partition datastructure based on the union-find datastructure (does
> not support removing elements of a cluster).
> 
> A heuristic that can be used is to run a breadth-first search (BFS)
> starting from a cluster vertex and set the cluster id accordingly. To
> achieve linear time in the cluster size I would like to propose the
> following changes to the current BFS implementation.
> 
> First of all, I would like to add a second run function to the BFS
> class with the following header:
> 	template<typename N, typename E, typename T>
> 	void BFS::run(Graph &g
> 		, node source
> 		, N &&handle_node
> 		, E &&handle_edge
> 		, T &&traverse_edge);
> 
> - The parameter handle_node is a void function which takes the current
> node (I.e. the current head node of the queue) as parameter.
> - The void function handle_edge is called with the parameter (current,
> neighbor) iff neighbor is pushed into the queue.
> - traverse_edge takes (current, neighbor) and returns bool. The node
> neighbor is only pushed into the queue iff this function returns true
> and the node neighbor has not already been pushed.
> 
> With this I would be able to limit the search space of the BFS to one
> cluster or more general to nodes with a specific property. The current
> run function can be easily reimplemented using this interface.
> 
> Further, in each BFS run a new distance array is allocated. Therefore,
> the BFS would still use linear time in the the size of the graph. A
> simple way to deal with this is to use timestamps to decide whether a
> node was visited before or not. A complete scan of the array is only
> all INT_MAX times necessary. This can be implemented within the BFS
> class or more general in a new class within the auxiliary directory(?)
> to enable other algorithms to profit from this datastructure (e.g.
> Dijkstra). If requested, I could post a more thorough specification of
> the interface.
> 
> I thing other algorithms can profit from this changes as well. Either
> in runtime or to improve readability. For example, the BFS implemented
> within centrality/Betweens.cpp could be eliminated.
> 
> As a last remark, why are there two BFS implementations? One within
> the graph datastructure (only used by the RegionGrowingOverlapper
> class) and one as a standalone class? I think the breadthFirst*
> functions adds unnecessary complexity to the graph datastructure.
> 
> I am looking forward to hear about your ideas.
> 
> Kind regards
> Marcel
> 
> 
> _______________________________________________
> NetworKit mailing list
> NetworKit at ira.uni-karlsruhe.de
> https://lists.ira.uni-karlsruhe.de/mailman/listinfo/networkit

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 495 bytes
Desc: Message signed with OpenPGP using GPGMail
URL: <https://lists.ira.uni-karlsruhe.de/mailman/private/networkit/attachments/20140610/8dbd0881/attachment.sig>


More information about the NetworKit mailing list