[Networkit] BFS: lambda function, timestamps

Marcel Radermacher marcel.radermacher at student.kit.edu
Tue Jun 10 16:24:48 CEST 2014


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi Christia,


On 10.06.2014 14:12, Christian Staudt wrote:
> Hi Marcel,
> 
> 
>> First of all, I would like to add a second run function to the
>> BFS class with the following header: template<typename N,
>> typename E, typename T> void BFS::run(Graph &g , node source , N
>> &&handle_node , E &&handle_edge , T &&traverse_edge);
> 
> In general, I like the idea of passing node and edge handlers to
> the BFS. The traverse_edge function is meant for your special case
> of merging clusters by BFS, but I can imagine other uses, so it’s
> general enough. Should be optional, maybe.
> 
> What I don’t like: - Having a Graph and source node as parameter of
> run is inconsistent with the current usage of the BFS class and the
> SSSP base class. See for example ApproxBetweenness.cpp lines 66-74.
> Any reason why we should stop using the class like this? How about
> passing all the parameters by constructor?

Actually, my proposal would only make sense if the graph would be
passed to the constructor. But I do not see the point of passing the
source node. As I see it in typical SSSP scenario there are no changes
to the graph but there are several queries on this specific graph.
ApproxBetweenes or the diameter of a graph are good examples for this.
With the current implementation a reset is equal to new allocation of
an object. If the search space is much smaller then the size of the
graph the timestamp solution would be faster. Otherwise I think,
std::fill can be used to achieve a faster reset, since no new memory
has to be allocated.

Is there a specific reason for your design decision?

> Open questions: - Have you thought about Cython compatibility?

Not really, since I have no experience with Cython. But I figured, as
the concept of passing the handlers has already been used in the graph
datastructure it would not be so much of a problem.

> 
>> Further, in each BFS run a new distance array is allocated.
>> Therefore, the BFS would still use linear time in the the size of
>> the graph. A simple way to deal with this is to use timestamps to
>> decide whether a node was visited before or not. A complete scan
>> of the array is only all INT_MAX times necessary.
> 
> I am not sure that I get the idea, can you give more details?

The basic idea is the following. The BFS, Dijkstra and several other
algorithms need to know if they have already visited a node before.
The forward implementation uses boolean flags or INFINIT values to
achieve this. If the search visit most of the nodes of the graph,
std::fill is probably the best/fastest solution.

Is the search space small one can save a timestamp per node. A node
was visited iff the timestamp matches the current timestamp. The reset
function simply increases the current timestamp.

>> This can be implemented within the BFS class or more general in a
>> new class within the auxiliary directory(?) to enable other
>> algorithms to profit from this datastructure (e.g. Dijkstra). If
>> requested, I could post a more thorough specification of the
>> interface.
> 
> Please do.

The changes to NetworKit are more complex then I thought at first.
Something like the following class hierarchy would be necessary.

- ------------------
class VisitFlags {
public:
	VisitFlags(size_t n);
	virtual bool isSet(index v) = 0;
	virtual void set(index v) = 0;
	virtual void unset(index v) = 0;
	virtual void unsetAll(index v) = 0;
};

class BoolFlags : public VisitFlags;
class TimestampFlags: public VisitFlags;

template<typename Flags = BoolFlags> class SSSP {
private:
	Flags flags;
	...
public:
	SSSP(const Graph &g); g(g), flags(g.upperNodeIdBound() + 1) { }
	void reset() { flags.unsetAll(); };
	
};
- ------------------

With this the developer could decide which flags to use to get the
best performance for his algorithm. Nevertheless, if there is no
specific reason to pass the source node to the constructor, I would go
for a solution which uses std::fill to reset all distance labels.

Kind regards
Marcel

On 10.06.2014 14:12, Christian Staudt wrote:

> Hi Marcel,
> 
> 
>> First of all, I would like to add a second run function to the
>> BFS class with the following header: template<typename N,
>> typename E, typename T> void BFS::run(Graph &g , node source , N
>> &&handle_node , E &&handle_edge , T &&traverse_edge);
> 
> In general, I like the idea of passing node and edge handlers to
> the BFS. The traverse_edge function is meant for your special case
> of merging clusters by BFS, but I can imagine other uses, so it’s
> general enough. Should be optional, maybe.
> 
> What I don’t like: - Having a Graph and source node as parameter of
> run is inconsistent with the current usage of the BFS class and the
> SSSP base class. See for example ApproxBetweenness.cpp lines 66-74.
> Any reason why we should stop using the class like this? How about
> passing all the parameters by constructor?
> 
> Open questions: - Have you thought about Cython compatibility?
> 
> 
>> Further, in each BFS run a new distance array is allocated.
>> Therefore, the BFS would still use linear time in the the size of
>> the graph. A simple way to deal with this is to use timestamps to
>> decide whether a node was visited before or not. A complete scan
>> of the array is only all INT_MAX times necessary.
> 
> I am not sure that I get the idea, can you give more details?
> 
>> This can be implemented within the BFS class or more general in a
>> new class within the auxiliary directory(?) to enable other
>> algorithms to profit from this datastructure (e.g. Dijkstra). If
>> requested, I could post a more thorough specification of the
>> interface.
> 
> Please do.
> 
>> As a last remark, why are there two BFS implementations? One
>> within the graph datastructure (only used by the
>> RegionGrowingOverlapper class) and one as a standalone class? I
>> think the breadthFirst* functions adds unnecessary complexity to
>> the graph datastructure.
> 
> The BFS iterator in Graph was first, then there was the need for
> having BFS and Dijkstra as subclasses of a common SSSP algorithm
> class. It seems to me that if your proposal works, then the BFS
> iterator is indeed a bit redundant and we should remove it maybe.
> 
> Best regards, Christian
> 
> 
> 
> Am 09.06.2014 um 14:27 schrieb Marcel Radermacher
> <marcel.radermacher at student.kit.edu>:
> 
>> Dear NetworKit-Developers,
>> 
>> I would like to suggest two rather simple enhancements to the
>> BFS implementation. I am currently working an image segmentation
>> algorithm which relies on an efficient merging of two clusters.
>> The partition datastructure implemented in NetworKit is not
>> sufficient as it needs linear time in the size of the graph to
>> merge two clusters. To avoid redundancy it is not a favoured
>> option to implement a secondary partition datastructure based on
>> the union-find datastructure (does not support removing elements
>> of a cluster).
>> 
>> A heuristic that can be used is to run a breadth-first search
>> (BFS) starting from a cluster vertex and set the cluster id
>> accordingly. To achieve linear time in the cluster size I would
>> like to propose the following changes to the current BFS
>> implementation.
>> 
>> First of all, I would like to add a second run function to the
>> BFS class with the following header: template<typename N,
>> typename E, typename T> void BFS::run(Graph &g , node source , N
>> &&handle_node , E &&handle_edge , T &&traverse_edge);
>> 
>> - The parameter handle_node is a void function which takes the
>> current node (I.e. the current head node of the queue) as
>> parameter. - The void function handle_edge is called with the
>> parameter (current, neighbor) iff neighbor is pushed into the
>> queue. - traverse_edge takes (current, neighbor) and returns
>> bool. The node neighbor is only pushed into the queue iff this
>> function returns true and the node neighbor has not already been
>> pushed.
>> 
>> With this I would be able to limit the search space of the BFS to
>> one cluster or more general to nodes with a specific property.
>> The current run function can be easily reimplemented using this
>> interface.
>> 
>> Further, in each BFS run a new distance array is allocated.
>> Therefore, the BFS would still use linear time in the the size of
>> the graph. A simple way to deal with this is to use timestamps to
>> decide whether a node was visited before or not. A complete scan
>> of the array is only all INT_MAX times necessary. This can be
>> implemented within the BFS class or more general in a new class
>> within the auxiliary directory(?) to enable other algorithms to
>> profit from this datastructure (e.g. Dijkstra). If requested, I
>> could post a more thorough specification of the interface.
>> 
>> I thing other algorithms can profit from this changes as well.
>> Either in runtime or to improve readability. For example, the BFS
>> implemented within centrality/Betweens.cpp could be eliminated.
>> 
>> As a last remark, why are there two BFS implementations? One
>> within the graph datastructure (only used by the
>> RegionGrowingOverlapper class) and one as a standalone class? I
>> think the breadthFirst* functions adds unnecessary complexity to
>> the graph datastructure.
>> 
>> I am looking forward to hear about your ideas.
>> 
>> Kind regards Marcel
>> 
>> 
>> _______________________________________________ NetworKit mailing
>> list NetworKit at ira.uni-karlsruhe.de 
>> https://lists.ira.uni-karlsruhe.de/mailman/listinfo/networkit
> 

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQEbBAEBAgAGBQJTlxUwAAoJECwxr6rR6o3wTGEH+NJj/47uUDBBg1nFUCpZN8CX
gcziM0CwEjEGTPtssrnbE5eUKrWzBWB3OrRgA32HvrVMC7cG6m1WAsJqIUjwmEfi
Fy9K42gcLNBWpTDp7xpi9/vhUNKCWcY7TDJU0aujAzL4Itbh6oKEzp58pNXczOWQ
e6NsGBpHqb1xBYRpNE/MjxtylRPelbOneuFGqxF43y/76buoz7ps8juLPj6sQrQN
4ladZ861DuqK2jhfxJe7XE5JGXaiVBd/5zyvimbvTkHSR84QOax6wjwJL1Q+9/wK
pnQsD+AvTG1/LRRJK+NcIFmDrKiF2hIykR2LgsB7WUpRqElduIgNWIiGeGaS+A==
=ldp4
-----END PGP SIGNATURE-----



More information about the NetworKit mailing list