[Networkit] StronglyConnectedComponents iterative algorithm integration

Maximilian Vogel maximilian.vogel at student.kit.edu
Fri Dec 2 11:08:42 CET 2016


Hi everybody,

I guess it's time to move forward in this issue.

On 26.07.2016 09:58, Elisabetta Bergamini wrote:
> About the function returning the i-th neighbor, I also think it would 
> be okay, if nobody sees any drawback.
Well, one can argue that it is a "weird" API of the Graph class making 
it even more exuberant than it already is.
Instead of storing the neighbors of a node explicitly or adding the 
function returning the i-th neighbor, I tried to find a solution to 
access the i-th neighbor. The workaround for Graph::getIthNeighbor(u, i) 
is to get the neighbors of u with Graph::neighbors(node u) which returns 
a vector of node ids and then access the i-th element of that vector. My 
idea to prevent the copying and returning of the vector is to return a 
const reference to the vector holding the node ids/neighbors. Then I 
took a look at the implementation of Graph::neighbors(node u). It uses 
the forNeighborsOf(node u)-iterator which essentially uses the 
forOutEdges-iterator to fill a vector with neighboring node ids. If this 
is work is carried out anyways, then a const reference won't save much 
at all. Also I believe that this implementation may lead to wrong 
results in the undirected case, but I need to check on that more 
thoroughly. Instead, two variations of the method should exist returning 
either a copy or const reference of the outEdges[u]-vector depending on 
the use case.
If we keep the Graph::getIthNeighbour(node u, index i), I see no reason 
to expose a template parameter specifying whether the graph is directed 
or not as the graph also keeps this information internally.
A change to Graph API seems necessary for either solution.

> What about dynamic graphs, where an edge can be deleted? At some point 
> it could happen that a node that used to be the i-th neighbor of 
> another node is not a neighbor anymore. Would that be a problem or do 
> you think this can be handled easily?
I don't think that this is any problem at all because the connected 
components algorithms do not have any update mechanism for dynamic 
graphs. If you want the (strongly) connected components after the graph 
has changed, the algorithm has to run from scratch again.

Once we decide on the preferred solution, I'll take care of the 
necessary changes and push them to the Dev branch.

Best,
Max


> On Jul 25, 2016, at 16:15, Maximilian Vogel 
> <maximilian.vogel at student.kit.edu 
> <mailto:maximilian.vogel at student.kit.edu>> wrote:
>
>> Hi everyone,
>>
>> thanks for the great effort. I can confirm your working example with 
>> the different instructions to increase the stack size. Although we 
>> may switch to the iterative DFS implementation, I still think that 
>> these instructions are informative and helpful. I've added a bullet 
>> point with a brief description to raise the stack size in the "Known 
>> issues" section of the Get Started/Readme document [to be pushed].
>>
>> The iterative implementation you provided works fine. I'm with 
>> Christian's argument that SCC should work out of the box which is why 
>> I think your implementation should be the new default. The slight 
>> performance decrease on some instances definitely does not weigh much 
>> here in my opinion.
>>
>> The implementation detail regarding the i-th neighbor has to be 
>> solved. In my opinion there is a valid need for such a function due 
>> to your implementation and I think it's okay to add this function 
>> permanently. Such decisions should not be made by one person alone 
>> which is why I'd like to hear more opinions on that.
>> A possible workaround would be to store each node's outgoing edges in 
>> separate queues increasing the algorithm's memory requirements by 
>> O(m). The running time shouldn't be affected too much.
>>
>> If we settle for your iterative implementation, the recursive one can 
>> be removed entirely as I see no reason to keep it.
>>
>> Best,
>> Max
>>
>>
>> On 15.07.2016 13:59, Obada Mahdi wrote:
>>> Hi everyone,
>>>
>>> following up on my brief comment during the GALA project 
>>> presentations yesterday, I would like to share my insight here for 
>>> reference.
>>>
>>> In short, most Linux-based distributions (including the one used on 
>>> our computeNN.iti.kit.edu <http://computenn.iti.kit.edu/> servers) 
>>> set per-process resource limits in a way that restricts stack size 
>>> to a maximum of 8 MB of memory by default [1], which is reached 
>>> quickly when running recursive algorithms like NetworKit's 
>>> StronglyConnectedComponents on instances like e.g. DIMACS road 
>>> graphs [2]. Exceeding this limit causes programs to crash with a 
>>> "segmentation fault".
>>>
>>> Since NetworKit is targeted at large-scale network analysis, I 
>>> assume that the default limit of 8 MB stack size on most Linux-based 
>>> machines is too restrictive in general and may require user 
>>> interaction to lift it. Given the rather unspecific error message 
>>> ("Segmentation fault", program crash), I suggest to indicate the use 
>>> of stack-based recursion in the NetworKit documentation (e.g. for 
>>> StronglyConnectedComponents::run()) and pointing out possible 
>>> workarounds for affected platforms.
>>>
>>> For completeness, here are two ways a user can adjust the stack size 
>>> limit:
>>>
>>> (1) From the command line ("bash" shell), the stack resource limit 
>>> can be lifted via
>>>
>>>   ulimit -s unlimited
>>>
>>> if no hard limit is set. If a hard limit exists, the maximum 
>>> possible value can be obtained by
>>>
>>>   ulimit -Hs
>>>
>>> It seems that Mac OS X may not be as generous and enforces a hard 
>>> limit of 16 MB [3].
>>>
>>> Note that resource limits are set on a per-process basis and are 
>>> inherited by child processes. Changing it this way will apply, for 
>>> example, to a Python shell started subsequently within the same 
>>> shell session, but changes are lost once the session is terminated.
>>>
>>> (2) It is also possible to change resource limits directly from 
>>> within Python:
>>>
>>>   import resource
>>>   resource.setrlimit(resource.RLIMIT_STACK, (-1, -1))
>>>
>>> will have the same effect as the first shell command above, i.e. 
>>> lifting the stack size limit.
>>>
>>>
>>> Background: I have experienced program crashes while trying to 
>>> confirm that some road graphs from the 9th DIMACS challenge [1] are 
>>> indeed strongly connected by transforming them into an 
>>> EdgeList-compatible format and using NetworKit's Python interface to 
>>> compute SCCs.
>>>
>>> MWE using a DIMACS graph (reproducible on e.g. compute3.iti.kit.edu 
>>> <http://compute3.iti.kit.edu/>):
>>>
>>> - shell command for fetching CAL instance and converting it to an 
>>> edge list:
>>>
>>> curl 
>>> http://www.dis.uniroma1.it/challenge9/data/USA-road-t/USA-road-t.CAL.gr.gz 
>>> | \
>>>   zcat | awk '/^a / { print $2 " " $3 " " $4 }' >USA-road-t.CAL.wel
>>>
>>> - interactive Python3 session:
>>>
>>>   >>> from networkit import *
>>>   >>> gr = readGraph("USA-road-t.CAL.wel", 
>>> fileformat=Format.EdgeList, separator=" ", \
>>>   ... firstNode=1, commentPrefix="c", continuous=True, directed=True)
>>>   >>> s = components.StronglyConnectedComponents(gr)
>>>   >>> s.run()
>>>   Segmentation fault (core dumped)
>>>
>>> With a web graph from the SNAP dataset repository:
>>>
>>> - fetching dataset:
>>>   curl http://snap.stanford.edu/data/web-Google.txt.gz | zcat 
>>> >web-Google.txt
>>>
>>> - interactive Python3 session:
>>>
>>>   >>> from networkit import *
>>>   >>> gr = readGraph("web-Google.txt", 
>>> fileformat=Format.EdgeListTabZero, \
>>>   ... commentPrefix="#", continuous=True, directed=True)
>>>   >>> s = components.StronglyConnectedComponents(gr)
>>>   >>> s.run()
>>>   Segmentation fault (core dumped)
>>>
>>> Both examples run fine with the stack limit removed as shown above.
>>>
>>>
>>> The attached patch (scc-iter.patch, against current Dev 
>>> [5750:e0903081deb1]) is an experimental attempt to work around 
>>> limited stack size by employing an iterative adaption of the 
>>> existing implementation in StronglyConnectedComponents::run(). It 
>>> provides the alternative method runIteratively(), which uses an 
>>> additional std::vector-based stack instead of the program call 
>>> stack. I am sharing it here for the curious and interested -- it has 
>>> not been tested thoroughly, and it currently relies on an additional 
>>> interface function hacked into the Graph class (namely 
>>> Graph::getIthNeighbor) for obtaining the i-th outgoing neighbor of a 
>>> node in constant time.
>>>
>>> It seems to work fine on the two instances mentioned above and is 
>>> slightly faster (about 10%) for the strongly connected road graph 
>>> (one component) than the recursive baseline and slightly slower 
>>> (again about 10%) for the web graph, for which roughly half of its 
>>> nodes are in one SCC, while most others build components of size 1. 
>>> Again, no extensive evaluation has been performed, but these 
>>> preliminary findings line up with my intuition that call-stack 
>>> recursion performs very well because it is based on a dedicated 
>>> hardware register/memory layout and does not require explicit 
>>> boundary checks or initialization, not to mention compiler 
>>> optimizations that may take advantage of (partial) function 
>>> inlining. The overhead of storing additional return addresses and 
>>> padding stack frames seems to show only for a larger recursion depth.
>>>
>>>
>>> Regards
>>>
>>> Obada
>>>
>>> [1] http://stackoverflow.com/questions/2656722/
>>> [2] http://www.dis.uniroma1.it/challenge9/download.shtml
>>> [3] http://stackoverflow.com/questions/13245019/
>>> [4] 
>>> https://en.wikipedia.org/wiki/Tarjan%27s_strongly_connected_components_algorithm
>>>
>>>
>>> _______________________________________________
>>> NetworKit mailing list
>>> NetworKit at ira.uni-karlsruhe.de
>>> https://lists.ira.uni-karlsruhe.de/mailman/listinfo/networkit
>>
>> _______________________________________________
>> NetworKit mailing list
>> NetworKit at ira.uni-karlsruhe.de <mailto:NetworKit at ira.uni-karlsruhe.de>
>> https://lists.ira.uni-karlsruhe.de/mailman/listinfo/networkit
>
>
>
> _______________________________________________
> NetworKit mailing list
> NetworKit at ira.uni-karlsruhe.de
> https://lists.ira.uni-karlsruhe.de/mailman/listinfo/networkit

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.ira.uni-karlsruhe.de/pipermail/networkit/attachments/20161202/abe40086/attachment-0001.html>


More information about the NetworKit mailing list