[Networkit] Linux resource limits and call-stack-based recursion in NetworKit algorithms

Elisabetta Bergamini elisabetta.bergamini at kit.edu
Tue Jul 26 09:58:14 CEST 2016


Hello everybody,

Obada and Max, thanks a lot for your efforts and helpful insights!

I totally agree that the iterative version of SCC should replace the old recursive one. About the function returning the i-th neighbor, I also think adding it would be okay if nobody sees a drawback. What about dynamic graphs, though, where an edge can be deleted? At some point a node that used to be the i-th neighbor of another node might not be a neighbor anymore. Would that be a problem, or do you think this can be handled easily?
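
To make the concern concrete, here is a minimal sketch (plain Python with a dict-based adjacency list, not the actual NetworKit data structures; all names are illustrative only) of how a stored neighbor index can go stale once an edge is deleted:

  # Hypothetical illustration: neighbor indices shift after an edge deletion.
  adj = {0: [1, 2, 3]}                # node 0 with neighbors in insertion order

  def get_ith_neighbor(u, i):
      return adj[u][i] if i < len(adj[u]) else None

  i = 1                               # an algorithm remembers "index 1 of node 0"
  print(get_ith_neighbor(0, i))       # -> 2

  adj[0].remove(1)                    # edge (0, 1) is deleted; indices shift
  print(get_ith_neighbor(0, i))       # -> 3, a different node than before

So any algorithm that keeps such indices across graph modifications would have to be re-run or repaired after a deletion.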

Best,

Elisabetta

==========================================================
Karlsruhe Institute of Technology (KIT)
Institute of Theoretical Informatics (ITI)

Elisabetta Bergamini
Theoret. Informatics / Parallel Computing

Phone: +49-721-608-41821
Web: http://parco.iti.kit.edu/bergamini

KIT - The Research University in the Helmholtz Association
==========================================================

On Jul 25, 2016, at 16:15, Maximilian Vogel <maximilian.vogel at student.kit.edu> wrote:

> Hi everyone,
> 
> thanks for the great effort. I can confirm that your example works with the different instructions for increasing the stack size. Although we may switch to the iterative DFS implementation, I still think these instructions are informative and helpful, so I've added a bullet point with a brief description of how to raise the stack size to the "Known issues" section of the Get Started/Readme document [to be pushed].
> The iterative implementation you provided works fine. I'm with Christian's argument that SCC should work out of the box, which is why I think your implementation should become the new default. The slight performance decrease on some instances does not weigh much here, in my opinion.
> 
> The implementation detail regarding the i-th neighbor still has to be resolved. In my opinion there is a valid need for such a function given your implementation, and I think it is okay to add it permanently. Such decisions should not be made by one person alone, which is why I'd like to hear more opinions on that.
> A possible workaround would be to store each node's outgoing edges in separate queues, which would increase the algorithm's memory requirements by O(m); the running time shouldn't be affected too much.
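> 
> A rough sketch of that idea (plain Python, not the NetworKit code base; names are illustrative only) could look like this:
> 
>   from collections import deque
> 
>   def dfs_iterative(adj, source):
>       # One queue of still-unexplored outgoing neighbors per node: O(m) extra
>       # memory, but no i-th-neighbor lookup is needed. Assumes every node
>       # appears as a key of adj.
>       remaining = {u: deque(neighbors) for u, neighbors in adj.items()}
>       visited = {source}
>       stack = [source]                     # explicit DFS stack instead of recursion
>       finish_order = []
>       while stack:
>           u = stack[-1]
>           if remaining[u]:
>               v = remaining[u].popleft()   # next unexplored edge of u
>               if v not in visited:
>                   visited.add(v)
>                   stack.append(v)
>           else:
>               finish_order.append(stack.pop())  # all edges of u are processed
>       return finish_order
> 
>   print(dfs_iterative({0: [1, 2], 1: [2], 2: [0]}, 0))   # -> [2, 1, 0]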
> 
> If we settle on your iterative implementation, the recursive one can be removed entirely, as I see no reason to keep it.
> 
> Best,
> Max
> 
> 
> On 15.07.2016 13:59, Obada Mahdi wrote:
>> Hi everyone,
>> 
>> following up on my brief comment during the GALA project presentations yesterday, I would like to share my insight here for reference.
>> 
>> In short, most Linux-based distributions (including the one used on our computeNN.iti.kit.edu servers) set per-process resource limits in a way that restricts the stack size to a maximum of 8 MB by default [1], which is reached quickly when running recursive algorithms such as NetworKit's StronglyConnectedComponents on instances like the DIMACS road graphs [2]. Exceeding this limit causes the program to crash with a "segmentation fault".
>> 
>> Since NetworKit is targeted at large-scale network analysis, I assume that the default limit of 8 MB of stack size on most Linux-based machines is too restrictive in general and may require user interaction to lift it. Given the rather unspecific error message ("Segmentation fault", program crash), I suggest indicating the use of call-stack recursion in the NetworKit documentation (e.g. for StronglyConnectedComponents::run()) and pointing out possible workarounds for affected platforms.
>> 
>> For completeness, here are two ways a user can adjust the stack size limit:
>> 
>> (1) From the command line ("bash" shell), the stack resource limit can be lifted via
>> 
>>   ulimit -s unlimited
>> 
>> if no hard limit is set. If a hard limit exists, the maximum possible value can be obtained by
>> 
>>   ulimit -Hs
>> 
>> It seems that Mac OS X may not be as generous, enforcing a hard limit of 16 MB [3].
>> 
>> Note that resource limits are set on a per-process basis and are inherited by child processes. Changing the limit this way will apply, for example, to a Python shell started subsequently within the same shell session, but the change is lost once the session is terminated.
>> 
>> (2) It is also possible to change resource limits directly from within Python:
>> 
>>   import resource
>>   resource.setrlimit(resource.RLIMIT_STACK, (-1, -1))
>> 
>> will have the same effect as the first shell command above, i.e. lifting the stack size limit.
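>> 
>> If lifting the limit entirely is not permitted (e.g. because a hard limit is in place), a variant of the above that merely raises the soft limit as far as the hard limit allows might look like this (a sketch, using only the standard resource module):
>> 
>>   import resource
>>   # Query the current soft and hard limits for the stack size ...
>>   soft, hard = resource.getrlimit(resource.RLIMIT_STACK)
>>   # ... and raise the soft limit up to the hard limit.
>>   resource.setrlimit(resource.RLIMIT_STACK, (hard, hard))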
>> 
>> 
>> Background: I have experienced program crashes while trying to confirm that some road graphs from the 9th DIMACS challenge [2] are indeed strongly connected, by transforming them into an EdgeList-compatible format and using NetworKit's Python interface to compute SCCs.
>> 
>> MWE using a DIMACS graph (reproducible on e.g. compute3.iti.kit.edu):
>> 
>> - shell command for fetching CAL instance and converting it to an edge list:
>> 
>> curl http://www.dis.uniroma1.it/challenge9/data/USA-road-t/USA-road-t.CAL.gr.gz | \
>>   zcat | awk '/^a / { print $2 " " $3 " " $4 }' >USA-road-t.CAL.wel
>> 
>> - interactive Python3 session:
>> 
>>   >>> from networkit import *
>>   >>> gr = readGraph("USA-road-t.CAL.wel", fileformat=Format.EdgeList, separator=" ", \
>>   ... firstNode=1, commentPrefix="c", continuous=True, directed=True)
>>   >>> s = components.StronglyConnectedComponents(gr)
>>   >>> s.run()
>>   Segmentation fault (core dumped)
>> 
>> With a web graph from the SNAP dataset repository:
>> 
>> - fetching dataset:
>>   curl http://snap.stanford.edu/data/web-Google.txt.gz | zcat >web-Google.txt
>> 
>> - interactive Python3 session:
>> 
>>   >>> from networkit import *
>>   >>> gr = readGraph("web-Google.txt", fileformat=Format.EdgeListTabZero, \
>>   ... commentPrefix="#", continuous=True, directed=True)
>>   >>> s = components.StronglyConnectedComponents(gr)
>>   >>> s.run()
>>   Segmentation fault (core dumped)
>> 
>> Both examples run fine with the stack limit removed as shown above.
>> 
>> 
>> The attached patch (scc-iter.patch, against current Dev [5750:e0903081deb1]) is an experimental attempt to work around the limited stack size by employing an iterative adaptation of the existing implementation in StronglyConnectedComponents::run(). It provides an alternative method runIteratively(), which uses an additional std::vector-based stack instead of the program call stack. I am sharing it here for the curious and interested -- it has not been tested thoroughly, and it currently relies on an additional interface function hacked into the Graph class (namely Graph::getIthNeighbor) for obtaining the i-th outgoing neighbor of a node in constant time.
>> 
>> It seems to work fine on the two instances mentioned above. It is slightly faster (about 10%) than the recursive baseline for the strongly connected road graph (a single component) and slightly slower (again about 10%) for the web graph, in which roughly half of the nodes are in one SCC while most of the others form components of size 1. Again, no extensive evaluation has been performed, but these preliminary findings line up with my intuition that call-stack recursion performs very well because it uses a dedicated hardware register/memory layout and does not require explicit bounds checks or initialization, not to mention compiler optimizations that may take advantage of (partial) function inlining. The overhead of storing additional return addresses and padded stack frames seems to show only at larger recursion depths.
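>> 
>> For readers who do not want to dig through the patch, the general shape of such an iterative SCC computation (a rough Tarjan-style sketch [4] in plain Python with an explicit frame stack, not the C++ code from the patch) is roughly the following:
>> 
>>   def scc_iterative(adj):
>>       # Tarjan's SCC algorithm with an explicit work stack instead of recursion.
>>       # Assumes every node appears as a key of adj.
>>       index, lowlink, on_stack = {}, {}, set()
>>       scc_stack, components, counter = [], [], 0
>>       for root in adj:
>>           if root in index:
>>               continue
>>           index[root] = lowlink[root] = counter; counter += 1
>>           scc_stack.append(root); on_stack.add(root)
>>           work = [(root, iter(adj[root]))]     # (node, iterator over its neighbors)
>>           while work:
>>               u, neighbors = work[-1]
>>               descended = False
>>               for v in neighbors:
>>                   if v not in index:           # tree edge: descend into v
>>                       index[v] = lowlink[v] = counter; counter += 1
>>                       scc_stack.append(v); on_stack.add(v)
>>                       work.append((v, iter(adj[v])))
>>                       descended = True
>>                       break
>>                   elif v in on_stack:          # back/cross edge within the stack
>>                       lowlink[u] = min(lowlink[u], index[v])
>>               if descended:
>>                   continue
>>               work.pop()                       # u is fully explored
>>               if work:
>>                   parent = work[-1][0]
>>                   lowlink[parent] = min(lowlink[parent], lowlink[u])
>>               if lowlink[u] == index[u]:       # u is the root of an SCC
>>                   component = []
>>                   while True:
>>                       w = scc_stack.pop(); on_stack.discard(w)
>>                       component.append(w)
>>                       if w == u:
>>                           break
>>                   components.append(component)
>>       return components
>> 
>>   print(scc_iterative({0: [1], 1: [2], 2: [0, 3], 3: []}))   # -> [[3], [2, 1, 0]]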
>> 
>> 
>> Regards
>> 
>> Obada
>> 
>> [1] http://stackoverflow.com/questions/2656722/
>> [2] http://www.dis.uniroma1.it/challenge9/download.shtml
>> [3] http://stackoverflow.com/questions/13245019/
>> [4] https://en.wikipedia.org/wiki/Tarjan%27s_strongly_connected_components_algorithm
>> 
>> 
> 
