[Networkit] Linux resource limits and call-stack-based recursion in NetworKit algorithms

Maximilian Vogel maximilian.vogel at student.kit.edu
Mon Jul 25 16:15:36 CEST 2016

Hi everyone,

thanks for the great effort. I can confirm your working example with the 
different instructions to increase the stack size. Although we may 
switch to the iterative DFS implementation, I still think that these 
instructions are informative and helpful. I've added a bullet point with 
a brief description to raise the stack size in the "Known issues" 
section of the Get Started/Readme document [to be pushed].

The iterative implementation you provided works fine. I'm with 
Christian's argument that SCC should work out of the box which is why I 
think your implementation should be the new default. The slight 
performance decrease on some instances definitely does not weigh much 
here in my opinion.

The implementation detail regarding the i-th neighbor has to be solved. 
In my opinion there is a valid need for such a function due to your 
implementation and I think it's okay to add this function permanently. 
Such decisions should not be made by one person alone which is why I'd 
like to hear more opinions on that.
A possible workaround would be to store each node's outgoing edges in 
separate queues increasing the algorithm's memory requirements by O(m). 
The running time shouldn't be affected too much.

If we settle for your iterative implementation, the recursive one can be 
removed entirely as I see no reason to keep it.


On 15.07.2016 13:59, Obada Mahdi wrote:
> Hi everyone,
> following up on my brief comment during the GALA project presentations 
> yesterday, I would like to share my insight here for reference.
> In short, most Linux-based distributions (including the one used on 
> our computeNN.iti.kit.edu <http://computeNN.iti.kit.edu> servers) set 
> per-process resource limits in a way that restricts stack size to a 
> maximum of 8 MB of memory by default [1], which is reached quickly 
> when running recursive algorithms like NetworKit's 
> StronglyConnectedComponents on instances like e.g. DIMACS road graphs 
> [2]. Exceeding this limit causes programs to crash with a 
> "segmentation fault".
> Since NetworKit is targeted at large-scale network analysis, I assume 
> that the default limit of 8 MB stack size on most Linux-based machines 
> is too restrictive in general and may require user interaction to lift 
> it. Given the rather unspecific error message ("Segmentation fault", 
> program crash), I suggest to indicate the use of stack-based recursion 
> in the NetworKit documentation (e.g. for 
> StronglyConnectedComponents::run()) and pointing out possible 
> workarounds for affected platforms.
> For completeness, here are two ways a user can adjust the stack size 
> limit:
> (1) From the command line ("bash" shell), the stack resource limit can 
> be lifted via
>   ulimit -s unlimited
> if no hard limit is set. If a hard limit exists, the maximum possible 
> value can be obtained by
>   ulimit -Hs
> It seems that Mac OS X may not be as generous and enforces a hard 
> limit of 16 MB [3].
> Note that resource limits are set on a per-process basis and are 
> inherited by child processes. Changing it this way will apply, for 
> example, to a Python shell started subsequently within the same shell 
> session, but changes are lost once the session is terminated.
> (2) It is also possible to change resource limits directly from within 
> Python:
>   import resource
>   resource.setrlimit(resource.RLIMIT_STACK, (-1, -1))
> will have the same effect as the first shell command above, i.e. 
> lifting the stack size limit.
> Background: I have experienced program crashes while trying to confirm 
> that some road graphs from the 9th DIMACS challenge [1] are indeed 
> strongly connected by transforming them into an EdgeList-compatible 
> format and using NetworKit's Python interface to compute SCCs.
> MWE using a DIMACS graph (reproducible on e.g. compute3.iti.kit.edu 
> <http://compute3.iti.kit.edu>):
> - shell command for fetching CAL instance and converting it to an edge 
> list:
> curl 
> http://www.dis.uniroma1.it/challenge9/data/USA-road-t/USA-road-t.CAL.gr.gz 
> | \
>   zcat | awk '/^a / { print $2 " " $3 " " $4 }' >USA-road-t.CAL.wel
> - interactive Python3 session:
>   >>> from networkit import *
>   >>> gr = readGraph("USA-road-t.CAL.wel", fileformat=Format.EdgeList, 
> separator=" ", \
>   ... firstNode=1, commentPrefix="c", continuous=True, directed=True)
>   >>> s = components.StronglyConnectedComponents(gr)
>   >>> s.run()
>   Segmentation fault (core dumped)
> With a web graph from the SNAP dataset repository:
> - fetching dataset:
>   curl http://snap.stanford.edu/data/web-Google.txt.gz | zcat 
> >web-Google.txt
> - interactive Python3 session:
>   >>> from networkit import *
>   >>> gr = readGraph("web-Google.txt", 
> fileformat=Format.EdgeListTabZero, \
>   ... commentPrefix="#", continuous=True, directed=True)
>   >>> s = components.StronglyConnectedComponents(gr)
>   >>> s.run()
>   Segmentation fault (core dumped)
> Both examples run fine with the stack limit removed as shown above.
> The attached patch (scc-iter.patch, against current Dev 
> [5750:e0903081deb1]) is an experimental attempt to work around limited 
> stack size by employing an iterative adaption of the existing 
> implementation in StronglyConnectedComponents::run(). It provides the 
> alternative method runIteratively(), which uses an additional 
> std::vector-based stack instead of the program call stack. I am 
> sharing it here for the curious and interested -- it has not been 
> tested thoroughly, and it currently relies on an additional interface 
> function hacked into the Graph class (namely Graph::getIthNeighbor) 
> for obtaining the i-th outgoing neighbor of a node in constant time.
> It seems to work fine on the two instances mentioned above and is 
> slightly faster (about 10%) for the strongly connected road graph (one 
> component) than the recursive baseline and slightly slower (again 
> about 10%) for the web graph, for which roughly half of its nodes are 
> in one SCC, while most others build components of size 1.  Again, no 
> extensive evaluation has been performed, but these preliminary 
> findings line up with my intuition that call-stack recursion performs 
> very well because it is based on a dedicated hardware register/memory 
> layout and does not require explicit boundary checks or 
> initialization, not to mention compiler optimizations that may take 
> advantage of (partial) function inlining. The overhead of storing 
> additional return addresses and padding stack frames seems to show 
> only for a larger recursion depth.
> Regards
> Obada
> [1] http://stackoverflow.com/questions/2656722/
> [2] http://www.dis.uniroma1.it/challenge9/download.shtml
> [3] http://stackoverflow.com/questions/13245019/
> [4] 
> https://en.wikipedia.org/wiki/Tarjan%27s_strongly_connected_components_algorithm
> _______________________________________________
> NetworKit mailing list
> NetworKit at ira.uni-karlsruhe.de
> https://lists.ira.uni-karlsruhe.de/mailman/listinfo/networkit

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.ira.uni-karlsruhe.de/pipermail/networkit/attachments/20160725/c129bbe5/attachment.html>

More information about the NetworKit mailing list