[Networkit] Linux resource limits and call-stack-based recursion in NetworKit algorithms

Obada Mahdi omahdi at gmail.com
Fri Jul 15 13:59:20 CEST 2016

Hi everyone,

following up on my brief comment during the GALA project presentations
yesterday, I would like to share my insight here for reference.

In short, most Linux-based distributions (including the one used on our
computeNN.iti.kit.edu servers) set per-process resource limits in a way
that restricts stack size to a maximum of 8 MB of memory by default [1],
which is reached quickly when running recursive algorithms like NetworKit's
StronglyConnectedComponents on instances like e.g. DIMACS road graphs [2].
Exceeding this limit causes programs to crash with a "segmentation fault".

Since NetworKit is targeted at large-scale network analysis, I assume that
the default limit of 8 MB stack size on most Linux-based machines is too
restrictive in general and may require user interaction to lift it. Given
the rather unspecific error message ("Segmentation fault", program crash),
I suggest to indicate the use of stack-based recursion in the NetworKit
documentation (e.g. for StronglyConnectedComponents::run()) and pointing
out possible workarounds for affected platforms.

For completeness, here are two ways a user can adjust the stack size limit:

(1) From the command line ("bash" shell), the stack resource limit can be
lifted via

  ulimit -s unlimited

if no hard limit is set. If a hard limit exists, the maximum possible value
can be obtained by

  ulimit -Hs

It seems that Mac OS X may not be as generous and enforces a hard limit of
16 MB [3].

Note that resource limits are set on a per-process basis and are inherited
by child processes. Changing it this way will apply, for example, to a
Python shell started subsequently within the same shell session, but
changes are lost once the session is terminated.

(2) It is also possible to change resource limits directly from within

  import resource
  resource.setrlimit(resource.RLIMIT_STACK, (-1, -1))

will have the same effect as the first shell command above, i.e. lifting
the stack size limit.

Background: I have experienced program crashes while trying to confirm that
some road graphs from the 9th DIMACS challenge [1] are indeed strongly
connected by transforming them into an EdgeList-compatible format and using
NetworKit's Python interface to compute SCCs.

MWE using a DIMACS graph (reproducible on e.g. compute3.iti.kit.edu):

- shell command for fetching CAL instance and converting it to an edge list:

| \
  zcat | awk '/^a / { print $2 " " $3 " " $4 }' >USA-road-t.CAL.wel

- interactive Python3 session:

  >>> from networkit import *
  >>> gr = readGraph("USA-road-t.CAL.wel", fileformat=Format.EdgeList,
separator=" ", \
  ... firstNode=1, commentPrefix="c", continuous=True, directed=True)
  >>> s = components.StronglyConnectedComponents(gr)
  >>> s.run()
  Segmentation fault (core dumped)

With a web graph from the SNAP dataset repository:

- fetching dataset:
  curl http://snap.stanford.edu/data/web-Google.txt.gz | zcat

- interactive Python3 session:

  >>> from networkit import *
  >>> gr = readGraph("web-Google.txt", fileformat=Format.EdgeListTabZero, \
  ... commentPrefix="#", continuous=True, directed=True)
  >>> s = components.StronglyConnectedComponents(gr)
  >>> s.run()
  Segmentation fault (core dumped)

Both examples run fine with the stack limit removed as shown above.

The attached patch (scc-iter.patch, against current Dev
[5750:e0903081deb1]) is an experimental attempt to work around limited
stack size by employing an iterative adaption of the existing
implementation in StronglyConnectedComponents::run(). It provides the
alternative method runIteratively(), which uses an additional
std::vector-based stack instead of the program call stack. I am sharing it
here for the curious and interested -- it has not been tested thoroughly,
and it currently relies on an additional interface function hacked into the
Graph class (namely Graph::getIthNeighbor) for obtaining the i-th outgoing
neighbor of a node in constant time.

It seems to work fine on the two instances mentioned above and is slightly
faster (about 10%) for the strongly connected road graph (one component)
than the recursive baseline and slightly slower (again about 10%) for the
web graph, for which roughly half of its nodes are in one SCC, while most
others build components of size 1.  Again, no extensive evaluation has been
performed, but these preliminary findings line up with my intuition that
call-stack recursion performs very well because it is based on a dedicated
hardware register/memory layout and does not require explicit boundary
checks or initialization, not to mention compiler optimizations that may
take advantage of (partial) function inlining. The overhead of storing
additional return addresses and padding stack frames seems to show only for
a larger recursion depth.



[1] http://stackoverflow.com/questions/2656722/
[2] http://www.dis.uniroma1.it/challenge9/download.shtml
[3] http://stackoverflow.com/questions/13245019/
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.ira.uni-karlsruhe.de/pipermail/networkit/attachments/20160715/1a5e4532/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: scc-iter.patch
Type: text/x-patch
Size: 7227 bytes
Desc: not available
URL: <http://lists.ira.uni-karlsruhe.de/pipermail/networkit/attachments/20160715/1a5e4532/attachment.bin>

More information about the NetworKit mailing list