[Networkit] Cython memory management

Kolja Esders kolja.esders at student.kit.edu
Sun Apr 5 21:24:03 CEST 2015


Hi fellow developers,

as I am currently writing a link prediction extension for NetworKit, I
discovered some odd behaviour in Cython's memory management: far more
memory is allocated than necessary.

In general this is no news, since Cython copies objects to make them
available on the Python level; see Chris' Stack Overflow post
<http://stackoverflow.com/questions/20123696/cython-how-to-move-large-objects-without-copying-them>.

But two things are curious in my case:

   - On the C++ level, a method of mine, MissingLinksFinder.findAll(2)
   <https://algohub.iti.kit.edu/parco/NetworKit/NetworKit-Esders/files/3464cac6d673d44c60b880d9f08cb50720b86441/networkit/cpp/linkprediction/MissingLinksFinder.cpp#L17>,
   returns a vector with a total size of ~124 MiB. Cython, in turn,
   allocates a total of 1.1 GiB of memory. Even accounting for
   std::vector's over-allocation, this is far beyond what is expected.*
   - Using std::move in the Cython wrapper code
   <https://algohub.iti.kit.edu/parco/NetworKit/NetworKit-Esders/files/3464cac6d673d44c60b880d9f08cb50720b86441/networkit/_NetworKit.pyx#L5158>
   (as is done for Graph etc.) yields no improvement.
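One plausible reading of these numbers (an assumption, not verified against the generated wrapper): Cython's automatic conversion turns the C++ vector into a Python list of tuples, where every node id and score becomes its own heap object with a header. The sketch below estimates the per-element cost on CPython. The result shapes are hypothetical: findAll is assumed to yield [(u, v), ...] with 64-bit node ids, and runOnParallel [((u, v), score), ...]; deep_size is an illustrative helper, not part of NetworKit.

```python
import sys
from array import array

def deep_size(obj):
    """Bytes used by obj plus the list/tuple elements it contains, recursively.

    Every element is counted separately, i.e. we assume no int/float object
    is shared, which holds when each value is converted on its own from C++.
    """
    size = sys.getsizeof(obj)
    if isinstance(obj, (list, tuple)):
        size += sum(deep_size(x) for x in obj)
    return size

N = 100_000
BASE = 1000  # values > 256 bypass CPython's cache of shared small ints

# Hypothetical result shapes after Cython's automatic conversion:
#   findAll       -> [(u, v), ...]
#   runOnParallel -> [((u, v), score), ...]
links = [(BASE + i, BASE + i + 1) for i in range(N)]
preds = [((BASE + i, BASE + i + 1), 0.5 + i) for i in range(N)]

py_link = deep_size(links) / N
py_pred = deep_size(preds) / N

# The same data packed like the C++ vectors: two uint64 node ids per link,
# plus one double per prediction
c_link = 2 * array("Q").itemsize
c_pred = c_link + array("d").itemsize

print(f"link:       ~{py_link:.0f} B as Python objects vs {c_link} B packed")
print(f"prediction: ~{py_pred:.0f} B as Python objects vs {c_pred} B packed")
print(f"124 MiB packed -> ~{124 * py_link / c_link:.0f} MiB as Python objects")
print(f"186 MiB packed -> ~{186 * py_pred / c_pred:.0f} MiB as Python objects")
```

On a 64-bit CPython this comes out at several times the packed size per element, the same order of magnitude as the jumps reported above. If that estimate holds, std::move would not help, because it only avoids a copy on the C++ side, not the conversion into Python objects.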


You can have a look at a test that I added
<https://algohub.iti.kit.edu/parco/NetworKit/NetworKit-Esders/files/3464cac6d673d44c60b880d9f08cb50720b86441/networkit/cpp/linkprediction/test/LinkPredictionGTest.cpp#L166>
to demonstrate the C++ memory usage, and compare that to the memory consumed
during execution of the following Python snippet:

# Run in IPython/Jupyter (uses the %matplotlib and %cd magics)
from networkit import *
%matplotlib inline
# Edit the following path accordingly
%cd ~/some_folders/NetworKit-Esders
testGraph = readGraph("input/caidaRouterLevel.graph", Format.METIS)
trainingGraph = linkprediction.TrainingGraphGenerator.byPercentage(testGraph, 0.7)
missingLinks = linkprediction.MissingLinksFinder(trainingGraph).findAll(2)  # Around 1.1 GiB
commonNeighborsIndex = linkprediction.CommonNeighborsIndex(trainingGraph)
predictions = commonNeighborsIndex.runOnParallel(missingLinks)  # Around 1.7 GiB
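To tell the C++ allocations apart from the Python-object copies in figures like these, one option is to measure both separately. A minimal sketch, assuming a Linux host: measure is a hypothetical helper, tracemalloc only sees memory allocated through Python's own allocators (not C++ new/malloc inside the extension), and ru_maxrss is the process-wide peak resident set size (KiB on Linux).

```python
import resource
import tracemalloc

def measure(func, *args):
    """Hypothetical helper: run func(*args) and report memory in MiB.

    Returns (result, python_peak, process_peak_rss):
    - python_peak: peak memory allocated through Python's allocators while
      func ran (tracemalloc); memory a C++ extension obtains with new/malloc
      is NOT counted, so this isolates the Python-object copies.
    - process_peak_rss: high-water mark of the whole process's resident
      memory since start-up (ru_maxrss; KiB on Linux, bytes on macOS,
      Linux assumed here).
    """
    tracemalloc.start()
    try:
        result = func(*args)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    rss_kib = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    return result, peak / 2**20, rss_kib / 2**10

# Example with the snippet's objects (requires the NetworKit extension):
# missingLinks, py_mib, rss_mib = measure(
#     linkprediction.MissingLinksFinder(trainingGraph).findAll, 2)
```

If python_peak alone accounted for most of the ~1.1 GiB, that would point at the conversion to Python objects rather than at the C++ vector. Note that tracemalloc itself slows execution and adds some overhead.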

What are your thoughts on this? Have you encountered the same problem?

Any recommendations regarding my interface are also appreciated.

Let me know if you need more information.

Cheers,
Kolja

* The same is true for LinkPredictor::runOnParallel
<https://algohub.iti.kit.edu/parco/NetworKit/NetworKit-Esders/files/3464cac6d673d44c60b880d9f08cb50720b86441/networkit/cpp/linkprediction/LinkPredictor.cpp#L36>:
its result should be ~186 MiB but actually takes up ~1.7 GiB.

PS: I am using the newest Cython version.