[Networkit] Solving the copy vs. swap problem in Cython

Michael Hamann michael.hamann at kit.edu
Wed Aug 27 18:19:23 CEST 2014


Hi,

as I was wondering why we have these strange problems with Graph and Partition 
instances that are copied instead of move-assigned (and now in the case of the 
Graph handled as pointers) I had a closer look at what Cython generates.

I found two problems:
a) the value that is passed to setThis is in most cases an lvalue, not an 
rvalue, so copy elision is not possible
b) the Partition class had no move assignment operator because a destructor 
was defined.

While the fix for the second problem is obvious in my opinion - I simply 
deleted the destructor (please correct me if this desctructor is needed) - the 
fix for the first problem is not that obvious.

First of all, why do we get an lvalue even if the code is
return Partition().setThis(self._this.makeOneClustering(dereference(G._this)))
?

Well, the problem here is that "makeOneClustering" has "except +" in its 
declaration. This means that the function call is actually translated in the 
following C++ code (comments added by me):

  try { /* move assignment needed here */
    __pyx_t_2 = __pyx_v_self->_this.makeOneClustering((*__pyx_v_G->_this));
  } catch(...) {/* exception handler ... */ }
  __pyx_t_3 = (/*ugly cast...*/)->setThis(((struct 
__pyx_obj_9networkit_10_NetworKit_Partition *)__pyx_t_1), __pyx_t_2);

This is why the value that is passed to setThis is an lvalue whenever "except 
+" is added. Actually, every function that returns a Partition should have 
"except +", as the memory allocation could fail and thus create an exception.

Now after adding the missing "except +" statements, setThis() can be defined as 
follows:
	cdef setThis(self,  _Partition& other):
		swap[_Partition](self._this,  other)
		return self

where swap is defined as

cdef extern from "<algorithm>" namespace "std":
	void swap[T](T &a,  T &b)

According to my experiments [0], the partition is not copied anymore.

I've just pushed the changes to the Dev branch.

I guess the same could be done for Graph and Cover which means that we 
wouldn't need to use these "dereference" calls and we wouldn't need the 
extra code for Cython anymore.

Any objections?

Michael


[0]: Execute the following code and monitor the memory usage. With the fix it 
is between 3100MB and 3888MB. Without the fix it goes up to 4600MB.

In [1]: from networkit import *
Update to Python >=3.4 recommended - support for < 3.4 may be discontinued in 
the future

In [2]: G = graph.Graph(100000000)

In [3]: gen = community.ClusteringGenerator()

In [4]: while (True):
    P = gen.makeOneClustering(G)
    del P



More information about the NetworKit mailing list