[Networkit] Matrix classes & separation of algorithm and data structure

Kolja Esders kolja.esders at student.kit.edu
Mon Jan 26 18:41:40 CET 2015


After further consideration these are my thoughts:

The link prediction algorithms should do one thing really well and that is
the actual calculation of the scores. That should be the purpose of the
corresponding classes. This might even involve storing byproducts that
would get calculated either way*. But following the separation of concerns
I don't think that the algorithm classes should manage and possibly
regulate the access to the actual result by acting as an container.
This would mean that the runAll() method should actually return a
result-container (e.g. a matrix) instead of storing it in the algorithm
class and providing access through getters. This would solve the problem
described in my initial mail and seems to be more client-friendly and
intuitive. The same should be the case for the run(node u, node v) method
which calculates the score for a single node-pair. This method should
return the generated score (double).
The problem which arises is the inconsistency with the basic scheme of an
algorithm in networkit which usually owns the generated results and
contains a void run() method for calculation and a getter for specific
access.

The advantages of this approach would be
+ No need to allocate unnecessary memory if you want to use the algorithm
(initial problem)
+ More intutive for the client and less calls to get an actual result
+ Caching could be done by simply wrapping the algorithm around a
designated Cache-class
+ More freedom for further data-manipulation (multiplying the result matrix
with some constant etc.)
+ Clear separation of concerns

What are your thoughts on this? Are there other good reasons to avoid
returning the generated results apart from consistency? Is the "void
run()"-approach actually better class design?

Best regards,
Kolja

* An example would be the calculation of a score for a node-pair (u,v)
where the algorithm needs to calculate the scores between u and all
neighbors of v to arrive at the score for (u, v).

PS: I hope the mailing list recognized this mail as an response to my
original entry as I've manually edited my mailman digest to respond to the
entry instead of the digest.

Date: Wed, 21 Jan 2015 17:17:15 +0100
> From: Kolja Esders <kolja.esders at student.kit.edu>
> To: networkit at ira.uni-karlsruhe.de
> Subject: [Networkit] Matrix classes & separation of algorithm and data
>         structure
> Message-ID:
>         <CADQhVyGacJfi1Y=
> ti0STahSERrFediN-ReVbyv8Qg3G-DpEr3A at mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
>
> Hi NetworKit developers,
>
> as part of my bachelor thesis I currently implement a number of link
> prediction algorithms for NetworKit.
>
> During this effort I encountered that the regular Matrix class is not well
> suited to store the resulting scores for all algorithms as the similarity
> matrices are oftentimes very dense or need only the storage space of an
> triangular matrix due to symmetry.
> Consequently I've implemented a vector-based SymmetricSimilarityMatrix
> which stores a triangular-matrix in a vector allowing for fast access and
> less memory usage.
> This approach collides with the Matrix classes already present because the
> new class can't inherit from the current Matrix class which is based on a
> graph-implementation.
> To solve this problem I would propose a common interface "Matrix" which
> simply defines a set of pure virtual functions which could then be
> implemented from a SparseMatrix class (the current Matrix class) and the
> SymmetricSimilarityMatrix as well.
> What do you guys think about this approach? Are there better ways to solve
> this?
>
> Another problem is the separation of algorithm and the used data structure.
> Right now every algorithm takes a graph as an input and outputs a matrix
> containing the scores for every node-pair. The matrix is stored in the
> class itself and gets allocated during construction.
> Now in order to make use of one algorithm (e.g. common neighbors [CN]) in
> another algorithm (e.g. Jaccard's coefficent using CN) this would mean to
> allocate twice the needed memory (2 matrices, one per class).
> What would be the best approach to separate algorithm and data structure? I
> would suggest that the constructor (or maybe the run method) takes a
> reference to the output matrix to fill with scores. Thus the class wouldn't
> own the matrix.
>
> Please note if the above explanation is not clear enough.
> I would appreciate any feedback!
>
> Best regards,
> Kolja
> -------------- next part --------------
> An HTML attachment was scrubbed...
> URL: <
> https://lists.ira.uni-karlsruhe.de/mailman/private/networkit/attachments/20150121/34fa361c/attachment-0001.html
> >
>
> ------------------------------
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.ira.uni-karlsruhe.de/mailman/private/networkit/attachments/20150126/c03ecdf6/attachment.html>


More information about the NetworKit mailing list