[Networkit] Edge Prediction

Christian Staudt christian.staudt at kit.edu
Fri Feb 6 17:42:10 CET 2015

Hi Kolja,

here are some comments on the issues you asked about. Also a design proposal for the edge prediction code. And a couple of comments after looking at your code.

- Don’t work in the default branch, this is for released code only. Create a branch for your project and merge the Dev branch into it regularly. When your project is finished, we merge it into Dev and finally into default for release. Please try to fix the version history by removing your commits from the default branch, start with a clean copy of the repository if necessary.

- Use the same indentation style as other parts of NetworKit.

- Do not iterate over the nodes manually, it’s messy and bug-prone. Use the iterators provided by the Graph class. If they are for some reason inadequate, discuss that with the developers on the list.

- Whichever design you go for, it has to work with the Python layer. This restricts your options. Start coding Python wrappers for your classes early.

- Here’s my comment on your design: Basically forget about O(n^2) time and O(n^2) memory calculations. We should not suggest to the user that they should attempt this, because it doesn’t work for the large networks we target. I therefore suggest to forget about matrices. Here is the design concept I propose:

Implement a collection of proximity measures. They derive from a common superclass and have a very simple interface: two nodes in, one value out. No storage. This code can be reused in various ways, not just for edge prediction.

Now, the actual EdgePredictor class can make use of these ProximityMeasure subclasses. In the end, the user wants a list of predicted edges. Either the user specifies a subset of nodes to check, or the EdgePredictor should have a (exchangeable?) pair selection heuristic to restrict calculation of proximity scores to a subset of node pairs in the graph (e.g. nodes that have at most distance 2 from each other). The EdgePredictor should also contain a heuristic to look at the scored pairs and decide which of them should be an actual edge.

class EdgePredictor:
	EdgePredictor(Graph, set[node] = {}, ...)	

	vector[pair[node]] getEdges()

- There is an obvious connection between distance measures in graphs and similarity measures used for link prediction. Both associate a  value with a pair of nodes. It seems to me that there should be one class hierarchy for both to avoid code redundancy. Please try to work out a unified solution. Enlist the help of the developers on the email list. Make sure that they can look at your code.

Does that make sense to you?

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 495 bytes
Desc: Message signed with OpenPGP using GPGMail
URL: <https://lists.ira.uni-karlsruhe.de/mailman/private/networkit/attachments/20150206/3c198abe/attachment.sig>

More information about the NetworKit mailing list