[Networkit] All pairs shortest path
matteo at cs.brown.edu
Wed Dec 14 16:50:50 CET 2016
> On Dec 13, 2016, at 6:14 AM, Christian Staudt <christian.staudt at kit.edu> wrote:
> * ps. I recently had a discussion with the CEO of a company that trains software developers in agile methods. His claim was that especially projects in C++ tend to accumulate technical debt (unfixed bugs, architecture problems etc.) until the project grinds to a halt. He said that’s because the language is challenging and tool support for refactoring is bad (both is true). I defended C++ with the example of NetworKit, which has not gone bankrupt in this respect, but had to concede that it was lots of manual work to keep the technical debt in check. In the end we agreed that C++ projects can evolve nicely, but it requires very good programmers and someone always pushing for refactoring. So yeah, a bit of technical debt here.
Let me play the devil’s advocate here, and say that NetworKit doesn’t have the technical debt in check.
It’s a bold statement and even I only believe it partially because I know only a few aspects of the NetworKit code.
I’m a bit strong in the following statements, but I want to remark that AI actually like NK and the people around it quite a bit =). I’m doing this because it can probably start an healthy discussion, and I hope you take it as such. =)
NetworKit doesn’t have the technical debt in check for multiple reasons:
1) The API organization in modules is a bit of a mess. E.g.: why are SSSP and related algorithms in the Graph module and not in the Distances module? Why is the Structure module under the NetworKit namespace and not under the Aux? What is the Global module? And so on.
2) The documentation for the C++ version is in an abysmal state =). Many classes and methods are not well documented (or missing completely), and often the documentation is insufficient or wrong. For example:
The documentation for NetworKit::APSP::getDistances() says that it "Returns a vector of weighted distances from the source node, i.e. the length of the shortest path from the source node to any other node.” . This description is actually not true, because the method returns a vector of vectors. It looks a lot like a cut and paste error. Or for NetworKit::APSP::getDistances(), the documentation says: "Returns all shortest paths from source to t and an empty set if source and t are not connected.”. Apart from the fact that the prototype uses “u” and “v” for the source and target (and not “source” and “t”), this description is also wrong, because the method returns an edgeweight, i.e., likely the shortest path distance from u to v.
Even for the “main” class NetworKit::Graph, the documentation is insufficient. For example for Graph::parallelSumForEdges(L handle), there is no documentation of whether ‘handle’ takes any parameters and what types of parameter. I do not know anything about the python API because I don’t use it.
3) Naming of classes and methods is a bit all over the places. In 4.2 the approximation algorithms in the distance group were renamed from ApproxNAME to NAMEApproximation, but this is not consistent with other modules, e.g., Centrality where we still have ApproxNAME (without going into the non-descriptive ApproxBetweenness and ApproxBetweenness2).
4) Documentation for the design and implementation is nowhere to be found (maybe is in a paper? I admit I haven’t looked a lot). But for example, I don’t actually know the internals of the Graph structure.
5) It’s my understanding that if one starts removing and adding nodes or edges to a graph then BadThingsHappens(TM) and the side effects are not well tested. I may be wrong.
6) Very limited support for dynamic graphs. Even some algorithms developed by the NK group are still not in NK (e.g., the Dynamic Betweenness centrality algorithm).
Good work to everyone to make NetworKit even better! I really appreciate your work.
More information about the NetworKit