[Networkit] Exporting subgraph to a file

Maximilian Vogel maximilian.vogel at student.kit.edu
Fri Nov 21 12:24:51 CET 2014


On 18.11.2014 11:56, Christian Staudt wrote:
> Am 18.11.2014 um 11:22 schrieb Michael Hamann <michael.hamann at kit.edu>:
>
>> actually we need the opposite, NetworKit::Subgraph currently preserves all
>> node ids and we need a way to assign new node ids (with that map from old to
>> new ids).
> Right, in an earlier version of the class the ids were not preserved. Anyway, I’m not happy with the class, we should come up with something better. The current way wastes a lot of memory for small subgraphs of large graphs.
The waste of memory becomes very obvious in the example given by user 
Isra (tx_edges_150k and the ipython notebook). The graph contains 33 
nodes and 31 edges, but the EdgeListReader initializes with highest node 
id it has found (in this case around 45M). The same applies for the 
Subgraph class (the same 45M upper node id bound vs. 9 existing nodes), 
as it copies the original graphs and removes all nodes and incident 
edges that are _not_ given in the set.
While this preserves the node ids, it wastes far too much memory. Since 
original node ids might be important for other applications, but in the 
current implementation, waste a lot of resources in networkit, I'm with 
Christian: We should come up with something better, but not only for the 
Subgraph class, but also for IO and graphs in general.
- Christian raised the idea, that the original node ids could be stored 
as node attributes. I guess this would work, but I don't know about the 
current state of node attributes in NetworKit. If the node ids are only 
important for IO and not for computation, the mapping can just be stored 
as a map or dictionary outside of the graph class. Then, only the IO 
routines would need some work to create a node mapping when reading a 
graph (which some classes do already) and the writer classes need a 
method where they take the node mapping as additional parameter.
- As far as I see it, NetworKit currently does not have a sparse graph 
data structure. Maybe this might be worth a shot? Also, a class to 
represent graph hierarchies or families also might come in handy as they 
could be implemented in such a way, that they take care of the node id 
mapping and with adapted IO routines, this would be very flexible. Some 
algorithms (e.g. PLM) work on graph hierarchies already and use similar 
structures, I think.

>> The feature that's actually missing is proper export support for graphs where
>> some node ids are missing as not all of our output formats support this and
>> there is still code in NetworKit that cannot cope with missing node ids. At
>> least we should be able to detect when a graph cannot be exported into a
>> certain graph format.
Warnings for the user if node ids won't be preserved or preventing the 
file from being written while showing an appropriate error message would 
be very nice for the user, that's true.

>> But I think at least some export formats should work already, for example I
>> cannot see any reason why GML export shouldn't work.
>>
>> @Israa: in which graph format do you want to export?
> As far as I know this problem exists only for the METIS format. I would assume that GML and GraphML work fine with deleted nodes. Can you confirm this, Israa?
I need to look into the METIS specifications again, but there might be a 
way to support this. I've also attached the GML and GraphML files of the 
subgraph in Isra's example. The node ids are preserved but the reader 
classes create a mapping when reading the file to create a small graph 
which only have as many nodes as necessary.


Best,
Max



-------------- next part --------------
A non-text attachment was scrubbed...
Name: community1.gml
Type: application/gml+xml
Size: 990 bytes
Desc: not available
URL: <https://lists.ira.uni-karlsruhe.de/mailman/private/networkit/attachments/20141121/875b717c/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: community1.graphml
Type: text/xml
Size: 1497 bytes
Desc: not available
URL: <https://lists.ira.uni-karlsruhe.de/mailman/private/networkit/attachments/20141121/875b717c/attachment.xml>
-------------- next part --------------
{
 "metadata": {
  "name": ""
 },
 "nbformat": 3,
 "nbformat_minor": 0,
 "worksheets": [
  {
   "cells": [
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "from networkit import *\n",
      "os.chdir(\"/home/israa/Desktop/community_detection\")\n",
      "%time G = readGraph(\"tx_edges_150k\", Format.EdgeListTabZero)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "CPU times: user 504 ms, sys: 453 ms, total: 957 ms\n",
        "Wall time: 957 ms\n"
       ]
      }
     ],
     "prompt_number": 2
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "for n in G.nodes():\n",
      "    if G.degree(n) < 1:\n",
      "        G.removeNode(n)\n",
      "\n",
      "n = G.numberOfNodes()\n",
      "m = G.numberOfEdges()\n",
      "print(n, m)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "33 31\n"
       ]
      }
     ],
     "prompt_number": 4
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "%time communitiesPLM = community.detectCommunities(G, community.PLM())"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "PLM(balanced,,parallel coarsening) detected communities in 2.6563539505004883 [s]\n",
        "solution properties:\n",
        "-------------------  --------\n",
        "# communities        9\n",
        "min community size   2\n",
        "max community size   9\n",
        "avg. community size  3.66667\n",
        "modularity           0.753382\n",
        "-------------------  --------"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "CPU times: user 12 s, sys: 1.13 s, total: 13.2 s\n",
        "Wall time: 3.41 s\n"
       ]
      }
     ],
     "prompt_number": 5
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "communitiesPLM.subsetSizeMap()[2]"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 7,
       "text": [
        "9"
       ]
      }
     ],
     "prompt_number": 7
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "from networkit.graph import Subgraph\n",
      "c2 = communitiesPLM.getMembers(2)\n",
      "sg = Subgraph()\n",
      "g2 = sg.fromNodes(G,c2)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 9
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "print(g2.numberOfNodes())\n",
      "print(g2.numberOfEdges())"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "9\n",
        "14\n"
       ]
      }
     ],
     "prompt_number": 10
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "communitiesPLM.getMembers(2)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 12,
       "text": [
        "{807209,\n",
        " 9080337,\n",
        " 9893919,\n",
        " 17599333,\n",
        " 23937268,\n",
        " 28210867,\n",
        " 29414451,\n",
        " 32177195,\n",
        " 38976516}"
       ]
      }
     ],
     "prompt_number": 12
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "graphio.writeGraph(g2,'/home/israa/Desktop/community2.graphml', fileformat=Format.GraphML)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    }
   ],
   "metadata": {}
  }
 ]
}
-------------- next part --------------
44046858	2143573
44046858	2960822
9251443	45126292
9251443	29601003
145596	15817375
15355976	15817375
145596	42941486
15355976	42941486
23235868	10998009
43698449	24443770
43698449	13716149
45358516	18951442
45358516	26465800
7411231	9241064
7411231	39096498
17599333	32177195
9893919	32177195
38976516	32177195
28210867	32177195
29414451	32177195
23937268	32177195
9080337	32177195
17599333	807209
9893919	807209
38976516	807209
28210867	807209
29414451	807209
23937268	807209
9080337	807209
33749249	43647808
33749249	4918066


More information about the NetworKit mailing list