[Networkit] Request for comments: How to read attributes in graph files?

Maximilian Vogel maximilian.vogel at student.kit.edu
Wed Sep 2 12:12:26 CEST 2015


Hi,


On 01.09.2015 15:28, Christian Staudt wrote:
> As far as I see, parsing attributes has been neglected in our GraphReader classes, but if you think about it, why bother with all these network analysis methods if you cannot easily connect the results to attributes? So we should definitely improve this.
Parsing attributes has been neglected partly because there was hardly 
any interest and partly because there is no "infrastructure" or 
convenient way yet to connect results to attributes - a bit of a 
chicken-and-egg problem.
I agree that we should improve this, and some work has already been done 
in that regard:
- The EdgeListReader supports reading of arbitrary node ids. A map 
["file node id" -> "graph internal node id"] can be retrieved via 
getNodeMap().
- GEXFIO has such a function as well.
- The attached notebook implements a few functions and shows how one 
could work with them.


> One basic idea would be to add a method like this to the readers that support attributes:
> 	GraphReader.getAttribute(key) : list[string]
> Any other ideas?
What would your getAttribute(key)-function actually do? Return a 
container of attributes created during the read(path)-function, or read 
the file again for the attributes of the queried key?
A convenience function that lists the available keys might be useful, as 
one usually doesn't know the available attributes (unless one has looked 
into the file).
A distinction between node and edge attributes might be necessary. Also, 
the two kinds of mappings call for different data structures:
- map/dictionary: arbitrary node id -> graph internal node id
- list/vector: graph internal node id -> attribute value/arbitrary node id
If a graph has a lot of deleted nodes, it's probably more efficient to 
use a map/dictionary for the second type as well.
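
To make the two kinds of containers concrete, here is a minimal sketch in 
plain Python. The variable names and the "age" values are made up for 
illustration (the node names are borrowed from the attached edge list); 
nothing in it is NetworKit API.

```python
# Hypothetical data, for illustration only; not part of NetworKit.

# Type 1: map/dictionary, arbitrary file node id -> graph internal node id
node_map = {"Alice": 0, "Bob": 1, "arbitrary": 2}

# Type 2 as a list/vector: graph internal node id -> attribute value.
# Works well when the internal ids are dense (0, 1, 2, ...).
age_by_id = [31, 27, None]  # None marks "no value in the file"

# Type 2 as a dictionary: only nodes that have a value take up space,
# which may pay off when many node ids have been deleted.
sparse_age_by_id = {0: 31, 1: 27}


def lookup(file_id, attribute):
    """Return the attribute value of a node, given its id from the file."""
    internal_id = node_map[file_id]
    if isinstance(attribute, dict):
        return attribute.get(internal_id)  # None if the node has no value
    return attribute[internal_id]


print(lookup("Bob", age_by_id))               # 27
print(lookup("arbitrary", sparse_age_by_id))  # None
```

Either container answers the same query; the difference lies only in how 
much space is spent on ids that no longer exist.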

Regarding the GraphReader class, various approaches should be considered:
- One could argue that you always want to have a graph, but not 
necessarily its attributes, so the read()-function reads the file and 
returns the graph object. For attributes, the getAttribute()-function 
can be used. [corresponds to your suggestion and also the current state]
- The GraphReader class gets a method process() which reads the file and 
creates all the objects like the Graph, attributes and mappings. Then, 
various getter-functions can be used to retrieve the desired objects.
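
A rough sketch of the second approach could look as follows. Everything 
in it is an assumption made for illustration: the class ToyAttributeReader 
is not NetworKit's GraphReader, the input format ("<u> <v> [key=value ...]" 
per line) is invented, and the "graph" is just a list of edges.

```python
# Hypothetical reader, for illustration only; not NetworKit's GraphReader.
# Invented toy format: each line is "<u> <v> [key=value ...]".

class ToyAttributeReader:
    def __init__(self):
        self._nodeMap = {}         # file node id -> internal node id
        self._edges = []           # (u, v) pairs of internal node ids
        self._edgeAttributes = {}  # key -> {edge index: value}

    def _internalId(self, fileId):
        # hand out consecutive internal ids in order of first appearance
        return self._nodeMap.setdefault(fileId, len(self._nodeMap))

    def process(self, lines):
        """Parse everything in one pass; the getters only return results."""
        for line in lines:
            fields = line.split()
            if len(fields) < 2:
                continue  # skip blank or malformed lines
            u = self._internalId(fields[0])
            v = self._internalId(fields[1])
            edgeIndex = len(self._edges)
            self._edges.append((u, v))
            for field in fields[2:]:
                key, _, value = field.partition("=")
                self._edgeAttributes.setdefault(key, {})[edgeIndex] = value
        return self

    def getEdges(self):
        return self._edges

    def getNodeMap(self):
        return self._nodeMap

    def getAttributeKeys(self):
        return sorted(self._edgeAttributes)

    def getAttribute(self, key):
        return self._edgeAttributes[key]


reader = ToyAttributeReader().process([
    "Alice Bob weight=2",
    "Bob KIT",
    "KIT Alice weight=5 kind=work",
])
print(reader.getAttributeKeys())      # ['kind', 'weight']
print(reader.getAttribute("weight"))  # {0: '2', 2: '5'}
```

The file is touched exactly once in process(); the price is that all 
attributes are kept in memory even if nobody asks for them.
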
Also, if you just return a list or dictionary containing your 
attributes, you have no meta information about which attribute it is. 
You know it because you stated an explicit key in the 
getAttribute(key)-function, but you can't "ask" the attribute object 
which key it has. Maybe a custom attribute class that wraps the 
underlying container and provides convenience functions like getKey(), 
setKey(), isNodeAttribute(), isEdgeAttribute() or similar would be 
helpful.
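
Such a wrapper might be sketched like this. The class does not exist in 
NetworKit; the method names are the ones suggested above, while the 
constructor signature and the sample values are assumptions.

```python
# Hypothetical wrapper, for illustration only; NetworKit does not
# provide this class.

class Attribute:
    def __init__(self, key, values, isNodeAttribute=True):
        self._key = key
        self._values = values  # list or dict indexed by internal id
        self._isNodeAttribute = isNodeAttribute

    def getKey(self):
        return self._key

    def setKey(self, key):
        self._key = key

    def isNodeAttribute(self):
        return self._isNodeAttribute

    def isEdgeAttribute(self):
        return not self._isNodeAttribute

    def __getitem__(self, internalId):
        # delegate lookups to the wrapped container
        return self._values[internalId]

    def __len__(self):
        return len(self._values)


city = Attribute("city", ["Karlsruhe", "Brooklyn"])
print(city.getKey(), city.isNodeAttribute(), city[1])  # city True Brooklyn
```

Because __getitem__ and __len__ are forwarded, the object can still be 
indexed like the plain list or dictionary it wraps.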


Best regards,
Max
-------------- next part --------------
{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "def invertNodeMapping(nodeMap):\n",
    "    \"\"\"\n",
    "        Takes a dictionary or a list of tuples as input and \n",
    "        returns the inverted mapping assuming the values are in the range [0,len(nodeMap))\n",
    "    \"\"\"\n",
    "    invertedMapping = [None] * len(nodeMap)\n",
    "    if isinstance(nodeMap,dict):\n",
    "        iterObj = nodeMap.items()\n",
    "    elif isinstance(nodeMap,list) and all(isinstance(pair,tuple) for pair in nodeMap):\n",
    "        iterObj = nodeMap\n",
    "    else:\n",
    "        raise TypeError(\"expected dictionary or list of tuples\")\n",
    "    for key,value in iterObj:\n",
    "        invertedMapping[value] = key\n",
    "    return invertedMapping"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "def mapNodeValues(values, nodeMap):\n",
    "    \"\"\"\n",
    "        Takes a list of values, a list of tuples or a dict together with a corresponding node map as input and \n",
    "        returns a dict with the mapped node ids and results.\n",
    "    \"\"\"\n",
    "    mapped = dict()\n",
    "    if isinstance(values,dict):\n",
    "        iterObj = values.items()\n",
    "    elif values and isinstance(values[0],tuple):\n",
    "        iterObj = values\n",
    "    else:\n",
    "        iterObj = enumerate(values)\n",
    "    for key,val in iterObj:\n",
    "        mapped[nodeMap[key]] = val    \n",
    "    return mapped"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "def mapRanking(ranking, nodeMap):\n",
    "    \"\"\"\n",
    "        Takes a ranking of nodes (i.e. a list of tuples) as input and\n",
    "        returns a ranking with substituted node ids.\n",
    "        Raises a TypeError on wrong input.\n",
    "    \"\"\"\n",
    "    if isinstance(ranking,list) and all(isinstance(pair,tuple) for pair in ranking):\n",
    "        mappedRanking = []\n",
    "        for idx,value in ranking:\n",
    "            mappedRanking.append((nodeMap[idx],value))\n",
    "        return mappedRanking\n",
    "    else:\n",
    "        raise TypeError(\"expected list of tuples\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'76131': 4,\n",
       " 'Alice': 0,\n",
       " 'Bob': 1,\n",
       " 'Brooklyn': 6,\n",
       " 'KIT': 5,\n",
       " 'arbitrary': 2,\n",
       " 'facebook': 3}"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from networkit import *\n",
    "# get a specific reader object for the 'getNodeMap()'-function to be available\n",
    "# as far as I remember, only the EdgeListReader and GEXFIO currently provide such a function\n",
    "reader = graphio.getReader(Format.EdgeList,separator=' ',continuous=False)\n",
    "# don't forget to adjust your file path\n",
    "g = reader.read(\"../various_graphs/arbitrary.edgelist\")\n",
    "reader.getNodeMap()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "nodeMap = _"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['Alice', 'Bob', 'arbitrary', 'facebook', '76131', 'KIT', 'Brooklyn']"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "invertNodeMapping(nodeMap)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[0.0, 1.0, 2.0, 3.0, 3.0, 4.0, 5.0]"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "invertedNodeMap = _\n",
    "bfs = graph.BFS(g,0).run()\n",
    "bfs.getDistances()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'76131': 3.0,\n",
       " 'Alice': 0.0,\n",
       " 'Bob': 1.0,\n",
       " 'Brooklyn': 5.0,\n",
       " 'KIT': 4.0,\n",
       " 'arbitrary': 2.0,\n",
       " 'facebook': 3.0}"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# as we will see in the following two examples, mapNodeValues can be used \n",
    "# to replace the internal node ids with the original ones:\n",
    "distances = _\n",
    "mapNodeValues(distances,invertedNodeMap)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[(2, 22.0), (4, 16.0), (1, 10.0), (5, 10.0), (0, 0.0), (3, 0.0), (6, 0.0)]"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "betweenness = centrality.Betweenness(g).run()\n",
    "betweenness.ranking()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'76131': 16.0,\n",
       " 'Alice': 0.0,\n",
       " 'Bob': 10.0,\n",
       " 'Brooklyn': 0.0,\n",
       " 'KIT': 10.0,\n",
       " 'arbitrary': 22.0,\n",
       " 'facebook': 0.0}"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "ranking = _\n",
    "mapNodeValues(ranking,invertedNodeMap)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[('arbitrary', 22.0),\n",
       " ('76131', 16.0),\n",
       " ('Bob', 10.0),\n",
       " ('KIT', 10.0),\n",
       " ('Alice', 0.0),\n",
       " ('facebook', 0.0),\n",
       " ('Brooklyn', 0.0)]"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "mapRanking(ranking, invertedNodeMap)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "10.0"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# however, if you want the result by querying with the original node id,\n",
    "# you can use the nodeMap obtained by the reader class as follows.\n",
    "betweenness.score(nodeMap[\"Bob\"])"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.4.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}
-------------- next part --------------
Alice Bob
arbitrary facebook
76131 KIT
arbitrary 76131
Bob arbitrary
KIT Brooklyn

