[Networkit] Read simple edgelist

Maximilian Vogel maximilian.vogel at student.kit.edu
Mon Sep 28 21:27:41 CEST 2015


Hi,


On 28.09.2015 20:33, Jérôme Deschênes wrote:
> I am trying to import an EdgeList (360,000 nodes and 9,000,000 edges). 
> This is not a huge network by any mean but somehow a large one.
> [...]
> I run Networkit on an Ubuntu (64 bits) guest within a Windows  7 (64 
> bits) using Virtualbox. The computer has 32GB of Ram but the virtual 
> machine only has access to 24GB (which I can increase if necessary) .
>
> I have been able to read this exact same file in NetworkX  (as well as 
> in SNAP, Gephi and Cytoscape) on the same virtual machine. So, my 
> assumption is that I have enough RAM to store the whole file at once.
Yes, it should definitely be possible for you to read that graph.

> The form of the file is this one :
>
> 1234 55342
> 1234 23232
> 1234 33324
> 2455 324525
> 2455 242525
> [...]
> _NetworKit.pyx in _NetworKit.EdgeListReader.read 
> (networkit/_NetworKit.cpp:14467)()
>
> MemoryError: std::bad_alloc
This error basically says that not enough memory was available while 
reading the graph.
Are the node ids in the file continuous (i.e. if you have n nodes, the 
node ids are 0, 1, ..., n-1)?
From your example and the error I assume that they aren't continuous, 
so you can try:
networkit.graphio.readGraph("myfile.txt", Format.EdgeList, separator=" ", continuous=False)

Some background information: By default, the EdgeListReader assumes that 
the node ids are continuous, which means that if there is a very large 
number in your file, the reader tries to initialize a graph with that 
many nodes. However, the EdgeListReader also supports arbitrary node ids 
(including strings); in that mode it initializes the graph with the 
number of distinct node ids it encountered. This mode can be triggered 
by passing continuous=False. Currently, passing additional parameters 
only works with the generic Format.EdgeList specifier, so you need to 
specify the separator character as well (and firstNode if 
continuous=True).
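
To illustrate the difference, here is a small plain-Python sketch. It is 
not NetworKit's actual implementation, and summarize_node_ids is a 
made-up helper name used only for this example. It assumes integer node 
ids and compares the node count implied by the largest id with the 
number of distinct ids, then builds a compact 0..n-1 mapping similar in 
spirit to what continuous=False does:

```python
def summarize_node_ids(lines, separator=" "):
    """Return (max_id, distinct_count, mapping) for an integer edge list.

    Hypothetical helper for illustration only; not part of NetworKit.
    """
    mapping = {}  # original id -> compact id, in order of first appearance
    max_id = -1
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        u, v = line.split(separator)[:2]
        for node in (int(u), int(v)):
            max_id = max(max_id, node)
            if node not in mapping:
                mapping[node] = len(mapping)
    return max_id, len(mapping), mapping


# The five edges quoted from the file above
sample = ["1234 55342", "1234 23232", "1234 33324",
          "2455 324525", "2455 242525"]
max_id, distinct, mapping = summarize_node_ids(sample)
print("nodes if the ids are taken literally:", max_id + 1)
print("nodes after remapping to 0..n-1:", distinct)
```

If the first number is much larger than the second for your file, the 
ids are not continuous and the default mode allocates far more nodes 
than the graph actually has.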


> During handling of the above exception, another exception occurred:
>
> OSError                                   Traceback (most recent call 
> last)
> <ipython-input-4-21eac3b4172f> in <module>()
> ----> 1 G = nkit.graphio.readGraph("myfile.txt",Format.EdgeListSpaceZero)
>
> /usr/local/lib/python3.4/dist-packages/networkit/graphio.py in 
> readGraph(path, fileformat, **kwargs)
>   119                                 return G
>   120                         except Exception as e:
> --> 121                                 raise IOError("{0} is not a 
> valid {1} file: {2}".format(path,fileformat,e))
>   122         return None
>   123
>
> OSError: /myfile.txt is not a valid Format.EdgeListSpaceZero file: 
> std::bad_alloc
This is just a not very helpful error message: the OSError only wraps 
the std::bad_alloc from above.

> I want to use Networkit to speed up the process of computing 
> centrality measures as the above programs tend to use only one core 
> for calculations (even when asked not to). As I am using a workstation 
> with 16 cores and 32 threads, they only use about 3% of computing 
> power...
Let me quote Christian Staudt who recently clarified the current state 
of some of the centrality measures: "You are right to expect 
EigenvectorCentrality to run in parallel. [...] 
However, ApproxBetweenness2 and ApproxCloseness are currently not 
parallelized - work in progress."

Hope this helps,
Max