[Networkit] Read simple edgelist

Jérôme Deschênes jeromedesch at gmail.com
Mon Sep 28 21:55:43 CEST 2015


Thank you very much, Max, for your kind and thorough answer.

Evidently, the non-continuous node ids were the problem.

I would like to ask one more (probably off-topic) question about your last
comment:

> Let me quote Christian Staudt, who recently clarified the current state of
> some of the centrality measures: "You are right to expect
> EigenvectorCentrality to run in parallel. [...] However, ApproxBetweenness2
> and ApproxCloseness are currently not parallelized - work in progress."


So is there a good SNA program out there that I should consider for a solid
implementation of parallelized centrality measures?



2015-09-28 15:27 GMT-04:00 Maximilian Vogel <maximilian.vogel at student.kit.edu>:

> Hi,
>
>
> On 28.09.2015 20:33, Jérôme Deschênes wrote:
>
> > I am trying to import an EdgeList (360,000 nodes and 9,000,000 edges).
> > This is not a huge network by any means, but it is a fairly large one.
> > [...]
> > I run NetworKit on a 64-bit Ubuntu guest within a 64-bit Windows 7 host
> > using VirtualBox. The computer has 32 GB of RAM, but the virtual machine
> > only has access to 24 GB (which I can increase if necessary).
>
> > I have been able to read this exact same file in NetworkX (as well as in
> > SNAP, Gephi and Cytoscape) on the same virtual machine. So, my assumption
> > is that I have enough RAM to store the whole file at once.
>
> Yes, it should definitely be possible for you to read that graph.
>
> > The file looks like this:
> >
> > 1234 55342
> > 1234 23232
> > 1234 33324
> > 2455 324525
> > 2455 242525
> > [...]
> > _NetworKit.pyx in _NetworKit.EdgeListReader.read (networkit/_NetworKit.cpp:14467)()
> >
> > MemoryError: std::bad_alloc
>
> This error basically says that not enough memory was available while
> reading the graph.
> Are the node ids in your file continuous (i.e., if you have n nodes, are
> the node ids 0, 1, ..., n-1)?
> From your example and the error, I assume that they aren't continuous;
> therefore, you can try:
> networkit.graphio.readGraph("myfile.txt", Format.EdgeList, separator=" ", continuous=False)
>
> Some background information: the EdgeListReader assumes that the node ids
> are continuous, which means that if there is a very large number in your
> file, the reader tries to initialize a graph with that many nodes. However,
> the EdgeListReader also supports arbitrary node ids (including strings) and
> then initializes the graph with the number of distinct node ids it
> encountered. This mode can be triggered by passing continuous=False.
> Currently, passing additional parameters only works with the generic
> Format.EdgeList specifier, and thus you need to specify the separator
> character as well (and firstNode if continuous=True).
>
>
> > During handling of the above exception, another exception occurred:
> >
> > OSError                                   Traceback (most recent call last)
> > <ipython-input-4-21eac3b4172f> in <module>()
> > ----> 1 G = nkit.graphio.readGraph("myfile.txt",Format.EdgeListSpaceZero)
> >
> > /usr/local/lib/python3.4/dist-packages/networkit/graphio.py in readGraph(path, fileformat, **kwargs)
> >     119                                 return G
> >     120                         except Exception as e:
> > --> 121                                 raise IOError("{0} is not a valid {1} file: {2}".format(path,fileformat,e))
> >     122         return None
> >     123
> >
> > OSError: /myfile.txt is not a valid Format.EdgeListSpaceZero file: std::bad_alloc
>
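> This is just a misleading error message: the file itself is not invalid;
> the underlying std::bad_alloc (running out of memory) is the real problem.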
>
> > I want to use NetworKit to speed up the computation of centrality
> > measures, as the above programs tend to use only one core for calculations
> > (even when asked not to). As I am using a workstation with 16 cores and 32
> > threads, they only use about 3% of the available computing power...
>
> Let me quote Christian Staudt, who recently clarified the current state of
> some of the centrality measures: "You are right to expect
> EigenvectorCentrality to run in parallel. [...] However, ApproxBetweenness2
> and ApproxCloseness are currently not parallelized - work in progress."
>
> Hope this helps,
> Max
>
> _______________________________________________
> NetworKit mailing list
> NetworKit at ira.uni-karlsruhe.de
> https://lists.ira.uni-karlsruhe.de/mailman/listinfo/networkit
>
>
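Max's point about continuous node ids can be illustrated with a small, stdlib-only Python sketch. It does not use NetworKit at all; the helper names (`read_edges`, `is_continuous`, `implied_node_count`, `remap_to_continuous`) are hypothetical. It checks whether the ids are continuous, computes how many nodes a reader that assumes continuous ids would have to allocate (assumed here to be the largest id minus firstNode, plus one), and remaps arbitrary ids to 0..n-1, roughly what Max describes continuous=False as doing. The first-appearance numbering is this sketch's own choice, not necessarily the order NetworKit's EdgeListReader uses.

```python
import io
from typing import Dict, List, Optional, TextIO, Tuple

Edge = Tuple[str, str]


def read_edges(stream: TextIO, separator: Optional[str] = None) -> List[Edge]:
    """Parse 'u<sep>v' lines into (u, v) string pairs.

    separator=None splits on any run of whitespace (str.split semantics).
    Lines with fewer than two fields (e.g. blank lines) are skipped.
    """
    edges: List[Edge] = []
    for line in stream:
        fields = line.split(separator)
        if len(fields) < 2:
            continue
        edges.append((fields[0].strip(), fields[1].strip()))
    return edges


def is_continuous(edges: List[Edge], first_node: int = 0) -> bool:
    """True if the integer ids are exactly first_node, ..., first_node + n - 1."""
    ids = {int(node) for edge in edges for node in edge}
    return ids == set(range(first_node, first_node + len(ids)))


def implied_node_count(edges: List[Edge], first_node: int = 0) -> int:
    """Nodes a reader assuming continuous integer ids would have to allocate
    (hypothetical model: largest id - first_node + 1)."""
    largest = max(max(int(u), int(v)) for u, v in edges)
    return largest - first_node + 1


def remap_to_continuous(
    edges: List[Edge],
) -> Tuple[Dict[str, int], List[Tuple[int, int]]]:
    """Give every distinct id a dense index 0..n-1 in order of first appearance."""
    index: Dict[str, int] = {}
    remapped: List[Tuple[int, int]] = []
    for u, v in edges:
        for node in (u, v):
            if node not in index:
                index[node] = len(index)  # next unused dense id
        remapped.append((index[u], index[v]))
    return index, remapped


if __name__ == "__main__":
    # The five sample lines quoted in the thread.
    sample = io.StringIO(
        "1234 55342\n1234 23232\n1234 33324\n2455 324525\n2455 242525\n"
    )
    edges = read_edges(sample, separator=" ")
    index, remapped = remap_to_continuous(edges)
    print(is_continuous(edges))       # False
    print(implied_node_count(edges))  # 324526
    print(len(index))                 # 7
    print(remapped)                   # [(0, 1), (0, 2), (0, 3), (4, 5), (4, 6)]
```

On these five sample lines there are only 7 distinct nodes, yet a reader that treats the ids as continuous would have to allocate 324,526 of them. The gap grows with the largest id in the file, which is consistent with Max's explanation of the std::bad_alloc.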

