[Networkit] How to read in a large graph (and output a sparse matrix)
Raphael C
drraph at gmail.com
Tue Aug 2 17:26:27 CEST 2016
I still haven't managed to solve my problem.
As I mentioned below:
G= networkit.graphio.readGraph("edges.txt",
networkit.Format.EdgeList, separator=" ", continuous=False)
uses too much RAM on my 8GB machine. To get round this I wrote code
to translate the labels of the nodes into consecutive integers
starting at 1. Now
G= networkit.graphio.readGraph("edges-contig.txt",
networkit.Format.EdgeListSpaceOne,
separator=" ", continuous=True)
works fine. However, if I try to write out the adjacency matrix with
networkit.graphio.writeMat(G,"test.mat")
it hugely increases the RAM usage and then runs out of memory. I tested
it with a smaller graph and it seems that writeMat uses more than 3
times the RAM of the graph itself.
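For reference, the relabelling step mentioned above (translating node labels into consecutive integers starting at 1) could look roughly like the sketch below. It is illustrative only and not the script that was actually used; the function name relabel_edges and its first_id parameter are made up for this example, and it assumes one whitespace-separated edge per line.

```python
def relabel_edges(lines, first_id=1):
    """Yield (u, v) pairs with node labels replaced by consecutive integers.

    Labels are numbered in order of first appearance, starting at first_id.
    relabel_edges is a hypothetical helper, not part of networkit.
    """
    mapping = {}  # original label (string) -> new integer id
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        ids = []
        for label in parts[:2]:
            if label not in mapping:
                mapping[label] = first_id + len(mapping)
            ids.append(mapping[label])
        yield ids[0], ids[1]


if __name__ == "__main__":
    import sys

    # Read an edge list on stdin, write the relabelled edge list to stdout.
    for u, v in relabel_edges(sys.stdin):
        print(u, v)
```

Note that the dictionary holds one entry per distinct label, so for tens of millions of vertices this step itself needs a fair amount of memory in pure Python.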
Is there anything else I can try? I simply want to read in a graph and
output it in any sparse adjacency matrix format that scipy can read.
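One possible workaround, sketched here under assumptions rather than tested on the full data set, is to bypass the in-memory graph and stream the already relabelled edge list straight into a Matrix Market coordinate file, a plain-text sparse format that scipy.io.mmread can load. The function name edge_list_to_matrix_market is made up for this example; it assumes node ids are integers starting at 1 (as in edges-contig.txt) and treats each line as a single (row, column) entry.

```python
def edge_list_to_matrix_market(src_path, dst_path):
    """Convert a 1-based edge list file to a Matrix Market pattern matrix.

    Uses two passes over the input so that only a few counters are kept in
    memory. Hypothetical helper, not part of networkit or scipy.
    """
    # Pass 1: count the edges and find the largest node id, because the
    # Matrix Market size line has to come before the entries.
    num_edges = 0
    max_id = 0
    with open(src_path) as src:
        for line in src:
            parts = line.split()
            if len(parts) < 2:
                continue  # skip blank or malformed lines
            u, v = int(parts[0]), int(parts[1])
            if u < 1 or v < 1:
                raise ValueError("node ids must start at 1")
            max_id = max(max_id, u, v)
            num_edges += 1

    # Pass 2: write the header, the size line, and one entry per edge.
    with open(src_path) as src, open(dst_path, "w") as dst:
        dst.write("%%MatrixMarket matrix coordinate pattern general\n")
        dst.write(f"{max_id} {max_id} {num_edges}\n")
        for line in src:
            parts = line.split()
            if len(parts) < 2:
                continue
            dst.write(f"{int(parts[0])} {int(parts[1])}\n")
    return max_id, num_edges
```

The result could then be loaded with scipy.io.mmread("edges.mtx").tocsr(). Each edge is written once as (u, v), so for an undirected graph the loaded matrix would still need its transpose added to become symmetric.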
Raphael
On 1 August 2016 at 20:08, Raphael C <drraph at gmail.com> wrote:
> Thank you for this.
>
> I have 8GB of RAM and I have a simple edge list text file of size
> 1.2GB. It has 62500000 edges and about half that many vertices. Each
> line looks like
>
> 002512524 000991414
>
> That is, each line is two 9-digit numbers representing an edge.
>
> In principle this graph should fit more than comfortably in 8GB of RAM.
>
> I would like to read in the graph and output a sparse adjacency
> matrix. I am failing on all counts. I am now trying
>
> G= networkit.graphio.readGraph("edges-tenmill.txt",
> networkit.Format.EdgeList, separator=" ", continuous=False)
>
> but it uses all the RAM and then crashes.
>
> To understand the RAM usage I tried the same thing with only 20
> million edges and 10 million vertices.
>
> /usr/bin/time -v python3 ./test.py
>
> gives
>
> Maximum resident set size (kbytes): 2098684
>
> The following code makes a fake data set that can be used to reproduce
> the problem.
>
> import random
>
> # Number of edges (m) and vertices (n)
> m = 20000000
> # m = 62500000
> n = m // 2
>
> # Print one random edge per line as two zero-padded 9-digit node ids
> for i in range(m):
>     fromnode = str(random.randint(0, n - 1)).zfill(9)
>     tonode = str(random.randint(0, n - 1)).zfill(9)
>     print(fromnode, tonode)
>
> It seems that behind the scenes in the C code of networkit something
> is taking up a lot of RAM.
>
> Raphael
>
> On 1 August 2016 at 09:24, Maximilian Vogel
> <maximilian.vogel at student.kit.edu> wrote:
>> Hi Raphael,
>>
>> you imported networkit with "import networkit", I suppose. If that's the
>> case, you need to prefix the format with networkit as well:
>> "networkit.Format.EdgeListSpaceOne".
>>
>> And while we are at it, some more hints: If the predefined edge list formats
>> do not fit your needs, you can adjust them easily, for which you'll need
>> "Format.EdgeList" and the following parameters:
>>
>> - commentPrefix='#': In case there are comments in your file, use this
>>   parameter to specify the first character of comment lines.
>> - firstNode=0: Specify the first node id (assuming they are continuous).
>> - continuous=False/True: If set to True, the reader expects numbers as node
>>   ids and the graph object will have n = maxNodeId - firstNode nodes. If set
>>   to False, node ids can be anything and the graph object will have as many
>>   nodes as there are unique node ids.
>> - directed=False/True: Specify if the edges should be interpreted as directed
>>   or not.
>>
>> As for writing the graph, I'm not too familiar with the Matrix formats.
>> Maybe networkit.graphio.writeMat is what you're looking for.
>>
>> Hope this helps,
>> Max
>>
>>
>> On 01.08.2016 10:05, Raphael C wrote:
>>
>> This is my first attempt to use networkit.
>>
>> I have a simple edge list text file of size
>> 1.2GB. It has 62500000 edges and about half that many vertices. Each
>> line looks like
>>
>> 287111206 357850135
>>
>> I would like to read in the graph and output a sparse adjacency
>> matrix.
>>
>> I am using python and have networkit 4.1.1 which seems to be the
>> latest version available through pip.
>>
>> I tried to find the right function in the docs. I have attempted:
>>
>> G = networkit.readGraph("dgraph.edgelist", fileformat = "EdgeListSpaceOne")
>> G = networkit.readGraph("dgraph.edgelist", Format.EdgeList)
>>
>> But both return error messages. In fact the latter returns
>>
>> NameError: name 'Format' is not defined
>>
>> How do you read in a graph?
>>
>> Raphael
>>
>> _______________________________________________
>> NetworKit mailing list
>> NetworKit at ira.uni-karlsruhe.de
>> https://lists.ira.uni-karlsruhe.de/mailman/listinfo/networkit