[Networkit] How to read in a large graph (and output a sparse matrix)

Maximilian Vogel maximilian.vogel at student.kit.edu
Thu Aug 11 11:22:24 CEST 2016


Hi Raphael,

Sorry for the late response.


On 02.08.2016 17:26, Raphael C wrote:
> G= networkit.graphio.readGraph("edges.txt",
> networkit.Format.EdgeList, separator=" ", continuous=False)
>
> uses too much RAM on my 8GB machine.  To get round this I wrote code
> to translate the labels of the nodes into consecutive integers
> starting at 1.
Well, that's actually what the reader should do when you pass 
continuous=False. I'm quite surprised that 8GB is not sufficient.
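For reference, a minimal sketch of such a relabeling pass (the filenames are just examples, and the 1-based ids match what EdgeListSpaceOne expects):

```python
# Sketch of the manual relabeling described above: map arbitrary node
# labels to consecutive integers starting at 1, so the result can be
# read with continuous=True. Filenames are placeholders.

def relabel_edge_list(src, dst):
    mapping = {}  # original label -> consecutive 1-based id
    with open(src) as fin, open(dst, "w") as fout:
        for line in fin:
            u, v = line.split()
            for label in (u, v):
                if label not in mapping:
                    mapping[label] = len(mapping) + 1
            fout.write(f"{mapping[u]} {mapping[v]}\n")
    return mapping

# Tiny stand-in input: two edges over three zero-padded labels.
with open("edges.txt", "w") as f:
    f.write("002512524 000991414\n000991414 000000007\n")
relabel_edge_list("edges.txt", "edges-contig.txt")
```

This streams the file line by line, so the only thing held in memory is the label-to-id dictionary.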

>   Now
>
> G= networkit.graphio.readGraph("edges-contig.txt",
>                                 networkit.Format.EdgeListSpaceOne,
> separator=" ", continuous=True)
>
> works fine.  However if I try to write out the adjacency matrix with
>
> networkit.graphio.writeMat(G,"test.mat")
>
> it hugely increases the RAM usage and then runs out of memory. I tested
> it with a smaller graph and it seems that writeMat uses more than 3
> times the RAM of the graph itself.
>
> Is there anything else I can try? I simply want to read in a graph and
> output it in any sparse adjacency matrix which scipy can read.
graphio.writeMat relies on the algebraic module. I've checked the source 
and it appears to use scipy for the adjacency matrix representation[1] 
as well as the file output[2]. The conversion definitely requires more 
RAM, as you end up with both the NetworKit graph object and a 
scipy.sparse.lil_matrix object in memory at the same time. As I'm not 
too familiar with scipy, I can't estimate how much RAM that object and 
the output routine require.

[1]: 
https://algohub.iti.kit.edu/parco/NetworKit/NetworKit/files/f2b49f8e57dac7c4a963953b0d5620d299ad4d1e/networkit/algebraic.py#L25
[2]: 
https://algohub.iti.kit.edu/parco/NetworKit/NetworKit/files/f2b49f8e57dac7c4a963953b0d5620d299ad4d1e/networkit/graphio.py#L204
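If the goal is just a sparse matrix that scipy can read back, one possible workaround is to skip NetworKit and the lil_matrix conversion entirely and build a scipy.sparse.coo_matrix straight from the edge file, then write it in Matrix Market format. This is a sketch, not a NetworKit routine; it assumes the file holds 1-based consecutive ids (as after relabeling) and uses a tiny stand-in file here:

```python
# Build a sparse adjacency matrix directly from an edge list and save it
# in Matrix Market format (readable via scipy.io.mmread). This avoids
# holding a NetworKit graph and a lil_matrix in memory at once.
import numpy as np
import scipy.io
import scipy.sparse

# Tiny stand-in for "edges-contig.txt" (1-based consecutive ids).
with open("edges-contig.txt", "w") as f:
    f.write("1 2\n2 3\n")

edges = np.loadtxt("edges-contig.txt", dtype=np.int64)
rows = edges[:, 0] - 1  # shift 1-based ids to 0-based indices
cols = edges[:, 1] - 1
n = int(max(rows.max(), cols.max())) + 1
data = np.ones(len(edges), dtype=np.int8)  # unweighted adjacency
A = scipy.sparse.coo_matrix((data, (rows, cols)), shape=(n, n))
scipy.io.mmwrite("adjacency.mtx", A)
```

For the full 62500000-edge input, np.loadtxt is slow; np.fromfile or a chunked reader would be faster, but the coo_matrix construction is the same.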

> On 1 August 2016 at 20:08, Raphael C<drraph at gmail.com>  wrote:
>> Thank you for this.
>>
>> I have 8GB of RAM and I have a simple edge list text file of size
>> 1.2GB. It has 62500000 edges and about half that many vertices. Each
>> line looks like
>>
>>       002512524 000991414
>>
>> That is, each line is two 9-digit numbers representing an edge.
>>
>> In principle this graph should fit more than comfortably in 8GB of RAM.
A node id is represented by a 64-bit integer in NetworKit. If I'm not 
mistaken, your input with 62500000 edges would require at least 1GB of 
RAM when stored as an edge list. The adjacency array representation 
(for undirected graphs) should require about the same amount plus some 
overhead. So yeah, 8GB of RAM should be more than enough.
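The arithmetic behind that estimate, spelled out:

```python
# Back-of-the-envelope check: 62.5 million edges, each endpoint an
# 8-byte (64-bit) node id.
m = 62_500_000
edge_list_bytes = m * 2 * 8  # two node ids per edge
adjacency_bytes = 2 * m * 8  # undirected: each edge appears in two
                             # adjacency arrays
print(edge_list_bytes / 1e9)  # 1.0 GB
print(adjacency_bytes / 1e9)  # 1.0 GB
```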

>> I would like to read in the graph and output a sparse adjacency
>> matrix. I am failing on all counts.  I am now trying
>>
>> G= networkit.graphio.readGraph("edges-tenmill.txt",
>> networkit.Format.EdgeList, separator=" ", continuous=False)
>>
>> but it uses all the RAM and then crashes.
>>
>> To understand the RAM usage I tried the same thing with only 20
>> million edges and 10 million vertices.
>>
>> /usr/bin/time -v python3 ./test.py
>>
>> gives
>>
>> Maximum resident set size (kbytes): 2098684
>>
>> The following code makes a fake data set that can be used to reproduce
>> the problem.
>>
>> import random
>>
>> # Number of edges and vertices
>> m = 20000000
>> #m = 62500000
>> n = m // 2  # integer division so randint receives an int
>>
>> for i in range(m):
>>      fromnode = str(random.randint(0, n-1)).zfill(9)
>>      tonode = str(random.randint(0, n-1)).zfill(9)
>>      print(fromnode, tonode)
>>
>> It seems that behind the scenes in the C code of networkit something
>> is taking up a lot of RAM.
Thanks for the script. I'll do some tests myself.
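As a side note, peak memory can also be checked from inside Python via the standard resource module, instead of wrapping the run in /usr/bin/time -v (a sketch; on Linux ru_maxrss is reported in kilobytes, on macOS in bytes):

```python
# Report this process's peak resident set size so far.
import resource

usage = resource.getrusage(resource.RUSAGE_SELF)
print("Maximum resident set size (kbytes):", usage.ru_maxrss)
```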

Best,
Max
