[Networkit] How to read in a large graph (and output a sparse matrix)

Raphael C drraph at gmail.com
Thu Aug 11 12:05:38 CEST 2016


Thanks very much. I am away from the relevant computer for a few weeks but
I look forward to hearing anything you find from your explorations.

It seems that this simple task breaks all the graph libraries I have found.

Raphael

On 11 Aug 2016 10:22, "Maximilian Vogel" <maximilian.vogel at student.kit.edu>
wrote:

> Hi Raphael,
>
> Sorry, for the late response.
>
> On 02.08.2016 17:26, Raphael C wrote:
>
> G= networkit.graphio.readGraph("edges.txt",
> networkit.Format.EdgeList, separator=" ", continuous=False)
>
> uses too much RAM on my 8GB machine.  To get round this I wrote code
> to translate the labels of the nodes into consecutive integers
> starting at 1.
>
> Well, that's actually what the reader should do when passing
> continuous=False. I'm quite surprised that 8GB are not sufficient.
>
>  Now
>
> G= networkit.graphio.readGraph("edges-contig.txt",
>                                networkit.Format.EdgeListSpaceOne,
> separator=" ", continuous=True)
>
> works fine.  However if I try to write out the adjcacency matrix with
>
> networkit.graphio.writeMat(G,"test.mat")
>
> it hugely increases the RAM usage and then runs out memory. I tested
> it with a smaller graph and it seems that writeMat uses more than 3
> times the RAM of the graph itself.
>
> Is there anything else I can try? I simply want to read in a graph and
> output it in any sparse adjacency matrix which scipy can read.
>
> graphio.writeMat relies on the algebraic module. I've checked the source
> and it appears to use scipy for the adjacency matrix representation[1] as
> well as the file output[2]. The conversion process definitely requires more
> RAM as you'll end up with the NetworKit graph object as well as a
> scipy.sparse.lil_matrix object. As I'm not too familiar with scipy, I can't
> estimate how much RAM this object as well as the output routine requires.
>
> [1]: https://algohub.iti.kit.edu/parco/NetworKit/NetworKit/files/
> f2b49f8e57dac7c4a963953b0d5620d299ad4d1e/networkit/algebraic.py#L25
> [2]: https://algohub.iti.kit.edu/parco/NetworKit/NetworKit/files/
> f2b49f8e57dac7c4a963953b0d5620d299ad4d1e/networkit/graphio.py#L204
>
> On 1 August 2016 at 20:08, Raphael C <drraph at gmail.com> <drraph at gmail.com> wrote:
>
> Thank you for this.
>
> I have 8GB of RAM and I have a simple edge list text file of size
> 1.2GB. It was 62500000 edges and about half that many vertices. Each
> line looks like
>
>      002512524 000991414
>
> That is it is two 9 digit numbers representing an edge.
>
> In principle this graph should fit more than comfortably in 8GB of RAM.
>
> A node id is represented by a 64bit integer in NetworKit. Your input with
> 62500000 edges would at least require 1GB of RAM if stored as an edge list
> and if I'm not mistaken. The adjacency array representation (for undirected
> graphs) should require about the same amount plus some overhead. So yeah,
> 8GB of RAM should be more than enough.
>
> I would like to read in the graph and output a sparse adjacency
> matrix. I am failing on all counts.  I am now trying
>
> G= networkit.graphio.readGraph("edges-tenmill.txt",
> networkit.Format.EdgeList, separator=" ", continuous=False)
>
> but uses all the RAM and then crashes.
>
> To understand the RAM usage I tried the same thing with only 20
> million edges and 10 million vertices.
>
> /usr/bin/time -v python3 ./test.py
>
> gives
>
> Maximum resident set size (kbytes): 2098684
>
> The following code makes a fake data set that can be used to reproduce
> the problem.
>
> import random
>
> #Number of edges, vertices
> m = 20000000
> #m = 62500000
> n = m/2
>
> for i in xrange(m):
>     fromnode = str(random.randint(0, n-1)).zfill(9)
>     tonode = str(random.randint(0, n-1)).zfill(9)
>     print fromnode, tonode
>
> It seems that behind the scenes in the C code of networkit something
> is taking up a lot of RAM.
>
> Thanks for the script. I'll do some tests myself.
>
> Best,
> Max
>
>
> _______________________________________________
> NetworKit mailing list
> NetworKit at ira.uni-karlsruhe.de
> https://lists.ira.uni-karlsruhe.de/mailman/listinfo/networkit
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.ira.uni-karlsruhe.de/pipermail/networkit/attachments/20160811/23950782/attachment.html>


More information about the NetworKit mailing list