[Networkit] Updates: GitHub transition, repository size

Kolja Esders kolja.esders at student.kit.edu
Sun Feb 5 19:12:15 CET 2017


Hi all,

just a couple of updates.

*GitHub Transition*

The transition to GitHub <https://github.com/kit-parco/networkit> is
finished.

In total we moved *13 repositories* to GitHub. These have been locked on
Algohub, so it is no longer possible to push to them there. We hope to lock
the remaining repositories in the near future as well; we just need to make
sure this does not affect any lectures or student projects.

*More details:*

   - We are using Travis CI with a modified config from Max and Michael
   - After fixing a couple of issues, the tests for g++ on Linux and macOS
   are passing now
   - For clang++ there are still a couple of failing tests
   <https://travis-ci.org/kit-parco/networkit>
   - Cloning is faster thanks to GitHub's infrastructure
   - The master branch has been protected so only members of PARCO will be
   able to push changes

*Next steps:*

   - Make sure all tests are passing
   - Update the website to reflect the transition (news + updated references)
   - Improve the existing documentation
   - Resolve TODOs in code
   - Decrease repository size
   - Decouple website from main repository

*Please let us know in case you encounter any issues or would like to have
your repository available on GitHub as well.*


*Repository Size*

As discussed before, we are also looking at the size of the repository. The
time it takes to clone the repository has already decreased since we are
now using GitHub's infrastructure. Nevertheless, the repository still takes
up ~380 MB on disk. The main causes are the graphs in the *input/
directory* as well as the documentation, notebooks & website in *Doc/*.
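To see how much each top-level directory contributes, a small Python sketch
like the following could sum file sizes in a checkout. It skips .git/, so it
only measures the working tree, not the history stored in Git's object
database; the function name dir_sizes is just for illustration.

```python
import os
from collections import defaultdict


def dir_sizes(root):
    """Return total file size in bytes per top-level entry of `root`.

    Files directly under `root` are grouped under ".". The .git directory
    is skipped, so history stored in Git's object database is not counted.
    """
    totals = defaultdict(int)
    for dirpath, dirnames, filenames in os.walk(root):
        # Do not descend into Git's internal storage.
        if ".git" in dirnames:
            dirnames.remove(".git")
        rel = os.path.relpath(dirpath, root)
        top = "." if rel == "." else rel.split(os.sep)[0]
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Symlinks are skipped so nothing is counted twice.
            if not os.path.islink(path):
                totals[top] += os.path.getsize(path)
    return dict(totals)
```

Running dir_sizes(".") from the repository root and sorting the result by
value should show whether input/ or Doc/ dominates the checkout.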

There are multiple ways to approach this problem:

   - Keep it the way it is right now (still not a huge repository)
   - Just strip some of the graphs (for example, not all of them are used in
   tests)
   - Use Git LFS <https://git-lfs.github.com/> (especially for the graphs,
   though it might be overkill)
   - Move files into new repositories and link them as submodules
   - Some other option?
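For the second option, a rough first pass could list the graphs that no test
mentions at all. This is a hypothetical sketch: it assumes the tests refer to
graphs by a literal file name (e.g. "input/jazz.graph") and that the test
sources end in .cpp, .h or .py. Graphs loaded via computed names would be
missed, and the plain substring match is conservative (a graph whose name is
part of another referenced name counts as used).

```python
import os


def unreferenced_graphs(graph_dir, test_dirs, test_exts=(".cpp", ".h", ".py")):
    """List files in `graph_dir` whose name never appears in any test source.

    Assumption: tests reference graphs by their literal file name, so a
    substring search over the concatenated test sources is sufficient.
    """
    # Read all test sources once into one searchable string.
    sources = []
    for test_dir in test_dirs:
        for dirpath, _, filenames in os.walk(test_dir):
            for name in filenames:
                if name.endswith(test_exts):
                    path = os.path.join(dirpath, name)
                    with open(path, encoding="utf-8", errors="replace") as f:
                        sources.append(f.read())
    corpus = "\n".join(sources)

    # A graph counts as unused if its file name occurs nowhere in the tests.
    unused = []
    for dirpath, _, filenames in os.walk(graph_dir):
        for name in filenames:
            if name not in corpus:
                unused.append(os.path.relpath(os.path.join(dirpath, name), graph_dir))
    return sorted(unused)
```

The output is only a candidate list; each entry would still need a quick
manual check before deleting it.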

Some of these options would reduce the size of the working copy, since the
deleted or moved files would no longer be checked out. However, Git would
still keep those files in its history (the .git directory) so that older
commits can be restored, which means a fresh clone would hardly get any
smaller.
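A toy model makes this concrete. It is a simplification of Git (which also
compresses and delta-encodes objects): each commit is a snapshot mapping
paths to blob ids, each distinct blob is stored once, and a blob has to be
kept as long as any commit in the history refers to it. The paths and sizes
below are made up for illustration.

```python
def stored_size(commits, blob_sizes):
    """Return the bytes needed to keep every blob referenced by `commits`.

    Toy model: a commit is a dict mapping path -> blob id, and
    `blob_sizes` maps blob id -> size in bytes. Each distinct blob is
    stored once and must be kept while any commit refers to it.
    """
    referenced = set()
    for snapshot in commits:
        referenced.update(snapshot.values())
    return sum(blob_sizes[blob] for blob in referenced)


# Hypothetical history: one commit with a source file and a large graph.
blob_sizes = {"src-v1": 10_000, "big-graph": 300_000_000}
history = [{"src/Graph.cpp": "src-v1", "input/big.graph": "big-graph"}]

# A new commit that deletes input/ from the working tree ...
history.append({"src/Graph.cpp": "src-v1"})

# ... does not shrink the history: the first commit still needs the graph.
print(stored_size(history, blob_sizes))  # 300010000
```

Only the latest snapshot is small; the graph disappears from storage only if
no commit at all refers to it anymore.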

Consequently, to achieve a substantial reduction in size, we would have to
rewrite the history and remove those files from every commit. Rewriting
changes all commit hashes, so everyone would have to re-clone afterwards.
Since the repository on GitHub is brand new, we could probably still do
this without major issues. See
<https://help.github.com/articles/removing-sensitive-data-from-a-repository/>
for details.
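The git filter-branch recipe for this could be wrapped in a small Python
helper, sketched below under a few assumptions: it runs on a fresh clone,
the paths passed in (e.g. ["input"]) are examples rather than a decision, and
the force-push afterwards is left as a manual step. The helper names
index_filter and rewrite_history are hypothetical. The git commands
themselves (filter-branch, for-each-ref, update-ref, reflog expire, gc) are
standard.

```python
import shlex
import subprocess


def index_filter(paths):
    """Shell command that filter-branch runs on each commit's index."""
    return "git rm -r --cached --ignore-unmatch " + " ".join(shlex.quote(p) for p in paths)


def rewrite_history(repo, paths):
    """Remove `paths` from every commit of every ref in the clone at `repo`.

    Destructive: all commit ids change. Run it on a fresh clone, then
    force-push; everyone else has to re-clone afterwards.
    """
    def git(*args, **kwargs):
        return subprocess.run(["git", *args], cwd=repo, check=True, **kwargs)

    # Rewrite each commit without `paths`, drop commits that become empty,
    # and move tags to the rewritten commits.
    git("filter-branch", "--force", "--index-filter", index_filter(paths),
        "--prune-empty", "--tag-name-filter", "cat", "--", "--all")
    # filter-branch keeps backups of the old refs under refs/original/;
    # delete them so the old commits become unreachable.
    refs = git("for-each-ref", "--format=delete %(refname)", "refs/original",
               capture_output=True, text=True).stdout
    git("update-ref", "--stdin", input=refs, text=True)
    # Expire the reflog and prune the now-unreferenced objects.
    git("reflog", "expire", "--expire=now", "--all")
    git("gc", "--prune=now")
```

After something like rewrite_history("networkit-clone", ["input"]), the
result would be pushed with git push --force --all (and --tags). The BFG
Repo-Cleaner described in the same GitHub article is a faster alternative to
filter-branch.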

Personally, I would lean towards removing basically all of the graphs
except for maybe two or three small graphs (< 1 MB) needed for testing the
graph reader and the features that require "larger" graphs for testing.
The only reason I see for keeping more graphs would be benchmarking, which
can probably be done without storing the graphs in the repository as well.
The disadvantage would be that we would have to adjust a couple of tests to
use another graph.

What is your opinion on this?

Best
Kolja

PS: Please shoot me an email in case you would like to join the PARCO
organization on GitHub.