Software Heritage Foundation

From the website of the Software Heritage Foundation:

Our ambition is to collect, preserve, and share all software that is publicly available in source code form. On this foundation, a wealth of applications can be built, ranging from cultural heritage to industry and research.

Put simply, the goal is to collect as much source code as possible, from all kinds of sources, including:

  • Source code repositories (e.g. CVS, SVN, Git, Mercurial).
  • Software forges (e.g. GitHub, Eclipse, BitBucket, GitLab).
  • Language ecosystems (e.g. NPM, PyPI, CPAN, Maven).
  • OS distributions (e.g. Debian, Fedora, Ubuntu).

They provide full access to the archive through a powerful API, a comfortable GUI, and direct downloads. They also convert all source code artefacts into a single, universal data structure: an enormous Merkle directed acyclic graph [Merkle, 1987], available from cloud services or as a direct download (11 TB as of August 2022).
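
To give an idea of what a Merkle DAG means in practice, here is a minimal Python sketch of content addressing. It is an illustration only, not the archive's actual code: blob_id follows the well-known Git blob hashing scheme, which I assume is close to how file contents are identified in the archive, while directory_id is a deliberately simplified, hypothetical encoding and not the real Git or Software Heritage directory format.

```python
import hashlib


def blob_id(data: bytes) -> str:
    """Git-style blob identifier: SHA-1 over a 'blob <length>\\0' header plus the data."""
    header = b"blob %d\x00" % len(data)
    return hashlib.sha1(header + data).hexdigest()


def directory_id(entries: dict) -> str:
    """Hypothetical, simplified directory identifier.

    `entries` maps an entry name to the identifier of its child node.
    The result depends only on the names and the child identifiers,
    which is the defining property of a Merkle node.
    """
    h = hashlib.sha1()
    for name in sorted(entries):  # sort so the result is independent of insertion order
        h.update(name.encode("utf-8") + b"\x00" + entries[name].encode("ascii") + b"\n")
    return h.hexdigest()


# Two projects shipping the very same LICENSE file share one blob node.
license_text = b"Some license text\n"
project_a = directory_id({"LICENSE": blob_id(license_text), "main.c": blob_id(b"int main(void) { return 0; }\n")})
project_b = directory_id({"LICENSE": blob_id(license_text), "main.py": blob_id(b"print('hi')\n")})

print(blob_id(license_text))   # same identifier in both projects
print(project_a != project_b)  # the directories themselves differ
```

Because every identifier is derived from the content below it, identical files and directories are stored only once across the whole archive, and changing a single byte in a file changes the identifier of every directory above it.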

I contributed two connectors in 2021: