Software Heritage Foundation
Published: 12 Dec 2021
From the website of the Software Heritage Foundation:
Our ambition is to collect, preserve, and share all software that is publicly available in source code form. On this foundation, a wealth of applications can be built, ranging from cultural heritage to industry and research.
Basically, the goal is to fetch as much code as possible from all kinds of sources, including:
- Source code repositories (e.g. CVS, SVN, Git, Mercurial).
- Software forges (e.g. GitHub, Eclipse, Bitbucket, GitLab).
- Language ecosystems (e.g. NPM, PyPI, CPAN, Maven).
- OS distributions (e.g. Debian, Fedora, Ubuntu).
They provide full access to the archive through a powerful API, a comfortable GUI, and direct downloads. They also convert all the source code artefacts into a single, universal data structure: an enormous Merkle directed acyclic graph [Merkle, 1987], in which every node is identified by a hash of its own content and of its children. The graph is available from cloud services or as a direct download as well; it weighs 11 TB as of August 2022.
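To give an intuition of what a Merkle DAG is, here is a minimal sketch in Python. It is not the actual Software Heritage implementation or identifier format: the function names (`hash_blob`, `hash_directory`) and the `blob:` / `tree:` prefixes are assumptions made up for the illustration. The idea it shows is that a file is identified by a hash of its content, and a directory by a hash of its children's names and identifiers, so identical content always ends up in the same node.

```python
import hashlib


def hash_blob(data: bytes) -> str:
    """Identify a file by a hash of its content (illustrative scheme)."""
    return hashlib.sha1(b"blob:" + data).hexdigest()


def hash_directory(entries: dict[str, str]) -> str:
    """Identify a directory by hashing its sorted (name, child id) pairs.

    Sorting by name makes the identifier independent of insertion order.
    """
    h = hashlib.sha1(b"tree:")
    for name in sorted(entries):
        h.update(name.encode() + b"\0" + entries[name].encode() + b"\n")
    return h.hexdigest()


# Two files shared by several hypothetical projects.
readme = hash_blob(b"hello\n")
main = hash_blob(b"print('hi')\n")

# Same content gives the same identifier, whatever the order of the entries.
project_a = hash_directory({"README": readme, "main.py": main})
project_b = hash_directory({"main.py": main, "README": readme})

# A fork that changes one file gets a new directory identifier,
# but still points to the very same README node.
fork = hash_directory({"README": readme, "main.py": hash_blob(b"print('bye')\n")})

print(project_a == project_b)  # identical trees are deduplicated
print(project_a == fork)       # a changed file changes the parent identifier
```

Because nodes are shared whenever their content is identical, a file that appears in thousands of repositories is stored only once, which is what makes archiving at this scale tractable.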
I contributed two connectors in 2021:
- The Maven ecosystem connector, which lists and imports all available source files and repositories from a Maven repository like the Maven Central Repository. It was funded by an Alfred P. Sloan grant.
- The Tuleap forge connector, which lists all source repositories from a Tuleap instance like the official instance.