# swh-loader-git
The Software Heritage Git Loader is a tool and a library to walk a local Git repository and inject into the SWH dataset all contained files that weren’t known before.
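Injecting only files that "weren't known before" can be pictured as content-addressed deduplication. The sketch below is not the loader's actual implementation. It assumes each file is identified by its Git blob id (SHA-1 over a `blob <size>\0` header followed by the content), and `known_ids` is a hypothetical stand-in for whatever the dataset already holds.

```python
import hashlib


def git_blob_id(data: bytes) -> str:
    """Return the Git object id of a blob holding `data`."""
    header = b"blob %d\0" % len(data)
    return hashlib.sha1(header + data).hexdigest()


def select_unknown(blobs, known_ids):
    """Return (id, content) pairs for blobs that are not yet known.

    `known_ids` is a hypothetical set of ids already present in the
    dataset. Duplicates within `blobs` are reported only once.
    """
    seen = set(known_ids)
    fresh = []
    for data in blobs:
        blob_id = git_blob_id(data)
        if blob_id not in seen:
            seen.add(blob_id)
            fresh.append((blob_id, data))
    return fresh
```

Because the id depends only on the content, the same file found in two repositories maps to a single entry and is skipped the second time.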
The main entry points are:
- swh.loader.git.loader.GitLoader for the main loader, which can ingest the contents of either a local or a remote git repository. This is the main implementation deployed in production.
- swh.loader.git.from_disk.GitLoaderFromDisk, which ingests only a local git clone.
- swh.loader.git.loader.GitLoaderFromArchive, which ingests a git repository wrapped in an archive.
- swh.loader.git.directory.GitCheckoutLoader, which ingests a git tree at a specific commit, branch or tag.
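As a rough illustration of choosing among these entry points, the sketch below maps a source kind to the dotted path listed above and imports the class on demand. The short keys and the `resolve` helper are hypothetical names introduced here. Constructor arguments are deliberately left out because they are not described in this document.

```python
import importlib

# Dotted paths as listed above; the short keys are hypothetical labels.
ENTRY_POINTS = {
    "remote-or-local": "swh.loader.git.loader.GitLoader",
    "local-clone": "swh.loader.git.from_disk.GitLoaderFromDisk",
    "archive": "swh.loader.git.loader.GitLoaderFromArchive",
    "checkout": "swh.loader.git.directory.GitCheckoutLoader",
}


def resolve(kind: str):
    """Import and return the loader class registered for `kind`.

    Requires swh.loader.git to be installed. Raises KeyError when
    `kind` is not one of the labels in ENTRY_POINTS.
    """
    module_name, _, class_name = ENTRY_POINTS[kind].rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)
```

Importing lazily keeps the table usable (for listing or validation) on machines where the loader package is not installed.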
## License
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
See top-level LICENSE file for the full text of the GNU General Public License along with this program.
## Dependencies
### Runtime
- python3 
- python3-dulwich 
- python3-retrying 
- python3-swh.core 
- python3-swh.model 
- python3-swh.storage 
- python3-swh.scheduler 
### Test
- python3-nose 
## Requirements
- implementation language: Python 3
- coding guidelines: conform to PEP8 
- Git access: via dulwich 
## CLI Run
You can run the loader from a remote origin (loader) or from an origin on disk (from_disk) directly by calling:
swh loader -C <config-file> run git <git-repository-url>
To load from an origin on disk, use git_disk in place of git in the command above.
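For scripted runs, the invocation above can be assembled programmatically. This is a minimal sketch: `loader_command` is a hypothetical helper, the repository URL is a placeholder, and the disk variant is assumed to take its origin in the same position as the remote one.

```python
import shlex


def loader_command(config_file: str, origin: str, from_disk: bool = False) -> list:
    """Build the argument list for the `swh loader ... run` invocation."""
    # "git" selects the remote loader, "git_disk" the from-disk loader.
    loader_name = "git_disk" if from_disk else "git"
    return ["swh", "loader", "-C", config_file, "run", loader_name, origin]


# Placeholder origin, shown only to illustrate the resulting command line.
cmd = loader_command("/tmp/git.yml", "https://example.org/repo.git")
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` would execute it without going through a shell, which avoids quoting problems in origin URLs or paths.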
## Configuration sample
/tmp/git.yml:
storage:
  cls: remote
  args:
    url: http://localhost:5002/
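
When the storage URL differs between environments, the sample file can be generated instead of edited by hand. The sketch below only reproduces the keys shown above. `write_config` and `CONFIG_TEMPLATE` are hypothetical names, and any other settings a real deployment needs are out of scope.

```python
from pathlib import Path

# Mirrors the sample configuration; only the storage URL is variable.
CONFIG_TEMPLATE = """\
storage:
  cls: remote
  args:
    url: {url}
"""


def write_config(path, storage_url="http://localhost:5002/"):
    """Write a loader configuration file pointing at `storage_url`."""
    target = Path(path)
    target.write_text(CONFIG_TEMPLATE.format(url=storage_url))
    return target
```

Calling `write_config("/tmp/git.yml")` recreates the sample above, and the resulting path can then be passed to the `-C` option.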