swh.loader.svn.loader module
Loader in charge of injecting either new or existing svn mirrors into swh-storage.
- class swh.loader.svn.loader.SvnLoader(storage: StorageInterface, url: str, origin_url: str | None = None, visit_date: datetime | None = None, incremental: bool = True, temp_directory: str = '/tmp', debug: bool = False, check_revision: int = 0, check_revision_from: int = 0, **kwargs: Any)
- Bases: BaseLoader
- SVN loader. The repository is either remote or local. The loader also handles updates of a previously loaded repository.
- Load a svn repository (either remote or local).
- Parameters:
- url – The default origin url 
- origin_url – Optional original url override to use as origin reference in the archive. If not provided, “url” is used as origin. 
- visit_date – Optional date to override the visit date 
- incremental – If True, the default, starts from the last snapshot (if any). Otherwise, starts from the initial commit of the repository. 
- temp_directory – The temporary directory to use as root directory for working directory computations 
- debug – If True, run the loader in debug mode. At the end of the loading, the temporary working directory is not cleaned up, to ease inspection. Defaults to False.
- check_revision – The number of svn commits between checks for hash divergence 
 
 - swh_revision_hash_tree_at_svn_revision(revision: int) → Directory
- Compute and return the hash tree at a given svn revision.
- Parameters:
- revision – the svn revision we want to check
- Returns:
- The hash tree, as a Directory object.
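The docstring does not detail how the hash tree is computed. As background only: Software Heritage content and directory identifiers are compatible with git blob and tree hashes. The minimal sketch below, using only hashlib, shows that scheme for a flat directory of regular files. git_blob_hash and flat_tree_hash are hypothetical helpers, not part of swh.loader.svn, and subdirectories, symlinks and permission bits are deliberately left out.

```python
import hashlib
from typing import Dict


def git_blob_hash(data: bytes) -> bytes:
    """Return the git-style blob identifier (sha1_git) of ``data``."""
    # git hashes the header "blob <size>\0" followed by the raw content.
    header = b"blob %d\x00" % len(data)
    return hashlib.sha1(header + data).digest()


def flat_tree_hash(files: Dict[bytes, bytes]) -> bytes:
    """Return the git-style tree identifier of a flat directory (hypothetical helper).

    ``files`` maps file names to contents; every entry is treated as a
    regular, non-executable file (mode 100644).
    """
    entries = b""
    # With no subdirectories, git orders tree entries by their raw name bytes.
    for name in sorted(files):
        entries += b"100644 " + name + b"\x00" + git_blob_hash(files[name])
    header = b"tree %d\x00" % len(entries)
    return hashlib.sha1(header + entries).digest()
```

Changing a single file changes its blob hash and therefore the tree hash, which is what makes a recomputed tree usable as a consistency check.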
 
 - build_swh_revision(rev: int, commit: Dict, dir_id: bytes, parents: Sequence[bytes]) → Revision
- Build the swh revision object.
- This adds:
- the ‘synthetic’ flag, set to True
- the ‘extra_headers’ containing the repository’s uuid and the svn revision number. 
 - Parameters:
- rev – the svn revision number 
- commit – the commit data: revision id, date, author, and message 
- dir_id – the root directory’s hash identifier
- parents – the parents’ identifiers 
 
- Returns:
- The swh revision corresponding to the svn revision. 
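As an illustration of what build_swh_revision adds on top of the commit data, here is a sketch in which a hypothetical RevisionSketch dataclass stands in for the real swh.model Revision (only a few of its fields are modelled). The header names svn_repo_uuid and svn_revision are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class RevisionSketch:
    """Hypothetical stand-in for swh.model's Revision (subset of fields)."""

    message: bytes
    directory: bytes
    parents: Tuple[bytes, ...]
    synthetic: bool
    extra_headers: Tuple[Tuple[bytes, bytes], ...]


def build_revision_sketch(
    repo_uuid: bytes, rev: int, message: bytes, dir_id: bytes, parents: Sequence[bytes]
) -> RevisionSketch:
    return RevisionSketch(
        message=message,
        directory=dir_id,
        parents=tuple(parents),
        # Converted from svn rather than recorded natively: mark as synthetic.
        synthetic=True,
        # Keep enough information to map the revision back to svn
        # (header names are assumed for this sketch).
        extra_headers=(
            (b"svn_repo_uuid", repo_uuid),
            (b"svn_revision", str(rev).encode()),
        ),
    )
```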
 
 - check_history_not_altered(revision_start: int, swh_rev: Revision) → bool
- Check whether the history of the svn repository was modified between visits.
 - start_from() → Tuple[int, int]
- Determine where to start the loading.
- Returns:
- tuple (revision_start, revision_end) 
- Raises:
- SvnLoaderHistoryAltered – When a hash divergence has been detected (should not happen) 
- SvnLoaderUneventful – Nothing changed since last visit 
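A possible reading of start_from, sketched with hypothetical names: the inputs (head revision, last loaded revision, outcome of the history check) are passed in directly instead of being read from the repository and the archive, and the two exception classes only stand in for SvnLoaderHistoryAltered and SvnLoaderUneventful. Starting a full load at svn revision 1 is an assumption of this sketch.

```python
from typing import Optional, Tuple


class HistoryAlteredSketch(Exception):
    """Hypothetical stand-in for SvnLoaderHistoryAltered."""


class UneventfulSketch(Exception):
    """Hypothetical stand-in for SvnLoaderUneventful."""


def start_from_sketch(
    head_revision: int,
    last_loaded_revision: Optional[int],
    incremental: bool,
    history_not_altered: bool = True,
) -> Tuple[int, int]:
    """Return (revision_start, revision_end) for the next load."""
    if not incremental or last_loaded_revision is None:
        # Full load from the first commit (svn revision 0 is the empty root).
        return 1, head_revision
    if not history_not_altered:
        # The previously archived revision no longer matches the repository.
        raise HistoryAlteredSketch(
            f"history altered at svn revision {last_loaded_revision}"
        )
    if last_loaded_revision >= head_revision:
        raise UneventfulSketch("nothing changed since the last visit")
    # Resume right after the last revision already in the archive.
    return last_loaded_revision + 1, head_revision
```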
 
 
 - process_svn_revisions(svnrepo, revision_start, revision_end) → Iterator[Tuple[List[Content], List[SkippedContent], List[Directory], Revision]]
- Process svn revisions from revision_start to revision_end.
- At each svn revision, apply the new diffs and simultaneously compute the swh hashes. The computed swh objects are yielded as a tuple (contents, skipped_contents, directories, revision).
- Note that every self.check_revision revisions, a supplementary check takes place to detect hash-tree divergence (related to T570).
- Yields:
- tuple (contents, skipped_contents, directories, revision) of swh model objects (lists of Content, SkippedContent and Directory, and a Revision)
- Raises:
- ValueError – in case a hash divergence is detected
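The periodic divergence check described above can be sketched as a generator. apply_revision and recompute_tree are hypothetical callables standing in for the diff replay and for a full recomputation of the tree (for instance through swh_revision_hash_tree_at_svn_revision); running the check on revisions that are multiples of check_revision, with 0 disabling it, is an assumption of this sketch.

```python
from typing import Callable, Iterator, Tuple


def process_revisions_sketch(
    revision_start: int,
    revision_end: int,
    apply_revision: Callable[[int], bytes],
    recompute_tree: Callable[[int], bytes],
    check_revision: int = 0,
) -> Iterator[Tuple[int, bytes]]:
    """Yield (svn revision, incrementally computed root hash) pairs."""
    for rev in range(revision_start, revision_end + 1):
        # Replay the diff of this revision and get the resulting root hash.
        root_hash = apply_revision(rev)
        # check_revision == 0 disables the supplementary check.
        if check_revision and rev % check_revision == 0:
            expected = recompute_tree(rev)
            if expected != root_hash:
                raise ValueError(f"hash tree divergence at svn revision {rev}")
        yield rev, root_hash
```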
 
 - prepare()
- Second step executed by the loader to prepare some state needed by the loader.
- Raises:
- NotFound – if the origin to ingest is not found.
 
 - fetch_data()
- Fetch svn revision information.
- This applies each svn revision as a patch on disk and, at the same time, computes the swh hashes.
- In effect, fetch_data fetches the data and computes the necessary swh objects, which are then stored in the internal state instance variables (initialized in _prepare_state).
- It is up to store_data to actually send those objects to the storage.
- Returns:
- True to continue fetching data (next svn revision), False to stop. 
- Return type:
- bool
 
 - store_data()
- Store the data accumulated in the internal instance variables. If the iteration over the svn revisions is done, create the snapshot and flush the data to storage.
- This also resets the internal instance variable state.
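The split between fetch_data and store_data amounts to a loop in which fetch_data is called until it returns False, with store_data called after each fetch. The toy class below is a hypothetical illustration of that contract, not BaseLoader's actual code; plain integers stand in for the swh objects computed per svn revision.

```python
from typing import List


class ToyLoader:
    """Hypothetical loader illustrating the fetch_data/store_data contract."""

    def __init__(self, revisions: List[int]) -> None:
        self._pending = list(revisions)
        self._buffer: List[int] = []
        self._done = False
        self.stored: List[int] = []
        self.snapshot_created = False

    def fetch_data(self) -> bool:
        # Process one svn revision and keep the result in internal state.
        if self._pending:
            self._buffer.append(self._pending.pop(0))
        self._done = not self._pending
        # True: call fetch_data again; False: the iteration is over.
        return not self._done

    def store_data(self) -> None:
        # Send the accumulated objects to "storage", then reset the state.
        self.stored.extend(self._buffer)
        self._buffer = []
        # The snapshot is only created once the last revision was fetched.
        if self._done:
            self.snapshot_created = True

    def load(self) -> None:
        while True:
            more = self.fetch_data()
            self.store_data()
            if not more:
                break
```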
 - generate_and_load_snapshot(revision: Revision | None = None, snapshot: Snapshot | None = None) → Snapshot
- Create the snapshot either from an existing revision or from an existing snapshot.
- The revision (presumably new) takes priority over the snapshot (presumably an existing one).
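A minimal sketch of that priority rule, assuming for illustration only that the snapshot holds a single HEAD branch targeting the revision. snapshot_branches_sketch is a hypothetical helper returning plain branch dictionaries rather than Snapshot objects, and raising when neither input is given is also an assumption.

```python
from typing import Dict, Optional


def snapshot_branches_sketch(
    revision_id: Optional[bytes] = None,
    existing_branches: Optional[Dict[bytes, bytes]] = None,
) -> Dict[bytes, bytes]:
    """Return the branches of the snapshot to record (hypothetical layout)."""
    if revision_id is not None:
        # A (presumably new) revision wins: point HEAD at it.
        return {b"HEAD": revision_id}
    if existing_branches is not None:
        # Otherwise reuse the (presumably existing) snapshot as-is.
        return dict(existing_branches)
    raise ValueError("either a revision or a snapshot is required")
```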
 - load_status()
- Detailed loading status.
- Defaults to logging an eventful load.
- Returns: a dictionary that is eventually passed back as the task’s result to the scheduler, allowing tuning of the task recurrence mechanism.
 
 - post_load(success: bool = True) → None
- Let the loader perform additional actions, depending on the loading status, once the loading is done. The success flag indicates the loading’s status.
- Defaults to doing nothing.
- It is up to the implementer of this method to make sure it does not break.
- Parameters:
- success (bool) – the success status of the loading 
 
 
- class swh.loader.svn.loader.SvnLoaderFromDumpArchive(storage: StorageInterface, url: str, archive_path: str, origin_url: str | None = None, incremental: bool = False, visit_date: datetime | None = None, temp_directory: str = '/tmp', debug: bool = False, check_revision: int = 0, **kwargs: Any)
- Bases: SvnLoader
- Uncompress an archive containing an svn dump, mount the svn dump as a local svn repository and load that repository.
- Load a svn repository (either remote or local).
- Parameters:
- url – The default origin url
- archive_path – The path to the archive containing the svn dump
- origin_url – Optional original url override to use as origin reference in the archive. If not provided, “url” is used as origin. 
- visit_date – Optional date to override the visit date 
- incremental – If True, starts from the last snapshot (if any). If False, the default for this loader, starts from the initial commit of the repository.
- temp_directory – The temporary directory to use as root directory for working directory computations 
- debug – If True, run the loader in debug mode. At the end of the loading, the temporary working directory is not cleaned up, to ease inspection. Defaults to False.
- check_revision – The number of svn commits between checks for hash divergence 
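How the archive might be uncompressed and mounted, sketched under the assumption that the archive is a single gzip-compressed dump file. decompress_dump, mount_commands and mount_dump are hypothetical helpers; svnadmin create and svnadmin load (which reads the dump stream on standard input) are the standard Subversion commands for creating a repository and loading a dump into it.

```python
import gzip
import shutil
import subprocess
from pathlib import Path
from typing import List, Tuple


def decompress_dump(archive_path: Path, dest_dir: Path) -> Path:
    """Decompress a gzip-compressed svn dump (assumed format) into dest_dir."""
    dump_path = dest_dir / archive_path.stem  # e.g. "repo.dump.gz" -> "repo.dump"
    with gzip.open(archive_path, "rb") as src, open(dump_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return dump_path


def mount_commands(repo_path: Path) -> Tuple[List[str], List[str]]:
    """Commands creating an empty local repository, then loading a dump into it."""
    return (
        ["svnadmin", "create", str(repo_path)],
        # svnadmin load reads the dump stream on its standard input.
        ["svnadmin", "load", str(repo_path)],
    )


def mount_dump(dump_path: Path, repo_path: Path) -> str:
    """Mount dump_path as a local repository and return its file:// URL."""
    create_cmd, load_cmd = mount_commands(repo_path)
    subprocess.run(create_cmd, check=True)
    with open(dump_path, "rb") as dump:
        subprocess.run(load_cmd, stdin=dump, stdout=subprocess.DEVNULL, check=True)
    return repo_path.resolve().as_uri()
```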
 
 
- class swh.loader.svn.loader.SvnLoaderFromRemoteDump(storage: StorageInterface, url: str, origin_url: str | None = None, incremental: bool = True, visit_date: datetime | None = None, temp_directory: str = '/tmp', debug: bool = False, check_revision: int = 0, **kwargs: Any)
- Bases: SvnLoader
- Create a subversion repository dump out of a remote svn repository (using the svnrdump utility), then mount the repository locally and load that repository.
- Load a svn repository (either remote or local).
- Parameters:
- url – The default origin url 
- origin_url – Optional original url override to use as origin reference in the archive. If not provided, “url” is used as origin. 
- visit_date – Optional date to override the visit date 
- incremental – If True, the default, starts from the last snapshot (if any). Otherwise, starts from the initial commit of the repository. 
- temp_directory – The temporary directory to use as root directory for working directory computations 
- debug – If True, run the loader in debug mode. At the end of the loading, the temporary working directory is not cleaned up, to ease inspection. Defaults to False.
- check_revision – The number of svn commits between checks for hash divergence 
 
 - get_last_loaded_svn_rev(svn_url: str) → int
- Check if the svn repository has already been visited and return the last loaded svn revision number or -1 otherwise. 
 - dump_svn_revisions(svn_url: str, last_loaded_svn_rev: int = -1) → Tuple[str, int]
- Generate a compressed subversion dump file using the svnrdump tool and gzip. If the svnrdump command fails, the produced dump file is analyzed to determine whether a partial loading is still feasible.
- Raises:
- NotFound – when the repository is no longer found at url
- Returns:
- The path of the generated dump file and the maximum dumped revision number (-1 if all revisions were dumped)
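A sketch of the two halves of dump_svn_revisions: building the svnrdump invocation and streaming it through gzip, then, after a failure, scanning the gzipped dump for Revision-number headers to see how far a partial load can go. All helpers are hypothetical. Resuming right after last_loaded_svn_rev and treating the last revision record of a failed dump as possibly truncated are assumptions of this sketch; its None result ("nothing salvageable") is unrelated to the -1 convention documented above.

```python
import gzip
import shutil
import subprocess
from typing import List, Optional


def svnrdump_command(svn_url: str, last_loaded_svn_rev: int = -1) -> List[str]:
    """Build the svnrdump invocation, resuming after last_loaded_svn_rev (assumed)."""
    cmd = ["svnrdump", "dump", svn_url]
    if last_loaded_svn_rev >= 0:
        cmd += ["-r", f"{last_loaded_svn_rev + 1}:HEAD"]
    return cmd


def dump_to_gzip(cmd: List[str], out_path: str) -> int:
    """Stream the svnrdump output into a gzip file and return its exit code."""
    with gzip.open(out_path, "wb") as out:
        proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
        shutil.copyfileobj(proc.stdout, out)
        return proc.wait()


def last_complete_revision(gz_dump_path: str) -> Optional[int]:
    """Highest revision assumed complete in a (possibly truncated) gzipped dump."""
    last_seen = -1
    with gzip.open(gz_dump_path, "rb") as dump:
        for line in dump:
            # Each revision record starts with a "Revision-number: N" header.
            # (Naive scan: file contents containing such a line would fool it.)
            if line.startswith(b"Revision-number: "):
                last_seen = int(line.split(b": ", 1)[1].strip())
    # The last record may have been cut off when svnrdump failed.
    return last_seen - 1 if last_seen >= 1 else None
```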