swh.loader.core.utils module
- swh.loader.core.utils.clean_dangling_folders(dirpath: str, pattern_check: str, log=None) -> None
Clean up potential dangling temporary working folders rooted at dirpath. Those
folders must match a dedicated pattern and must not belong to a live pid.
- Parameters:
dirpath – Path to check for dangling files
pattern_check – A dedicated pattern to check on first-level directories (e.g. swh.loader.mercurial., swh.loader.svn.)
log (Logger) – Optional logger
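The cleanup described above can be sketched with the standard library alone. This is a minimal illustration, not the library's implementation: the folder naming scheme `<pattern_check>-<pid>.<suffix>`, the helper names `pid_is_alive` and `clean_dangling_sketch`, and the injectable `is_alive` parameter are all assumptions made for the example.

```python
import os
import shutil
import tempfile
from typing import Callable, List


def pid_is_alive(pid: int) -> bool:
    """Best-effort liveness check (POSIX): signal 0 only probes the pid."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def clean_dangling_sketch(
    dirpath: str,
    pattern_check: str,
    is_alive: Callable[[int], bool] = pid_is_alive,
) -> List[str]:
    """Remove first-level folders matching the pattern whose pid is dead.

    Assumed (hypothetical) folder naming: "<pattern_check>-<pid>.<suffix>".
    Returns the list of removed paths.
    """
    removed: List[str] = []
    if not os.path.isdir(dirpath):
        return removed
    for name in sorted(os.listdir(dirpath)):
        path = os.path.join(dirpath, name)
        # Silently skip anything that does not look like a working folder.
        if pattern_check not in name or "-" not in name or not os.path.isdir(path):
            continue
        pid_part = name.rsplit("-", 1)[1].split(".")[0]
        if not pid_part.isdigit() or is_alive(int(pid_part)):
            continue
        shutil.rmtree(path, ignore_errors=True)
        removed.append(path)
    return removed


# Demo with a fake liveness check: pretend only pid 222 is still running.
with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, "swh.loader.svn.-111.abc"))
    os.mkdir(os.path.join(root, "swh.loader.svn.-222.def"))
    os.mkdir(os.path.join(root, "unrelated"))
    removed = clean_dangling_sketch(
        root, "swh.loader.svn.", is_alive=lambda pid: pid == 222
    )
    removed_names = [os.path.basename(p) for p in removed]
    remaining = sorted(os.listdir(root))
```

Passing the liveness check as a parameter keeps the sketch testable without depending on which pids happen to exist on the machine.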
- swh.loader.core.utils.clone_with_timeout(src: str, dest: str, clone_func: Callable[[], None], timeout: float) -> None
Clone a repository with timeout.
- Parameters:
src – clone source
dest – clone destination
clone_func – callable that does the actual cloning
timeout – timeout in seconds
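One common way to enforce such a timeout is to run the callable in a child process and terminate it if it overruns. The sketch below illustrates that general technique only; it is not taken from the library. The name `run_with_timeout`, the exception types, and the choice of the POSIX-only `fork` start method are assumptions.

```python
import multiprocessing
import time
from typing import Callable


def run_with_timeout(func: Callable[[], None], timeout: float) -> None:
    """Run func in a child process; stop it if it exceeds timeout seconds."""
    # Assumption: "fork" start method (POSIX only), so func need not be picklable.
    ctx = multiprocessing.get_context("fork")
    proc = ctx.Process(target=func)
    proc.start()
    proc.join(timeout)
    if proc.is_alive():
        # Still running after the deadline: terminate and reap the child.
        proc.terminate()
        proc.join()
        raise TimeoutError(f"callable did not finish within {timeout} seconds")
    if proc.exitcode != 0:
        raise RuntimeError(f"callable failed with exit code {proc.exitcode}")


def _slow_clone() -> None:
    time.sleep(30)  # stands in for a clone that hangs


def _fast_clone() -> None:
    pass  # stands in for a clone that completes immediately


try:
    run_with_timeout(_slow_clone, timeout=0.2)
    timed_out = False
except TimeoutError:
    timed_out = True

run_with_timeout(_fast_clone, timeout=5.0)  # returns without raising
fast_ok = True
```

A thread would not work here, because Python threads cannot be forcibly stopped, whereas a child process can be terminated.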
- swh.loader.core.utils.parse_visit_date(visit_date: datetime | str | None) -> datetime | None
Convert visit date from either None, a string or a datetime to either None or datetime.
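The conversion can be illustrated with a small stand-in. This sketch assumes ISO 8601 input strings and uses the standard library's `datetime.fromisoformat`. The date formats the real function accepts are not stated here, and `parse_visit_date_sketch` is a hypothetical name.

```python
from datetime import datetime, timezone
from typing import Optional, Union


def parse_visit_date_sketch(
    visit_date: Union[datetime, str, None]
) -> Optional[datetime]:
    """Normalize None / datetime / ISO 8601 string to None or datetime."""
    if visit_date is None:
        return None
    if isinstance(visit_date, datetime):
        return visit_date  # already in the target type
    if isinstance(visit_date, str):
        # Assumption: the string is ISO 8601, e.g. "2020-01-02T03:04:05+00:00".
        return datetime.fromisoformat(visit_date)
    raise TypeError(f"unsupported visit_date type: {type(visit_date).__name__}")


parsed = parse_visit_date_sketch("2020-01-02T03:04:05+00:00")
passthrough = parse_visit_date_sketch(datetime(2021, 6, 1, tzinfo=timezone.utc))
```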
- swh.loader.core.utils.compute_hashes(filepath: str, hash_names: List[str] = ['sha256']) -> Dict[str, str]
Compute checksums dict out of a filepath
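As a rough illustration of what a checksums dict looks like, the sketch below computes several digests in a single pass over a file using `hashlib`. It is a stand-in under stated assumptions: the function name `compute_hashes_sketch`, the chunk size, and the hex-string output format are chosen for the example and may differ from the library.

```python
import hashlib
import os
import tempfile
from typing import Dict, Sequence


def compute_hashes_sketch(
    filepath: str,
    hash_names: Sequence[str] = ("sha256",),
    chunk_size: int = 65536,
) -> Dict[str, str]:
    """Return {algorithm name: hex digest}, reading the file once in chunks."""
    hashers = {name: hashlib.new(name) for name in hash_names}
    with open(filepath, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}


# Demo on a small temporary file.
with tempfile.TemporaryDirectory() as tmpdir:
    sample_path = os.path.join(tmpdir, "sample.bin")
    with open(sample_path, "wb") as f:
        f.write(b"hello world")
    checksums = compute_hashes_sketch(sample_path, ["sha1", "sha256"])
```

Reading in chunks keeps memory use bounded for large artifacts, and feeding every hasher from the same chunk avoids rereading the file once per algorithm.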
- swh.loader.core.utils.compute_nar_hashes(filepath: Path, hash_names: List[str] = ['sha256'], is_tarball=True, top_level=True) -> Dict[str, str]
Compute nar checksums dict out of a filepath (tarball or plain file).
If it’s a tarball, this uncompresses the tarball in a temporary directory to compute the nar hashes (and then cleans it up).
- Parameters:
filepath – The tarball (if is_tarball is True) or a filepath
hash_names – The list of checksums to compute
is_tarball – Whether filepath represents a tarball or not
top_level – Whether to compute the hashes of the tarball's top-level directory. This is only useful when is_tarball is True.
- Returns:
The dict of checksums values whose keys are present in hash_names.
- swh.loader.core.utils.get_url_body(url: str, **extra_params) -> bytes
Basic HTTP client to retrieve information on software package, typically JSON metadata from a REST API.
- Parameters:
url (str) – An HTTP URL
- Raises:
NotFound in case of query failure (for some reason, e.g. 404, …)
- Returns:
The associated response’s information
- swh.loader.core.utils.download(url: str, dest: str, hashes: Dict = {}, filename: str | None = None, auth: Tuple[str, str] | None = None, extra_request_headers: Dict[str, str] | None = None, timeout: int = 120) -> Tuple[str, Dict]
Download a remote file from url, and compute swh hashes on it.
- Parameters:
url – Artifact uri to fetch and hash
dest – Directory to write the archive to
hashes – Dict of expected hashes (key is the hash algo) for the artifact to download (those hashes are expected to be hex strings). The supported algorithms are defined in the swh.model.hashutil.ALGORITHMS set.
auth – Optional tuple of login/password (for http authentication service, e.g. deposit)
extra_request_headers – Optional dict holding extra HTTP headers to be sent with the request
timeout – Value in seconds so the connection does not hang indefinitely (read/connection timeout)
- Raises:
ValueError in case of any error when fetching/computing (length, checksums mismatched...)
- Returns:
Tuple of local (filepath, hashes of filepath)
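The checksum verification step can be sketched in isolation, without any network access. The example below is an assumption-laden illustration, not the library's code: `check_hashes_sketch` is a hypothetical helper that compares expected hex digests with digests computed by `hashlib` and raises ValueError on a mismatch, mirroring the behaviour documented above.

```python
import hashlib
from typing import Dict


def check_hashes_sketch(data: bytes, expected: Dict[str, str]) -> Dict[str, str]:
    """Compute the digests named in expected and compare them as hex strings."""
    computed = {algo: hashlib.new(algo, data).hexdigest() for algo in expected}
    for algo, expected_hex in expected.items():
        if computed[algo] != expected_hex:
            raise ValueError(
                f"{algo} mismatch: expected {expected_hex}, got {computed[algo]}"
            )
    return computed


payload = b"artifact bytes"  # stands in for the downloaded file content

# Matching hashes: the computed digests are returned.
good_hashes = {"sha256": hashlib.sha256(payload).hexdigest()}
verified = check_hashes_sketch(payload, good_hashes)

# Mismatching hashes: a ValueError is raised.
try:
    check_hashes_sketch(payload, {"sha256": "0" * 64})
    mismatch_detected = False
except ValueError:
    mismatch_detected = True
```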