swh.loader.core.utils module#
- swh.loader.core.utils.clean_dangling_folders(dirpath: str, pattern_check: str, log=None) None[source]#
- Clean up potential dangling temporary working folders rooted at dirpath. Those
- folders must match a dedicated pattern and must not belong to a live pid. 
 - Parameters:
- dirpath – Path to check for dangling files 
- pattern_check – A dedicated pattern to check on first level directory (e.g. swh.loader.mercurial., swh.loader.svn.) 
- log (Logger) – Optional logger 
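The documented behavior can be sketched with the standard library alone. The pid-suffix naming convention (`<pattern><pid>`) used below is an assumption for illustration; the real helper's folder layout may differ:

```python
import os
import shutil


def clean_dangling_folders_sketch(dirpath, pattern_check, log=None):
    """Remove first-level folders under dirpath that match pattern_check
    and whose embedded pid is no longer alive.

    Assumes folder names end with '.<pid>' (illustrative convention).
    """
    if not os.path.exists(dirpath):
        return
    for name in os.listdir(dirpath):
        if not name.startswith(pattern_check):
            continue
        try:
            pid = int(name.rsplit(".", 1)[-1])
            os.kill(pid, 0)  # raises ProcessLookupError if pid is gone
        except ValueError:
            continue  # no pid suffix; leave the folder alone
        except PermissionError:
            continue  # pid alive but owned by another user
        except ProcessLookupError:
            # dangling folder: the owning process no longer exists
            if log:
                log.debug("Removing dangling folder %s", name)
            shutil.rmtree(os.path.join(dirpath, name))
```

Folders whose pid is still alive (or whose name carries no pid) are left untouched.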
 
 
- swh.loader.core.utils.clone_with_timeout(src: str, dest: str, clone_func: Callable[[], None], timeout: float) None[source]#
- Clone a repository with timeout. - Parameters:
- src – clone source 
- dest – clone destination 
- clone_func – callable that does the actual cloning 
- timeout – timeout in seconds 
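A minimal threading-based sketch of the same contract; the real helper may use a different mechanism and raise its own timeout exception, so `TimeoutError` here is a stand-in:

```python
import threading


def clone_with_timeout_sketch(src, dest, clone_func, timeout):
    """Run clone_func in a worker thread; fail if it is still running
    after `timeout` seconds."""
    errors = []

    def worker():
        try:
            clone_func()
        except Exception as exc:  # surface clone failures to the caller
            errors.append(exc)

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout)
    if thread.is_alive():
        raise TimeoutError(
            f"cloning {src} into {dest} timed out after {timeout}s"
        )
    if errors:
        raise errors[0]
```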
 
 
- swh.loader.core.utils.parse_visit_date(visit_date: datetime | str | None) datetime | None[source]#
- Convert visit date from either None, a string or a datetime to either None or datetime. 
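A rough stdlib sketch of that contract, assuming ISO 8601 strings and a literal `"now"` are the accepted string forms (the real helper may accept more formats):

```python
from datetime import datetime, timezone


def parse_visit_date_sketch(visit_date):
    """None and datetimes pass through; "now" becomes the current UTC
    time; other strings are parsed as ISO 8601."""
    if visit_date is None:
        return None
    if isinstance(visit_date, datetime):
        return visit_date
    if visit_date == "now":
        return datetime.now(tz=timezone.utc)
    return datetime.fromisoformat(visit_date)
```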
- swh.loader.core.utils.compute_hashes(filepath: str, hash_names: List[str] = ['sha256']) Dict[str, str][source]#
- Compute checksums dict out of a filepath 
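A straightforward `hashlib` sketch of such a helper, streaming the file once and feeding every requested algorithm:

```python
import hashlib


def compute_hashes_sketch(filepath, hash_names=["sha256"]):
    """Return {hash_name: hex digest} for the file at filepath."""
    hashers = {name: hashlib.new(name) for name in hash_names}
    with open(filepath, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            for h in hashers.values():
                h.update(chunk)
    return {name: h.hexdigest() for name, h in hashers.items()}
```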
- swh.loader.core.utils.compute_nar_hashes(filepath: Path, hash_names: List[str] = ['sha256'], is_tarball=True, top_level=True) Dict[str, str][source]#
- Compute nar checksums dict out of a filepath (tarball or plain file). - If it’s a tarball, this uncompresses the tarball in a temporary directory to compute the nar hashes (and then cleans it up). - Parameters:
- filepath – The tarball (if is_tarball is True) or a filepath 
- hash_names – The list of checksums to compute 
- is_tarball – Whether filepath represents a tarball or not 
- top_level – Whether we want to compute the hashes of the top-level directory (of the tarball). This is only useful when is_tarball is True. 
 
- Returns:
- The dict of checksums values whose keys are present in hash_names. 
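The tarball flow can be sketched as follows. Note that the hashing step here is a naive placeholder: a real implementation hashes the Nix NAR serialization of the tree, which this sketch does not reproduce:

```python
import hashlib
import tarfile
import tempfile
from pathlib import Path


def nar_hashes_flow_sketch(filepath, hash_names=["sha256"], is_tarball=True):
    """Illustrative flow only: uncompress the tarball to a temporary
    directory, hash the resulting tree, clean up. The per-file sorted
    walk below is NOT the NAR serialization."""

    def hash_tree(root: Path):
        hashers = {n: hashlib.new(n) for n in hash_names}
        if root.is_file():
            files = [(root.name, root)]
        else:
            files = sorted(
                (p.relative_to(root).as_posix(), p)
                for p in root.rglob("*") if p.is_file()
            )
        for rel, p in files:
            for h in hashers.values():
                h.update(rel.encode())
                h.update(p.read_bytes())
        return {n: h.hexdigest() for n, h in hashers.items()}

    if not is_tarball:
        return hash_tree(Path(filepath))
    # temporary directory is removed on context exit (the cleanup step)
    with tempfile.TemporaryDirectory() as tmpdir:
        with tarfile.open(filepath) as tar:
            tar.extractall(tmpdir)
        return hash_tree(Path(tmpdir))
```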
 
- swh.loader.core.utils.get_url_body(url: str, **extra_params) bytes[source]#
- Basic HTTP client to retrieve information on a software package, typically JSON metadata from a REST API. - Parameters:
- url (str) – An HTTP URL 
- Raises:
- NotFound – in case of query failures (for some reason: 404, …) 
- Returns:
- The associated response’s information 
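A stdlib sketch of such a client; `FileNotFoundError` stands in for the `NotFound` exception mentioned above, and the real helper likely offers more request plumbing than `urlopen` keyword arguments:

```python
import urllib.error
import urllib.request


def get_url_body_sketch(url, **extra_params):
    """Fetch url and return the raw response body as bytes.

    A 404 is surfaced as a "not found" error; other HTTP errors
    propagate unchanged.
    """
    try:
        with urllib.request.urlopen(url, **extra_params) as resp:
            return resp.read()
    except urllib.error.HTTPError as exc:
        if exc.code == 404:
            raise FileNotFoundError(url) from exc  # stand-in for NotFound
        raise
```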
 
- swh.loader.core.utils.download(url: str, dest: str, hashes: Dict = {}, filename: str | None = None, auth: Tuple[str, str] | None = None, extra_request_headers: Dict[str, str] | None = None, timeout: int = 120) Tuple[str, Dict][source]#
- Download a remote file from url, and compute swh hashes on it. - Parameters:
- url – Artifact uri to fetch and hash 
- dest – Directory to write the archive to 
- hashes – Dict of expected hashes (key is the hash algo) for the artifact to download (those hashes are expected to be hex strings). The supported algorithms are defined in the swh.model.hashutil.ALGORITHMS set. 
- auth – Optional tuple of login/password (for http authentication service, e.g. deposit) 
- extra_request_headers – Optional dict holding extra HTTP headers to be sent with the request 
- timeout – Value in seconds so the connection does not hang indefinitely (read/connection timeout) 
 
- Raises:
- ValueError – in case of any error when fetching/computing (length or checksum mismatch, …) 
 
- Returns:
- Tuple of (local filepath, hashes of filepath)
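The fetch-and-verify contract can be sketched with `urllib`; `auth` and `extra_request_headers` are omitted here, and the filename derivation from the URL is a simplification:

```python
import hashlib
import os
import urllib.request


def download_sketch(url, dest, hashes={}, filename=None, timeout=120):
    """Fetch url into the dest directory, hashing as we stream, and
    verify the expected hex digests; raise ValueError on mismatch."""
    filename = filename or os.path.basename(url.rstrip("/")) or "downloaded"
    filepath = os.path.join(dest, filename)
    hashers = {algo: hashlib.new(algo) for algo in hashes}
    with urllib.request.urlopen(url, timeout=timeout) as resp, \
            open(filepath, "wb") as out:
        for chunk in iter(lambda: resp.read(65536), b""):
            out.write(chunk)
            for h in hashers.values():
                h.update(chunk)
    computed = {algo: h.hexdigest() for algo, h in hashers.items()}
    for algo, expected in hashes.items():
        if computed[algo] != expected:
            raise ValueError(
                f"{algo} mismatch for {url}: "
                f"expected {expected}, got {computed[algo]}"
            )
    return filepath, computed
```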