swh.lister.utils module
- swh.lister.utils.split_range(total_pages: int, nb_pages: int) → Iterator[Tuple[int, int]]
- Split total_pages into ranges of mostly nb_pages elements each, yielded as inclusive (start, end) tuples. In some cases, the last range can have one more element.
  >>> list(split_range(19, 10))
  [(0, 9), (10, 19)]
  >>> list(split_range(20, 3))
  [(0, 2), (3, 5), (6, 8), (9, 11), (12, 14), (15, 17), (18, 20)]
  >>> list(split_range(21, 3))
  [(0, 2), (3, 5), (6, 8), (9, 11), (12, 14), (15, 17), (18, 21)]
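The splitting behaviour shown in the doctests can be reproduced with a short generator. This is a minimal sketch, not the library's actual code: the name split_range_sketch is hypothetical, and the logic is only checked against the three documented examples.

```python
from typing import Iterator, Tuple


def split_range_sketch(total_pages: int, nb_pages: int) -> Iterator[Tuple[int, int]]:
    """Illustrative stand-in: yield inclusive (start, end) ranges."""
    start = 0
    # Emit a full range while the next range start still lies below total_pages.
    while start + nb_pages < total_pages:
        yield start, start + nb_pages - 1
        start += nb_pages
    # The last range ends at total_pages itself, so it may hold one more element.
    yield start, total_pages


if __name__ == "__main__":
    for first, last in split_range_sketch(20, 3):
        print(f"pages {first}..{last}")
```

Each yielded pair could then be handed to a separate task, which is one plausible reason for splitting a page count this way.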
- swh.lister.utils.is_valid_origin_url(url: str | None) → bool
- Returns whether the given string is a valid origin URL. This excludes Git SSH URLs and pseudo-URLs (e.g. ssh://git@example.org:foo and git@example.org:foo), as they are not supported by the Git loader and usually require authentication.
  All HTTP URLs are allowed:
  >>> is_valid_origin_url("http://example.org/repo.git")
  True
  >>> is_valid_origin_url("http://example.org/repo")
  True
  >>> is_valid_origin_url("https://example.org/repo")
  True
  >>> is_valid_origin_url("https://foo:bar@example.org/repo")
  True
  Scheme-less URLs are rejected:
  >>> is_valid_origin_url("example.org/repo")
  False
  >>> is_valid_origin_url("example.org:repo")
  False
  Git SSH URLs and pseudo-URLs are rejected:
  >>> is_valid_origin_url("git@example.org:repo")
  False
  >>> is_valid_origin_url("ssh://git@example.org:repo")
  False
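The acceptance rules above can be approximated with the standard library's URL parser. The following is a sketch under assumptions rather than the module's implementation: the function name looks_like_valid_origin_url is hypothetical, and treating None or an empty string as invalid is an assumption not stated in the documentation.

```python
from typing import Optional
from urllib.parse import urlparse


def looks_like_valid_origin_url(url: Optional[str]) -> bool:
    """Illustrative stand-in reproducing the documented examples."""
    # Assumption: None and the empty string are treated as invalid.
    if not url:
        return False
    parsed = urlparse(url)
    # Scheme-less strings and scp-like pseudo-URLs ("git@host:path")
    # parse without a network location, so they are rejected here.
    if not parsed.netloc:
        return False
    # Explicit ssh:// URLs are rejected as well.
    return parsed.scheme != "ssh"


if __name__ == "__main__":
    for candidate in ("https://example.org/repo", "git@example.org:repo"):
        print(candidate, looks_like_valid_origin_url(candidate))
```

The sketch never reads the parsed port, so a non-numeric segment after the colon (as in ssh://git@example.org:repo) does not raise.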
- exception swh.lister.utils.ArtifactNatureUndetected
- Bases: ValueError
  Raised when a remote artifact’s nature (tarball, file) cannot be detected.
- exception swh.lister.utils.ArtifactNatureMistyped
- Bases: ValueError
  Raised when a remote artifact is neither a tarball nor a file.
  Errors of this type are probably caused by a misconfiguration in the manifest generation that mistyped a VCS repository.
- exception swh.lister.utils.ArtifactWithoutExtension
- Bases: ValueError
  Raised when an artifact’s nature cannot be determined from its name.
- swh.lister.utils.url_contains_tarball_filename(urlparsed, extensions: List[str], raise_when_no_extension: bool = True) → bool
- Determine whether urlparsed contains a tarball filename ending with one of the extensions passed as parameter. Both path parts and query parameters are checked.
  This also accounts for the edge case of a filename with only a version as name (so no extension at the end).
- Raises:
- ArtifactWithoutExtension – in case no extension is available and raise_when_no_extension is True (the default)
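The check on path parts and query parameters can be illustrated with a simplified sketch. It is not the library's implementation: the function name, the local exception class (a stand-in named after the documented one) and the matching rule are assumptions, and the version-only filename edge case mentioned above is deliberately not handled.

```python
from pathlib import PurePosixPath
from typing import List
from urllib.parse import ParseResult, parse_qsl, urlparse


class ArtifactWithoutExtension(ValueError):
    """Local stand-in for the documented exception."""


def contains_tarball_filename_sketch(
    urlparsed: ParseResult,
    extensions: List[str],
    raise_when_no_extension: bool = True,
) -> bool:
    # Candidate names: last component of the URL path and of each query value.
    candidates = [urlparsed.path] + [value for _, value in parse_qsl(urlparsed.query)]
    names = [PurePosixPath(c).name for c in candidates if c]
    # Normalise extensions so that "tar.gz" and ".tar.gz" are both accepted.
    suffixes = tuple("." + ext.lstrip(".") for ext in extensions)
    if any(name.lower().endswith(suffixes) for name in names):
        return True
    # No candidate carries any extension at all: optionally signal it.
    if raise_when_no_extension and not any(PurePosixPath(n).suffix for n in names):
        raise ArtifactWithoutExtension("no extension found in URL")
    return False


if __name__ == "__main__":
    url = urlparse("https://example.org/download?file=pkg-1.0.zip")
    print(contains_tarball_filename_sketch(url, ["tar.gz", "zip"]))
```

Passing raise_when_no_extension=False turns the exception into a plain False result, which can suit callers that have another way to probe the artifact.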
 
 
- swh.lister.utils.is_tarball(urls: List[str], request: Any | None = None) → Tuple[bool, str]
- Determine whether a list of files are actually tarballs or simple files.
  This iterates over the list of urls provided to detect the artifact’s nature. When this cannot be determined from the url alone and request is provided, this executes an HTTP HEAD query on the url to determine the information. If request is not provided, this raises an ArtifactNatureUndetected exception.
  If, at the end of the iteration over the urls, nothing could be detected, this raises an ArtifactNatureUndetected exception.
- Parameters:
- urls – names of the remote files to check for artifact nature.
- request – (Optional) Request object allowing HTTP calls. If not provided and the naive check cannot detect anything, this raises ArtifactNatureUndetected.
 
- Raises:
- ArtifactNatureUndetected – when the artifact’s nature cannot be detected from its urls
- ArtifactNatureMistyped – when the artifact is neither a tarball nor a file. It’s up to the caller to do what’s right with it.
 
- Returns: A tuple (bool, url). The boolean represents whether the url is an archive or not. The second element is the actual url once the HEAD request has been issued, as a fallback when the urls alone do not reveal whether they are tarballs.
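Since the handling of both exceptions is left to the caller, one possible calling pattern is sketched below. Everything here is hypothetical: the exception classes are local stand-ins named after the documented ones, detect represents any callable with the documented return shape, and the choices made in each except branch (skip, or fall back to a plain file) are assumptions, not recommendations from the library.

```python
from typing import Callable, List, Tuple


# Local stand-ins named after the documented exceptions (both subclass ValueError).
class ArtifactNatureUndetected(ValueError):
    pass


class ArtifactNatureMistyped(ValueError):
    pass


def classify_artifact(
    urls: List[str],
    detect: Callable[[List[str]], Tuple[bool, str]],
) -> Tuple[str, str]:
    """Map the outcomes of an is_tarball-like callable to a (label, url) pair."""
    try:
        is_archive, url = detect(urls)
    except ArtifactNatureMistyped:
        # Neither a tarball nor a file: this hypothetical caller skips it.
        return "skipped", urls[0]
    except ArtifactNatureUndetected:
        # Nature unknown: this hypothetical caller treats it as a plain file.
        return "file", urls[0]
    return ("tarball" if is_archive else "file"), url


if __name__ == "__main__":
    def fake_detect(urls: List[str]) -> Tuple[bool, str]:
        # Stub standing in for a real detection function.
        return urls[0].endswith(".tar.gz"), urls[0]

    print(classify_artifact(["https://example.org/a.tar.gz"], fake_detect))
```

Catching the two exceptions separately keeps the misconfiguration case distinguishable from the merely undetected one.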