swh.web.client.client module
Python client for the Software Heritage Web API
Light wrapper around requests for the archive API, taking care of data conversions and pagination.
from swh.web.client.client import WebAPIClient
cli = WebAPIClient()
# retrieve any archived object via its SWHID
cli.get('swh:1:rev:aafb16d69fd30ff58afdd69036a26047f3aebdc6')
# same, but for specific object types
cli.revision('swh:1:rev:aafb16d69fd30ff58afdd69036a26047f3aebdc6')
# get() always retrieves entire objects, following pagination
# WARNING: this might *not* be what you want for large objects
cli.get('swh:1:snp:6a3a2cf0b2b90ce7ae1cf0a221ed68035b686f5a')
# type-specific methods support explicit iteration through pages
next(cli.snapshot('swh:1:snp:cabcc7d7bf639bbe1cc3b41989e1806618dd5764'))
- swh.web.client.client.typify_json(data: Any, obj_type: str) -> Any
- Type API responses using Pythonic types where appropriate.
- The following conversions are performed:
- identifiers are converted from strings to SWHID instances 
- timestamps are converted from strings to datetime.datetime objects 
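To make the two conversions concrete, here is a minimal, self-contained sketch of the idea. It is not the library's implementation: `typify_sketch`, `parse_swhid`, the `SimpleSWHID` tuple, and the field names `id` and `date` are all hypothetical, and timestamps are assumed to be ISO 8601 strings.

```python
from datetime import datetime
from typing import Any, Dict, NamedTuple


class SimpleSWHID(NamedTuple):
    """Hypothetical stand-in for a SWHID instance (not the library's class)."""

    version: int
    object_type: str
    object_id: str


def parse_swhid(text: str) -> SimpleSWHID:
    # Core SWHIDs have the shape "swh:<version>:<type>:<hex digest>".
    scheme, version, object_type, object_id = text.split(":")
    if scheme != "swh":
        raise ValueError(f"not a SWHID: {text!r}")
    return SimpleSWHID(int(version), object_type, object_id)


def typify_sketch(data: Dict[str, Any]) -> Dict[str, Any]:
    # Return a copy in which the (assumed) "id" and "date" fields are typed;
    # the input dictionary is left untouched.
    typed = dict(data)
    if "id" in typed:
        typed["id"] = parse_swhid(typed["id"])
    if "date" in typed:
        typed["date"] = datetime.fromisoformat(typed["date"])
    return typed


# Made-up record, for illustration only.
raw = {
    "id": "swh:1:rev:aafb16d69fd30ff58afdd69036a26047f3aebdc6",
    "date": "2019-01-01T12:00:00+00:00",
    "message": "example",
}
typed = typify_sketch(raw)
```

Passing `typify=False` to the client methods documented below skips this kind of conversion and returns raw JSON types.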
 
- class swh.web.client.client.WebAPIClient(api_url: str = 'https://archive.softwareheritage.org/api/1', bearer_token: str | None = None, request_retry=10, retry_status={429}, use_rate_limit: bool = True, automatic_concurrent_queries: bool = True, max_automatic_concurrency: int | None = None)
- Bases: object
- Client for the Software Heritage archive Web API, see https://archive.softwareheritage.org/api/
- Create a client for the Software Heritage Web API.
- Parameters:
- api_url – base URL for API calls 
- bearer_token – optional bearer token to do authenticated API calls 
- use_rate_limit – enable or disable request pacing according to server rate limit information. 
- automatic_concurrent_queries – if True, some large requests that need to be chunked might automatically be issued in parallel
- max_automatic_concurrency – maximum number of concurrent requests when automatic_concurrent_queries is set
 
 - With rate limiting enabled (the default), the client adjusts its request rate if the server provides rate limiting headers.
 - The rate limiter spreads the available requests evenly over the rate limit window (except for a small initial budget, as explained below). For example, if 600 requests remain in a window that resets in 5 minutes (300 seconds), one request can be issued every 0.5 seconds.
 - This pace is enforced overall, so periods of inactivity allow for faster bursts afterwards. For example (using the same numbers as above):
- A client that tries to issue requests continuously will have to wait 0.5 seconds between requests. 
- A client that did not issue requests for 1 minute (60 seconds) will be able to issue 120 requests right away (60 / 0.5) before having to wait 0.5 seconds between requests. 
 - The above is true regardless of the number of threads using the same WebAPIClient.
 - In practice, to avoid slowing down small applications that make few requests, 10% of the available budget is available immediately, and the other 90% of the requests are spread out over the rate limit window.
 - This initial “immediate” budget is only granted if at least 25% of the total request budget is available.
 - DEFAULT_AUTOMATIC_CONCURENCY = 20
 - property rate_limit_delay
- current rate limit delay, in seconds 
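The pacing arithmetic described for the rate limiter can be checked with a few lines of Python. This is only a sketch of the numbers quoted in the description, not the client's internal code: the function names are hypothetical, and the rounding of the 10% budget is an assumption.

```python
def pacing_interval(remaining: int, seconds_to_reset: float) -> float:
    """Seconds between requests when spreading `remaining` requests evenly."""
    if remaining <= 0:
        raise ValueError("no requests left in this window")
    return seconds_to_reset / remaining


def burst_after_idle(idle_seconds: float, interval: float) -> int:
    """Requests that can be issued at once after a period of inactivity."""
    return int(idle_seconds / interval)


def immediate_budget(remaining: int, limit: int) -> int:
    """Initial budget: 10% of what remains, if at least 25% of the limit is left.

    Rounding down is an assumption made for this sketch.
    """
    if remaining * 4 < limit:
        return 0
    return remaining // 10


# Numbers from the description: 600 requests left, window resets in 300 s.
interval = pacing_interval(remaining=600, seconds_to_reset=300)
burst = burst_after_idle(idle_seconds=60, interval=interval)
initial = immediate_budget(remaining=600, limit=600)
```

With these inputs the interval is 0.5 seconds and a client that stayed idle for a minute can issue 120 requests at once, matching the figures in the description.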
 - get(swhid: CoreSWHID | str, typify: bool = True, **req_args) -> Any
- Retrieve information about an object of any kind.
- Dispatcher method over the more specific methods content(), directory(), etc.
- Note that this method buffers the entire output in case of long, iterable output (e.g., for snapshot()); see the iter() method for streaming. 
 - iter(swhid: CoreSWHID | str, typify: bool = True, **req_args) -> Iterator[Dict[str, Any]]
- Stream over the information about an object of any kind.
- Streaming variant of get(). 
 - content(swhid: CoreSWHID | str, typify: bool = True, **req_args) -> Dict[str, Any]
- Retrieve information about a content object.
- Parameters:
- swhid – object persistent identifier 
- typify – if True, convert return value to pythonic types wherever possible, otherwise return raw JSON types (default: True) 
- req_args – extra keyword arguments for requests.get() 
 
- Raises:
- requests.HTTPError – if HTTP request fails 
 
 - directory(swhid: CoreSWHID | str, typify: bool = True, **req_args) -> List[Dict[str, Any]]
- Retrieve information about a directory object.
- Parameters:
- swhid – object persistent identifier 
- typify – if True, convert return value to pythonic types wherever possible, otherwise return raw JSON types (default: True) 
- req_args – extra keyword arguments for requests.get() 
 
- Raises:
- requests.HTTPError – if HTTP request fails 
 
 - revision(swhid: CoreSWHID | str, typify: bool = True, **req_args) -> Dict[str, Any]
- Retrieve information about a revision object.
- Parameters:
- swhid – object persistent identifier 
- typify – if True, convert return value to pythonic types wherever possible, otherwise return raw JSON types (default: True) 
- req_args – extra keyword arguments for requests.get() 
 
- Raises:
- requests.HTTPError – if HTTP request fails 
 
 - release(swhid: CoreSWHID | str, typify: bool = True, **req_args) -> Dict[str, Any]
- Retrieve information about a release object.
- Parameters:
- swhid – object persistent identifier 
- typify – if True, convert return value to pythonic types wherever possible, otherwise return raw JSON types (default: True) 
- req_args – extra keyword arguments for requests.get() 
 
- Raises:
- requests.HTTPError – if HTTP request fails 
 
 - snapshot(swhid: CoreSWHID | str, typify: bool = True, **req_args) -> Iterator[Dict[str, Any]]
- Retrieve information about a snapshot object.
- Parameters:
- swhid – object persistent identifier 
- typify – if True, convert return value to pythonic types wherever possible, otherwise return raw JSON types (default: True) 
- req_args – extra keyword arguments for requests.get() 
 
- Returns:
- an iterator over partial snapshots (dictionaries mapping branch names to information about where they point to), each containing a subset of available branches 
- Raises:
- requests.HTTPError – if HTTP request fails 
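Because snapshot() yields partial snapshots, a caller who wants a single mapping has to merge the pages. The sketch below shows one way to do that; `collect_branches` is a hypothetical helper, and the page contents are made up so that the example runs without network access.

```python
from typing import Any, Dict, Iterable


def collect_branches(pages: Iterable[Dict[str, Any]]) -> Dict[str, Any]:
    """Merge partial snapshots into one mapping of branch name to target info."""
    branches: Dict[str, Any] = {}
    for page in pages:
        # Each page maps a subset of the branch names to their targets.
        branches.update(page)
    return branches


# Stand-in pages for illustration; with a real client one would pass
# cli.snapshot(swhid) instead of this iterator.
fake_pages = iter([
    {"refs/heads/main": {"target_type": "revision"}},
    {"refs/tags/v1.0": {"target_type": "release"}},
])
merged = collect_branches(fake_pages)
```

Merging everything gives up the memory benefit of streaming, so for very large snapshots it may be preferable to process each page as it arrives.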
 
 - visits(origin: str, per_page: int | None = None, last_visit: int | None = None, typify: bool = True, **req_args) -> Iterator[Dict[str, Any]]
- List visits of an origin.
- Parameters:
- origin – the URL of a software origin 
- per_page – the number of visits to list 
- last_visit – visit to start listing from 
- typify – if True, convert return value to pythonic types wherever possible, otherwise return raw JSON types (default: True) 
- req_args – extra keyword arguments for requests.get() 
 
- Returns:
- an iterator over visits of the origin 
- Raises:
- requests.HTTPError – if HTTP request fails 
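Since visits() returns an iterator, a caller can pull only as many results as needed. The sketch below uses `itertools.islice` for that; `first_n` and `fake_visits` are hypothetical names, and the records are made up. Whether stopping early also avoids fetching further pages depends on how the client paginates internally.

```python
from itertools import islice
from typing import Any, Dict, Iterator, List

produced: List[int] = []


def fake_visits() -> Iterator[Dict[str, Any]]:
    # Hypothetical stand-in for cli.visits(origin); records what was yielded.
    for number in range(1, 1001):
        produced.append(number)
        yield {"visit": number}


def first_n(results: Iterator[Dict[str, Any]], n: int) -> List[Dict[str, Any]]:
    """Consume at most n items from a lazily evaluated result iterator."""
    return list(islice(results, n))


some_visits = first_n(fake_visits(), 3)
```

The same helper applies to any method documented here as returning an iterator, such as origin_search().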
 
 - last_visit(origin: str, typify: bool = True) -> Dict[str, Any]
- Return the last visit of an origin.
- Parameters:
- origin – the URL of a software origin 
- typify – if True, convert return value to pythonic types wherever possible, otherwise return raw JSON types (default: True) 
 
- Returns:
- The last visit for that origin 
- Raises:
- requests.HTTPError – if HTTP request fails 
 
 - known(swhids: Iterable[CoreSWHID | str], **req_args) -> Dict[CoreSWHID, Dict[Any, Any]]
- Verify the presence in the archive of several objects at once.
- Parameters:
- swhids – SWHIDs of the objects to verify 
- Returns:
- a dictionary mapping object SWHIDs to archive information about them; each per-object dictionary includes a “known” key associated with a boolean value that is true if and only if the object is known to the archive 
- Raises:
- requests.HTTPError – if HTTP request fails 
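A common follow-up to known() is to separate archived objects from missing ones. The sketch below relies only on the "known" key described above; `split_known` is a hypothetical helper, and the sample result uses made-up string keys where the real method returns CoreSWHID keys.

```python
from typing import Any, Dict, List, Tuple


def split_known(result: Dict[Any, Dict[Any, Any]]) -> Tuple[List[Any], List[Any]]:
    """Split the mapping returned by known() into (archived, missing) keys."""
    archived = [swhid for swhid, info in result.items() if info["known"]]
    missing = [swhid for swhid, info in result.items() if not info["known"]]
    return archived, missing


# Made-up result for illustration; with a real client one would pass
# cli.known(swhids) instead.
fake_result = {
    "swh:1:rev:aafb16d69fd30ff58afdd69036a26047f3aebdc6": {"known": True},
    "swh:1:rev:0000000000000000000000000000000000000000": {"known": False},
}
archived, missing = split_known(fake_result)
```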
 
 - content_exists(swhid: CoreSWHID | str, **req_args) -> bool
- Check if a content object exists in the archive.
- Parameters:
- swhid – object persistent identifier 
- req_args – extra keyword arguments for requests.head() 
 
- Raises:
- requests.HTTPError – if HTTP request fails 
 
 - directory_exists(swhid: CoreSWHID | str, **req_args) -> bool
- Check if a directory object exists in the archive.
- Parameters:
- swhid – object persistent identifier 
- req_args – extra keyword arguments for requests.head() 
 
- Raises:
- requests.HTTPError – if HTTP request fails 
 
 - revision_exists(swhid: CoreSWHID | str, **req_args) -> bool
- Check if a revision object exists in the archive.
- Parameters:
- swhid – object persistent identifier 
- req_args – extra keyword arguments for requests.head() 
 
- Raises:
- requests.HTTPError – if HTTP request fails 
 
 - release_exists(swhid: CoreSWHID | str, **req_args) -> bool
- Check if a release object exists in the archive.
- Parameters:
- swhid – object persistent identifier 
- req_args – extra keyword arguments for requests.head() 
 
- Raises:
- requests.HTTPError – if HTTP request fails 
 
 - snapshot_exists(swhid: CoreSWHID | str, **req_args) -> bool
- Check if a snapshot object exists in the archive.
- Parameters:
- swhid – object persistent identifier 
- req_args – extra keyword arguments for requests.head() 
 
- Raises:
- requests.HTTPError – if HTTP request fails 
 
 - origin_exists(origin: str, **req_args) -> bool
- Check if an origin object exists in the archive.
- Parameters:
- origin – the URL of a software origin 
- req_args – extra keyword arguments for requests.head() 
 
- Raises:
- requests.HTTPError – if HTTP request fails 
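The type-specific `*_exists` methods can be wrapped in a small dispatcher keyed on the object type embedded in the SWHID string. This is a sketch only: `object_exists` and `FakeClient` are hypothetical, and the stub stands in for a WebAPIClient so that the example runs offline.

```python
from typing import Any

# SWHID object type tag -> name of the matching client method.
EXISTS_METHODS = {
    "cnt": "content_exists",
    "dir": "directory_exists",
    "rev": "revision_exists",
    "rel": "release_exists",
    "snp": "snapshot_exists",
}


def object_exists(client: Any, swhid: str) -> bool:
    """Call the *_exists method that matches the type tag of `swhid`."""
    object_type = swhid.split(":")[2]
    method = getattr(client, EXISTS_METHODS[object_type])
    return method(swhid)


class FakeClient:
    """Hypothetical stub with canned answers, standing in for WebAPIClient."""

    def revision_exists(self, swhid: str, **req_args: Any) -> bool:
        return True

    def snapshot_exists(self, swhid: str, **req_args: Any) -> bool:
        return False


client = FakeClient()
rev_found = object_exists(client, "swh:1:rev:aafb16d69fd30ff58afdd69036a26047f3aebdc6")
snp_found = object_exists(client, "swh:1:snp:6a3a2cf0b2b90ce7ae1cf0a221ed68035b686f5a")
```

Origins are identified by URL rather than by SWHID, so origin_exists() is left out of the dispatch table.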
 
 - content_raw(swhid: CoreSWHID | str, **req_args) -> Iterator[bytes]
- Iterate over the raw content of a content object.
- Parameters:
- swhid – object persistent identifier 
- req_args – extra keyword arguments for requests.get() 
 
- Raises:
- requests.HTTPError – if HTTP request fails 
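Because content_raw() yields byte chunks, the content can be written to disk without holding it all in memory. A minimal sketch, assuming nothing about chunk sizes; `write_chunks` is a hypothetical helper and the chunks are made up.

```python
import os
import tempfile
from typing import Iterable


def write_chunks(chunks: Iterable[bytes], path: str) -> int:
    """Write byte chunks to `path` and return the number of bytes written."""
    written = 0
    with open(path, "wb") as out:
        for chunk in chunks:
            written += out.write(chunk)
    return written


# Stand-in chunks for illustration; with a real client one would pass
# cli.content_raw(swhid) instead.
fake_chunks = iter([b"hello, ", b"world\n"])
target = os.path.join(tempfile.mkdtemp(), "content.bin")
size = write_chunks(fake_chunks, target)
```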
 
 - origin_search(query: str, limit: int | None = None, with_visit: bool = False, **req_args) -> Iterator[Dict[str, Any]]
- List origin search results.
- Parameters:
- query – search keywords 
- limit – the maximum number of found origins to return 
- with_visit – if true, only return origins with at least one visit 
 
- Returns:
- an iterator over search results 
- Raises:
- requests.HTTPError – if HTTP request fails 
 
 - origin_save(visit_type: str, origin: str) -> Dict
- Submit a Save Code Now request for the origin with the given visit_type.
- Parameters:
- visit_type – Type of the visit 
- origin – the origin to save 
 
- Returns:
- The resulting dict of the visit saved 
- Raises:
- requests.HTTPError – if HTTP request fails 
 
 - get_origin(swhid: CoreSWHID) -> Any | None
- Walk the compressed graph to discover the origin of a given SWHID.
- This method exists for the swh-scanner and is likely to change significantly and/or be replaced; we do not recommend using it. 
 - cooking_request(bundle_type: str, swhid: CoreSWHID | str, email: str | None = None, **req_args) -> Dict[str, Any]
- Request the cooking of a bundle.
- Parameters:
- bundle_type – Type of the bundle 
- swhid – object persistent identifier 
- email – e-mail to notify when the archive is ready 
- req_args – extra keyword arguments for requests.post() 
 
- Returns:
- an object containing the following keys:
- fetch_url (string): the URL from which to download the archive
- progress_message (string): message describing the cooking task progress
- id (number): the cooking task id
- status (string): the cooking task status (new/pending/done/failed)
- swhid (string): the identifier of the object to cook
- Raises:
- requests.HTTPError – if HTTP request fails 
 
 - cooking_check(bundle_type: str, swhid: CoreSWHID | str, **req_args) -> Dict[str, Any]
- Check the status of a cooking task.
- Parameters:
- bundle_type – Type of the bundle 
- swhid – object persistent identifier 
- req_args – extra keyword arguments for requests.get() 
 
- Returns:
- an object containing the following keys:
- fetch_url (string): the URL from which to download the archive
- progress_message (string): message describing the cooking task progress
- id (number): the cooking task id
- status (string): the cooking task status (new/pending/done/failed)
- swhid (string): the identifier of the object to cook
- Raises:
- requests.HTTPError – if HTTP request fails 
 
 - cooking_fetch(bundle_type: str, swhid: CoreSWHID | str, **req_args) -> Response
- Fetch the archive of a cooking task.
- Parameters:
- bundle_type – Type of the bundle 
- swhid – object persistent identifier 
- req_args – extra keyword arguments for requests.get() 
 
- Returns:
- a requests.models.Response object containing a stream of the archive 
- Raises:
- requests.HTTPError – if HTTP request fails
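The three cooking methods are typically used together: request a bundle, poll its status, then fetch the archive. The sketch below shows one possible polling loop built on the documented status values (new/pending/done/failed). `cook_and_fetch`, its polling parameters, `FakeClient`, and the bundle type string are hypothetical; the stub replaces a WebAPIClient so that the example runs offline.

```python
import time
from typing import Any, Dict, List


def cook_and_fetch(client: Any, bundle_type: str, swhid: str,
                   poll_interval: float = 10.0, max_polls: int = 60) -> Any:
    """Request a bundle, poll until cooking ends, then fetch the archive."""
    status = client.cooking_request(bundle_type, swhid)["status"]
    polls = 0
    while status not in ("done", "failed"):
        if polls >= max_polls:
            raise TimeoutError(f"cooking of {swhid} still {status!r}")
        time.sleep(poll_interval)
        status = client.cooking_check(bundle_type, swhid)["status"]
        polls += 1
    if status == "failed":
        raise RuntimeError(f"cooking of {swhid} failed")
    return client.cooking_fetch(bundle_type, swhid)


class FakeClient:
    """Hypothetical stub replaying canned statuses instead of calling the API."""

    def __init__(self, statuses: List[str]) -> None:
        self.statuses = list(statuses)
        self.checks = 0

    def cooking_request(self, bundle_type: str, swhid: str) -> Dict[str, Any]:
        return {"status": "new"}

    def cooking_check(self, bundle_type: str, swhid: str) -> Dict[str, Any]:
        self.checks += 1
        return {"status": self.statuses.pop(0)}

    def cooking_fetch(self, bundle_type: str, swhid: str) -> str:
        return "archive-stream"  # the real method returns a requests Response


client = FakeClient(["pending", "done"])
result = cook_and_fetch(
    client,
    "placeholder-bundle-type",  # not a real bundle type, illustration only
    "swh:1:rev:aafb16d69fd30ff58afdd69036a26047f3aebdc6",
    poll_interval=0,
)
```

Bounding the number of polls keeps a stuck task from blocking the caller forever; suitable values for the interval and the bound depend on the size of the object being cooked.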