swh.storage.api.client module#
- class swh.storage.api.client.RemoteStorage(url: str, timeout: None | Tuple[float, float] | List[float] | float = None, chunk_size: int = 4096, max_retries: int = 3, pool_connections: int = 20, pool_maxsize: int = 100, adapter_kwargs: Dict[str, Any] | None = None, api_exception: Type[Exception] | None = None, reraise_exceptions: List[Type[Exception]] | None = None, enable_requests_retry: bool | None = None, **kwargs)[source]#
- Bases: RPCClient
- Proxy to a remote storage API 
 - api_exception#
- alias of StorageAPIError
 - backend_class#
- alias of StorageInterface
 - reraise_exceptions: List[Type[Exception]] = [<class 'swh.storage.exc.BlockedOriginException'>, <class 'swh.storage.exc.MaskedObjectException'>, <class 'swh.storage.exc.NonRetryableException'>, <class 'swh.storage.exc.QueryTimeout'>, <class 'swh.storage.exc.StorageArgumentException'>, <class 'swh.storage.exc.UnknownMetadataAuthority'>, <class 'swh.storage.exc.UnknownMetadataFetcher'>]#
- On server errors, if any of the exception classes in this list has the same name as the error name, then the exception will be instantiated and raised instead of a generic RemoteException. 
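The name-based matching described for reraise_exceptions can be pictured with a minimal sketch. The classes below are local stand-ins, not the real swh.storage.exc types. The helper name exception_from_error and the (error name, arguments) payload shape are assumptions made for illustration only; the actual RPC client may decode errors differently.

```python
class RemoteException(Exception):
    """Generic fallback (local stand-in for the real class)."""


class QueryTimeout(Exception):
    """Local stand-in for swh.storage.exc.QueryTimeout."""


class StorageArgumentException(Exception):
    """Local stand-in for swh.storage.exc.StorageArgumentException."""


RERAISE_EXCEPTIONS = [QueryTimeout, StorageArgumentException]


def exception_from_error(error_name, args, reraise_exceptions=RERAISE_EXCEPTIONS):
    """Pick the exception to raise for a server-side error.

    Hypothetical helper: if a listed class has the same name as the error,
    instantiate it; otherwise fall back to the generic RemoteException.
    """
    for exc_type in reraise_exceptions:
        if exc_type.__name__ == error_name:
            return exc_type(*args)
    return RemoteException(error_name, *args)


specific = exception_from_error("QueryTimeout", ["too slow"])
generic = exception_from_error("KeyError", ["x"])
```

Under these assumptions, a caller can catch the specific exception types listed above rather than inspecting a generic remote error.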
 - extra_type_decoders: Dict[str, Callable] = {'blocking_status': <function _decode_blocking_status>, 'branch_by_name_response': <function _decode_snapshot_branch_by_name_response>, 'core_swhid': <bound method _BaseSWHID.from_string of <class 'swh.model.swhids.CoreSWHID'>>, 'extended_swhid': <bound method _BaseSWHID.from_string of <class 'swh.model.swhids.ExtendedSWHID'>>, 'identifiers_enum': <function _decode_swhids_enum>, 'masked_status': <function _decode_masked_status>, 'model': <function <lambda>>, 'model_enum': <function _decode_model_enum>, 'object_reference': <function _decode_object_reference>, 'origin_visit_with_statuses': <function _decode_origin_visit_with_statuses>, 'qualified_swhid': <bound method QualifiedSWHID.from_string of <class 'swh.model.swhids.QualifiedSWHID'>>, 'storage_enum': <function _decode_storage_enum>, 'swhids_enum': <function _decode_swhids_enum>}#
- Value of extra_decoders passed to json_loads or msgpack_loads to be able to deserialize more object types. 
 - extra_type_encoders: List[Tuple[type, str, Callable]] = [(<class 'swh.model.model.BaseModel'>, 'model', <function _encode_model_object>), (<class 'swh.model.swhids.CoreSWHID'>, 'core_swhid', <class 'str'>), (<class 'swh.model.swhids.ExtendedSWHID'>, 'extended_swhid', <class 'str'>), (<class 'swh.model.swhids.QualifiedSWHID'>, 'qualified_swhid', <class 'str'>), (<enum 'ObjectType'>, 'identifiers_enum', <function _encode_enum>), (<enum 'MetadataAuthorityType'>, 'model_enum', <function _encode_enum>), (<enum 'ListOrder'>, 'storage_enum', <function _encode_enum>), (<class 'swh.storage.interface.OriginVisitWithStatuses'>, 'origin_visit_with_statuses', <function _encode_origin_visit_with_statuses>), (<class 'swh.storage.interface.ObjectReference'>, 'object_reference', <function _encode_object_reference>), (<class 'swh.storage.interface.SnapshotBranchByNameResponse'>, 'branch_by_name_response', <function _encode_snapshot_branch_by_name_response>), (<class 'swh.storage.proxies.masking.db.MaskedStatus'>, 'masked_status', <function _encode_masked_status>), (<class 'swh.storage.proxies.blocking.db.BlockingStatus'>, 'blocking_status', <function _encode_blocking_status>)]#
- Value of extra_encoders passed to json_dumps or msgpack_dumps to be able to serialize more object types. 
 - raise_for_status(response) None[source]#
- Check the response’s HTTP status code and raise an exception if it denotes an error; do nothing otherwise. 
 - clear_buffers(object_types: Sequence[str] = ()) None#
- For backend storages (pg, storage, in-memory), this is a noop operation. For proxy storages (especially filter, buffer), this is an operation which cleans internal state. 
 - content_add_metadata(content: List[Content]) Dict[str, int]#
- Add content metadata to the storage (like content_add, but without inserting to the objstorage). - Parameters:
- content (iterable) – - iterable of dictionaries representing individual pieces of content to add. Each dictionary has the following keys: - length (int): content length (default: -1) 
- one key for each checksum algorithm in swh.model.hashutil.ALGORITHMS, mapped to the corresponding checksum
- status (str): one of visible, hidden, absent 
- reason (str): if status = absent, the reason why 
- origin (int): if status = absent, the origin we saw the content in 
- ctime (datetime): time of insertion in the archive 
 
- Returns:
- Summary dict with the following keys and associated values - content:add: New contents added 
- skipped_content:add: New skipped contents (no data) added 
 
 - content_find(content: HashDict) List[Content]#
- Find a content hash in db. - Parameters:
- content – a dictionary representing one content hash, mapping checksum algorithm names (see swh.model.hashutil.ALGORITHMS) to checksum values 
- Raises:
- ValueError – if a key of the dictionary is not one of sha1, sha1_git, or sha256. 
- Returns:
- an iterable of Content objects matching the search criteria if the content exists. Empty iterable otherwise. 
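As a rough illustration of the dictionary content_find expects, the sketch below computes three checksums with the standard library. It assumes checksum values are raw digest bytes and that sha1_git follows Git's blob hashing (a "blob <length>" header followed by a NUL byte, then the data). The helper name content_hashes is hypothetical.

```python
import hashlib


def content_hashes(data: bytes) -> dict:
    """Build a dict mapping checksum algorithm names to raw digests.

    Hypothetical helper; sha1_git is computed the way Git hashes a blob,
    which is an assumption about what the storage expects.
    """
    git_header = b"blob %d\0" % len(data)
    return {
        "sha1": hashlib.sha1(data).digest(),
        "sha1_git": hashlib.sha1(git_header + data).digest(),
        "sha256": hashlib.sha256(data).digest(),
    }


hashes = content_hashes(b"")
```

A lookup could then pass a subset of this dictionary, for example only the sha1 entry, as the content argument.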
 
 - content_get(contents: List[bytes], algo: str = 'sha1') List[Content | None]#
- Retrieve content metadata in bulk - Parameters:
- contents – List of content identifiers 
- algo – one of the checksum algorithms in swh.model.hashutil.DEFAULT_ALGORITHMS
 
- Returns:
- List of Content model objects, with None in place of each content that does not exist. 
 
 - content_get_data(content: HashDict | bytes) bytes | None#
- Given a content identifier, returns its associated data if any. - Parameters:
- content – dict of hashes (or just sha1 identifier) 
- Returns:
- raw content data (bytes) 
 
 - content_get_partition(partition_id: int, nb_partitions: int, page_token: str | None = None, limit: int = 1000) PagedResult[Content, str]#
- Splits contents into nb_partitions, and returns one of these based on partition_id (which must be in [0, nb_partitions-1]) - There is no guarantee on how the partitioning is done, or the result order. - Parameters:
- partition_id – index of the partition to fetch 
- nb_partitions – total number of partitions to split into 
- page_token – opaque token used for pagination. 
- limit – maximum number of results to return (defaults to 1000) 
 
- Returns:
- PagedResult of Content model objects within the partition. If next_page_token is None, there is no more data to retrieve. 
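The page_token / next_page_token protocol used by the paginated endpoints can be consumed with a small loop. The sketch below is generic: fetch_page is any callable taking a page token and returning an object with results and next_page_token attributes. The results attribute name is an assumption here, and the fake pages exist only to make the example runnable.

```python
from types import SimpleNamespace


def iter_pages(fetch_page):
    """Yield every item of a paginated endpoint, following page tokens."""
    page_token = None
    while True:
        page = fetch_page(page_token)
        yield from page.results  # `results` is an assumed attribute name
        page_token = page.next_page_token
        if page_token is None:  # no more data to retrieve
            return


# Fake two-page endpoint standing in for a real storage call.
_PAGES = {
    None: SimpleNamespace(results=["a", "b"], next_page_token="1"),
    "1": SimpleNamespace(results=["c"], next_page_token=None),
}


def fake_fetch(page_token):
    return _PAGES[page_token]


collected = list(iter_pages(fake_fetch))
```

With a real client, fetch_page could be a lambda wrapping content_get_partition with fixed partition_id and nb_partitions and the token passed as page_token.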
 
 - content_missing(contents: List[HashDict], key_hash: str = 'sha1') Iterable[bytes]#
- List content missing from storage - Parameters:
- contents – iterable of dictionaries whose keys are either ‘length’ or an item of swh.model.hashutil.ALGORITHMS, mapped to the corresponding checksum (or length).
- key_hash – name of the column to use as hash id result (default: ‘sha1’) 
 
- Raises:
- StorageArgumentException – when key_hash is unknown. 
- TODO – an exception when we get a hash collision. 
 
- Returns:
- iterable of missing content ids (as per the key_hash column) 
 
 - content_missing_per_sha1(contents: List[bytes]) Iterable[bytes]#
- List content missing from storage based only on sha1. - Parameters:
- contents – List of sha1 to check for absence. 
- Raises:
- TODO – an exception when we get a hash collision. 
- Returns:
- Iterable of missing content ids (sha1) 
 
 - content_missing_per_sha1_git(contents: List[bytes]) Iterable[bytes]#
- List content missing from storage based only on sha1_git. - Parameters:
- contents (List) – a list of content ids (sha1_git) 
- Yields:
- missing contents sha1_git 
 
 - content_update(contents: List[Dict[str, Any]], keys: List[str] = []) None#
- Update content blobs in the storage. Does nothing for unknown contents or skipped ones. - Parameters:
- contents – - iterable of dictionaries representing individual pieces of content to update. Each dictionary has the following keys: - data (bytes): the actual content 
- length (int): content length (default: -1) 
- one key for each checksum algorithm in swh.model.hashutil.ALGORITHMS, mapped to the corresponding checksum
- status (str): one of visible, hidden, absent 
 
- keys (list) – List of keys (str) whose values need an update, e.g., new hash column 
 
 
 - directory_add(directories: List[Directory]) Dict[str, int]#
- Add directories to the storage - Parameters:
- directories (iterable) – - iterable of dictionaries representing the individual directories to add. Each dict has the following keys: - id (sha1_git): the id of the directory to add 
- entries (list): list of dicts for each entry in the directory. Each dict has the following keys: - name (bytes) 
- type (one of ‘file’, ‘dir’, ‘rev’): type of the directory entry (file, directory, revision) 
- target (sha1_git): id of the object pointed at by the directory entry 
- perms (int): entry permissions 
 
 
 
- Returns:
- Summary dict of keys with associated count as values - directory:add: Number of directories actually added 
 
 - directory_entry_get_by_path(directory: bytes, paths: List[bytes]) Dict[str, Any] | None#
- Get the directory entry (either file or dir) from directory with path. - Parameters:
- directory – directory id 
- paths – path to lookup from the top level directory. From left (top) to right (bottom). 
 
- Returns:
- The corresponding directory entry as dict if found, None otherwise. 
 
 - directory_get_entries(directory_id: bytes, page_token: bytes | None = None, limit: int = 1000) PagedResult[DirectoryEntry, str] | None#
- Get the content, possibly partial, of a directory with the given id - The entries of the directory are not guaranteed to be returned in any particular order. - The number of results is not guaranteed to be lower than the limit. - Parameters:
- directory_id – identifier of the directory 
- page_token – opaque string used to get the next results of a search 
- limit – Number of entries to return 
 
- Returns:
- None if the directory does not exist; a page of DirectoryEntry objects otherwise. 
 
 - See also - swh.storage.algos.directories.directory_get() will get all entries for a given directory. - swh.storage.algos.directories.directory_get_many() will do the same for a set of directories.
 - directory_get_id_partition(partition_id: int, nb_partitions: int, page_token: str | None = None, limit: int = 1000) PagedResult[bytes, str]#
- Splits directories into nb_partitions, and returns all the ids and raw manifests in one of these based on partition_id (which must be in [0, nb_partitions-1]). This does not return directory entries themselves; they should be retrieved using directory_get_entries() and directory_get_raw_manifest() instead. - There is no guarantee on how the partitioning is done, or the result order. - Parameters:
- partition_id – index of the partition to fetch 
- nb_partitions – total number of partitions to split into 
 
- Returns:
- Page of the directories’ sha1_git hashes. 
 
 - directory_get_raw_manifest(directory_ids: List[bytes]) Dict[bytes, bytes | None]#
- Returns the raw manifest of directories that do not fit the SWH data model, or None if they do. Directories missing from the archive are not returned at all. - Parameters:
- directory_ids – List of directory ids to query 
 
 - directory_ls(directory: bytes, recursive: bool = False) Iterable[Dict[str, Any]]#
- List entries for one directory. - If recursive=True, names in the path of a dir/file not at the root are concatenated with a slash (/). - Parameters:
- directory – the directory to list entries from. 
- recursive – if True, list entries recursively from this directory. 
 
- Yields:
- directory entries for that directory. 
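To picture how recursive listing joins names with a slash, here is a toy walk over an in-memory mapping from directory id to entries. The mapping, the entry dict keys used here, and the depth-first order are assumptions for illustration; the real backends may return entries in a different order.

```python
def ls(directories, directory_id, recursive=False, prefix=b""):
    """Yield entries of a directory; nested names are joined with b"/".

    `directories` is a hypothetical in-memory mapping {id: [entry, ...]}.
    """
    for entry in directories.get(directory_id, []):
        name = prefix + entry["name"]
        yield {**entry, "name": name}
        if recursive and entry["type"] == "dir":
            yield from ls(directories, entry["target"], True, name + b"/")


tree = {
    b"root": [
        {"name": b"src", "type": "dir", "target": b"d1"},
        {"name": b"README", "type": "file", "target": b"c1"},
    ],
    b"d1": [{"name": b"main.py", "type": "file", "target": b"c2"}],
}

flat_names = [e["name"] for e in ls(tree, b"root")]
recursive_names = [e["name"] for e in ls(tree, b"root", recursive=True)]
```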
 
 - directory_missing(directories: List[bytes]) Iterable[bytes]#
- List directories missing from storage. - Parameters:
- directories – list of directory ids 
- Yields:
- missing directory ids 
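One possible use of directory_missing is to avoid sending directories the storage already holds. The sketch below assumes each directory object exposes an id attribute. FakeStorage is an in-memory stand-in written for this example, and add_missing_directories is a hypothetical helper, not part of the client.

```python
from types import SimpleNamespace


def add_missing_directories(storage, directories):
    """Send to directory_add only the directories reported missing."""
    by_id = {d.id: d for d in directories}  # assumes each object has `.id`
    missing = set(storage.directory_missing(list(by_id)))
    to_add = [d for id_, d in by_id.items() if id_ in missing]
    return storage.directory_add(to_add) if to_add else {"directory:add": 0}


class FakeStorage:
    """In-memory stand-in used only to exercise the helper."""

    def __init__(self, known_ids):
        self.known = set(known_ids)

    def directory_missing(self, directories):
        return [id_ for id_ in directories if id_ not in self.known]

    def directory_add(self, directories):
        self.known.update(d.id for d in directories)
        return {"directory:add": len(directories)}


storage = FakeStorage({b"a"})
dirs = [SimpleNamespace(id=b"a"), SimpleNamespace(id=b"b")]
summary = add_missing_directories(storage, dirs)
```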
 
 - extid_add(ids: List[ExtID]) Dict[str, int]#
- Add a series of ExtID objects - Parameters:
- ids – list of ExtID objects 
- Returns:
- Summary dict of keys with associated count as values - extid:add: New ExtID objects actually stored in db 
 
 - extid_get_from_extid(id_type: str, ids: List[bytes], version: int | None = None) List[ExtID]#
- Get ExtID objects from external IDs - Parameters:
- id_type – type of the given external identifiers (e.g. ‘mercurial’) 
- ids – list of external IDs 
- version – (Optional) version to use as filter 
 
- Returns:
- list of ExtID objects 
 
 - extid_get_from_target(target_type: ObjectType, ids: List[bytes], extid_type: str | None = None, extid_version: int | None = None) List[ExtID]#
- Get ExtID objects from target IDs and target_type - Parameters:
- target_type – type of the SWH object 
- ids – list of target IDs 
- extid_type – (Optional) extid_type to use as filter. This cannot be empty if extid_version is provided. 
- extid_version – (Optional) version to use as filter. This cannot be empty if extid_type is provided. 
 
- Raises:
- ValueError – if extid_version is provided without extid_type, or vice versa. 
- Returns:
- list of ExtID objects 
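The constraint that extid_type and extid_version must be given together can be checked before making a call. This is a minimal sketch that treats "not provided" as None, which is an assumption; check_extid_filters is a hypothetical helper name.

```python
def check_extid_filters(extid_type=None, extid_version=None):
    """Raise ValueError unless both filters are set or both are omitted."""
    if (extid_type is None) != (extid_version is None):
        raise ValueError(
            "extid_type and extid_version must be provided together"
        )


def is_valid(extid_type=None, extid_version=None):
    """Convenience wrapper returning a boolean instead of raising."""
    try:
        check_extid_filters(extid_type, extid_version)
    except ValueError:
        return False
    return True


both_omitted = is_valid()
both_given = is_valid("mercurial", 1)
only_version = is_valid(extid_version=1)
only_type = is_valid(extid_type="mercurial")
```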
 
 - flush(object_types: Sequence[str] = ()) Dict[str, int]#
- For backend storages (pg, storage, in-memory), this is expected to be a noop operation. For proxy storages (especially buffer), this is expected to trigger actual writes to the backend. 
 - metadata_authority_add(authorities: List[MetadataAuthority]) Dict[str, int]#
- Add new metadata authorities to the storage. - Their type and url together are unique identifiers of this authority; and metadata is an arbitrary dict of JSONable data with information about this authority, which must not be None (but may be empty). - Parameters:
- authorities – iterable of MetadataAuthority to be inserted 
 
 - metadata_authority_get(type: MetadataAuthorityType, url: str) MetadataAuthority | None#
- Retrieve information about an authority - Parameters:
- type – one of “deposit_client”, “forge”, or “registry” 
- url – unique URI identifying the authority 
 
- Returns:
- a MetadataAuthority object (with a non-None metadata field) if it is known, else None. 
 
 - metadata_fetcher_add(fetchers: List[MetadataFetcher]) Dict[str, int]#
- Add new metadata fetchers to the storage. - Their name and version together are unique identifiers of this fetcher; and metadata is an arbitrary dict of JSONable data with information about this fetcher, which must not be None (but may be empty). - Parameters:
- fetchers – iterable of MetadataFetcher to be inserted 
 
 - metadata_fetcher_get(name: str, version: str) MetadataFetcher | None#
- Retrieve information about a fetcher - Parameters:
- name – the name of the fetcher 
- version – version of the fetcher 
 
- Returns:
- a MetadataFetcher object (with a non-None metadata field) if it is known, else None. 
 
 - object_find_by_sha1_git(ids: List[bytes]) Dict[bytes, List[Dict]]#
- Return the objects found with the given ids. - Parameters:
- ids – a list of sha1_git identifiers 
- Returns:
- A dict from id to the list of objects found for that id. Each object found is itself a dict with keys: - sha1_git: the input id 
- type: the type of object found 
 
 
 - object_find_recent_references(target_swhid: ExtendedSWHID, limit: int) List[ExtendedSWHID]#
- Return the SWHIDs of objects that are known to reference the object target_swhid. - Parameters:
- target_swhid – the SWHID of the object targeted by the returned objects 
- limit – the maximum number of SWHIDs to return 
 
 - Note - The data returned by this function is inherently limited to objects that were recently added to the archive, and is pruned regularly. For completeness, one must also query swh.graph for backwards edges targeting the requested object.
 - object_references_add(references: List[ObjectReference]) Dict[str, int]#
- For each object reference (source, target), record that the source object references the target object (meaning that the target needs to exist for the source object to be consistent within the archive). - This function will only be called internally by a reference recording proxy, through one of directory_add(), revision_add(), release_add(), snapshot_add(), or origin_visit_status_add(). External users of swh.storage should not need to use this function directly. - Note - These records are inserted in time-based partitions that can be pruned when the objects are known in an up-to-date swh.graph instance. - Parameters:
- references – a list of (source, target) SWHID tuples
- Returns:
- A summary dict with the following keys - object_reference:add: the number of object references added 
 
 - origin_add(origins: List[Origin]) Dict[str, int]#
- Add origins to the storage - Parameters:
- origins – - list of dictionaries representing the individual origins, with the following keys: - type: the origin type (‘git’, ‘svn’, ‘deb’, …) 
- url (bytes): the url the origin points to 
 
- Returns:
- Summary dict of keys with associated count as values - origin:add: Count of objects actually stored in db 
 
 - origin_count(url_pattern: str, regexp: bool = False, with_visit: bool = False) int#
- Count origins whose urls contain a provided string pattern or match a provided regular expression. The pattern search in origin urls is performed in a case insensitive way. - Parameters:
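- url_pattern – the string pattern to search for in origin urls 
- regexp – if True, consider the provided pattern as a regular expression and count origins whose urls match it 
- with_visit – if True, filter out origins with no visit 
 
- Returns:
- The number of origins matching the search criterion. 
- Return type:
- int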
 
 - origin_get(origins: List[str]) List[Origin | None]#
- Return origins. - Parameters:
- origins – a list of urls to find 
- Returns:
- the list of associated existing origin model objects. The unknown origins will be returned as None at the same index as the input. 
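Because unknown origins come back as None at the same index as the input, results can be paired with the requested URLs by position. The helper below is a hypothetical sketch; plain strings stand in for Origin model objects.

```python
from typing import Dict, List, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")


def pair_with_input(
    urls: Sequence[str], results: Sequence[Optional[T]]
) -> Tuple[Dict[str, T], List[str]]:
    """Split positional results into known origins and unknown URLs."""
    if len(urls) != len(results):
        raise ValueError("results must align with the input list")
    known: Dict[str, T] = {}
    unknown: List[str] = []
    for url, origin in zip(urls, results):
        if origin is None:
            unknown.append(url)
        else:
            known[url] = origin
    return known, unknown


urls = ["https://example.org/a", "https://example.org/b"]
known, unknown = pair_with_input(urls, ["origin-a", None])
```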
 
 - origin_get_by_sha1(sha1s: List[bytes]) List[Dict[str, Any] | None]#
- Return origins, identified by the sha1 of their URLs. - Parameters:
- sha1s – a list of sha1s 
- Returns:
- List of origin dicts, with None in place of each sha1 that matches no known origin URL. 
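A sha1 suitable for this lookup could be derived from an origin URL as sketched below. Encoding the URL as UTF-8 and passing the raw 20-byte digest are assumptions; origin_url_sha1 is a hypothetical helper name, and the URL is a placeholder.

```python
import hashlib


def origin_url_sha1(url: str) -> bytes:
    """Return the raw sha1 digest of an origin URL (UTF-8 assumed)."""
    return hashlib.sha1(url.encode("utf-8")).digest()


digest = origin_url_sha1("https://example.org/repo.git")
```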
 
 - origin_list(page_token: str | None = None, limit: int = 100) PagedResult[Origin, str]#
- Returns the list of origins - Parameters:
- page_token – opaque token used for pagination. 
- limit – the maximum number of results to return 
 
- Returns:
- Page of Origin data model objects. If next_page_token is None, there is no more data to retrieve. 
 
 - origin_search(url_pattern: str, page_token: str | None = None, limit: int = 50, regexp: bool = False, with_visit: bool = False, visit_types: List[str] | None = None) PagedResult[Origin, str]#
- Search for origins whose urls contain a provided string pattern or match a provided regular expression. The search is performed in a case insensitive way. - Parameters:
- url_pattern – the string pattern to search for in origin urls 
- page_token – opaque token used for pagination 
- limit – the maximum number of found origins to return 
- regexp – if True, consider the provided pattern as a regular expression and return origins whose urls match it 
- with_visit – if True, filter out origins with no visit 
- visit_types – Only origins having any of the provided visit types (e.g. git, svn, pypi) will be returned 
 
- Returns:
- PagedResult of Origin 
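The two matching modes (case-insensitive substring versus case-insensitive regular expression) can be illustrated locally. This sketch only mirrors the documented semantics; how the backend actually evaluates patterns, including whether regular expressions are anchored, is an assumption.

```python
import re


def url_matches(url: str, url_pattern: str, regexp: bool = False) -> bool:
    """Case-insensitive match of an origin URL against a pattern."""
    if regexp:
        # Unanchored search is an assumption about the backend behaviour.
        return re.search(url_pattern, url, flags=re.IGNORECASE) is not None
    return url_pattern.lower() in url.lower()


substring_hit = url_matches("https://GitHub.com/foo/bar", "github.com/FOO")
regexp_hit = url_matches(
    "https://example.org/repo", r"^HTTPS://.*\.org/", regexp=True
)
miss = url_matches("https://example.org/repo", "gitlab")
```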
 
 - origin_snapshot_get_all(origin_url: str) List[bytes]#
- Return all unique snapshot identifiers resulting from origin visits. - Parameters:
- origin_url – origin URL 
- Returns:
- list of sha1s 
 
 - origin_visit_add(visits: List[OriginVisit]) Iterable[OriginVisit]#
- Add visits to storage. If the visits have no id, they will be created and assigned one. The resulting visits have their visit id set. - Parameters:
- visits – List of OriginVisit objects to add 
- Raises:
- StorageArgumentException – if some origin visits reference unknown origins 
- Returns:
- List[OriginVisit] stored 
 
 - origin_visit_find_by_date(origin: str, visit_date: datetime, type: str | None = None) OriginVisit | None#
- Retrieves the origin visit whose date is closest to the provided timestamp. In case of a tie, the visit with largest id is selected. - Parameters:
- origin – origin (URL) 
- visit_date – expected visit date 
- type – filter on a specific visit type if provided 
 
- Returns:
- A visit if found, None otherwise 
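The selection rule (closest date, ties broken by the largest visit id) can be expressed as a single min over a composite key. The Visit tuple below is a simplified stand-in for OriginVisit, and closest_visit is a hypothetical helper operating on an in-memory list.

```python
from datetime import datetime, timedelta, timezone
from typing import NamedTuple, Optional, Sequence


class Visit(NamedTuple):
    """Simplified stand-in for an OriginVisit."""

    visit: int
    date: datetime


def closest_visit(
    visits: Sequence[Visit], visit_date: datetime
) -> Optional[Visit]:
    """Return the visit closest to visit_date; ties go to the largest id."""
    if not visits:
        return None
    return min(visits, key=lambda v: (abs(v.date - visit_date), -v.visit))


t0 = datetime(2020, 1, 1, tzinfo=timezone.utc)
visits = [
    Visit(1, t0 - timedelta(hours=1)),
    Visit(2, t0 + timedelta(hours=1)),  # same distance as visit 1
    Visit(3, t0 + timedelta(hours=5)),
]
selected = closest_visit(visits, t0)
```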
 
 - origin_visit_get(origin: str, page_token: str | None = None, order: ListOrder = ListOrder.ASC, limit: int = 10) PagedResult[OriginVisit, str]#
- Retrieve page of OriginVisit information. - Parameters:
- origin – The visited origin 
- page_token – opaque string used to get the next results of a search 
- order – Order on visit id fields to list origin visits (default to asc) 
- limit – Number of visits to return 
 
- Raises:
- StorageArgumentException – if the order is wrong or the page_token is mistyped. 
 
- Returns:
- Page of OriginVisit data model objects. If next_page_token is None, there is no more data to retrieve. 
 - See also - swh.storage.algos.origin.iter_origin_visits() will iterate over all OriginVisits for a given origin.
 - origin_visit_get_by(origin: str, visit: int) OriginVisit | None#
- Retrieve origin visit’s information. - Parameters:
- origin – origin (URL) 
- visit – visit id 
 
- Returns:
- The information on that particular OriginVisit or None if it does not exist 
 
 - origin_visit_get_latest(origin: str, type: str | None = None, allowed_statuses: List[str] | None = None, require_snapshot: bool = False) OriginVisit | None#
- Get the latest origin visit for the given origin, optionally looking only for those with one of the given allowed_statuses or for those with a snapshot. - Parameters:
- origin – origin URL 
- type – Optional visit type to filter on (e.g. git, tar, dsc, svn, hg, npm, pypi, ...) 
- allowed_statuses – list of visit statuses considered to find the latest visit. For instance, allowed_statuses=['full'] will only consider visits that have successfully run to completion.
- require_snapshot – If True, only a visit with a snapshot will be returned. 
 
- Raises:
- StorageArgumentException – if values for the allowed_statuses parameter are unknown 
 
- Returns:
- OriginVisit matching the criteria if found, None otherwise. Note that since OriginVisit no longer holds a reference to the visit status or snapshot, you may want to use origin_visit_status_get_latest for that information. 
 
 - origin_visit_get_with_statuses(origin: str, allowed_statuses: List[str] | None = None, require_snapshot: bool = False, page_token: str | None = None, order: ListOrder = ListOrder.ASC, limit: int = 10) PagedResult[OriginVisitWithStatuses, str]#
- Retrieve page of origin visits and all their statuses. - Origin visit statuses are always sorted in ascending order of their dates. - Parameters:
- origin – The visited origin URL 
- allowed_statuses – Only visit statuses matching that list will be returned. If empty, all visit statuses will be returned. Possible status values are created, not_found, ongoing, failed, partial and full.
- require_snapshot – If True, only visit statuses with a snapshot will be returned.
- page_token – opaque string used to get the next results 
- order – Order on visit objects to list (default to asc) 
- limit – Number of visits with their statuses to return 
 
- Returns:
- Page of OriginVisitWithStatuses objects. If next_page_token is None, there is no more data to retrieve. 
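The combined effect of allowed_statuses and require_snapshot can be sketched as a local filter. The Status dataclass is a simplified stand-in for OriginVisitStatus, and filter_statuses is a hypothetical helper; the real filtering happens in the storage backend.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Status:
    """Simplified stand-in for an OriginVisitStatus."""

    status: str
    snapshot: Optional[bytes]


def filter_statuses(
    statuses: Sequence[Status],
    allowed_statuses: Optional[Sequence[str]] = None,
    require_snapshot: bool = False,
) -> List[Status]:
    """Keep statuses matching the filters; an empty filter keeps all."""
    kept = []
    for s in statuses:
        if allowed_statuses and s.status not in allowed_statuses:
            continue
        if require_snapshot and s.snapshot is None:
            continue
        kept.append(s)
    return kept


statuses = [
    Status("created", None),
    Status("partial", b"\x01"),
    Status("full", b"\x02"),
    Status("failed", None),
]
unfiltered = filter_statuses(statuses)
full_or_partial = filter_statuses(statuses, ["full", "partial"])
with_snapshot = filter_statuses(statuses, require_snapshot=True)
```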
 
 - origin_visit_status_add(visit_statuses: List[OriginVisitStatus]) Dict[str, int]#
- Add origin visit statuses. - If there is already a status for the same origin and visit id at the same date, the new one will be either dropped or will replace the existing one (it is unspecified which one of these two behaviors happens). - Parameters:
- visit_statuses – origin visit statuses to add 
- Raises:
- StorageArgumentException – if the origin of the visit status is unknown 
 - origin_visit_status_get(origin: str, visit: int, page_token: str | None = None, order: ListOrder = ListOrder.ASC, limit: int = 10) PagedResult[OriginVisitStatus, str]#
- Retrieve page of OriginVisitStatus information. - Parameters:
- origin – The visited origin 
- visit – The visit identifier 
- page_token – opaque string used to get the next results of a search 
- order – Order on visit status objects to list (default to asc) 
- limit – Number of visit statuses to return 
 
- Returns:
- Page of OriginVisitStatus data model objects. If next_page_token is None, there is no more data to retrieve. 
 - See also - swh.storage.algos.origin.iter_origin_visit_statuses() will iterate over all OriginVisitStatus objects for a given origin and visit.
 - origin_visit_status_get_latest(origin_url: str, visit: int, allowed_statuses: List[str] | None = None, require_snapshot: bool = False) OriginVisitStatus | None#
- Get the latest origin visit status for the given origin visit, optionally looking only for those with one of the given allowed_statuses or with a snapshot. - Parameters:
- origin_url – origin URL 
- visit – visit id 
- allowed_statuses – list of visit statuses considered to find the latest visit. Possible values are {created, ongoing, partial, full}. For instance, allowed_statuses=['full'] will only consider visits that have successfully run to completion.
- require_snapshot – If True, only a visit with a snapshot will be returned. 
 
- Raises:
- StorageArgumentException – if values for the allowed_statuses parameter are unknown 
 
- Returns:
- The OriginVisitStatus matching the criteria 
 
 - origin_visit_status_get_random(type: str) OriginVisitStatus | None#
- Randomly select one successful origin visit with <type> made in the last 3 months. - Returns:
- One random OriginVisitStatus matching the selection criteria 
 
 - raw_extrinsic_metadata_add(metadata: List[RawExtrinsicMetadata]) Dict[str, int]#
- Add extrinsic metadata on objects (contents, directories, …). - The authority and fetcher must be known to the storage before using this endpoint. - If there is already metadata for the same object, authority, fetcher, and at the same date; the new one will be either dropped or will replace the existing one (it is unspecified which one of these two behaviors happens). - Parameters:
- metadata – iterable of RawExtrinsicMetadata objects to be inserted. 
 
 - raw_extrinsic_metadata_get(target: ExtendedSWHID, authority: MetadataAuthority, after: datetime | None = None, page_token: bytes | None = None, limit: int = 1000) PagedResult[RawExtrinsicMetadata, str]#
- Retrieve list of all raw_extrinsic_metadata entries targeting the id - Parameters:
- target – the SWHID of the objects to find metadata on 
- authority – a dict containing keys type and url. 
- after – minimum discovery_date for a result to be returned 
- page_token – opaque token, used to get the next page of results 
- limit – maximum number of results to be returned 
 
- Returns:
- PagedResult of RawExtrinsicMetadata 
- Raises:
- UnknownMetadataAuthority – if the metadata authority does not exist at all 
 
 - raw_extrinsic_metadata_get_authorities(target: ExtendedSWHID) List[MetadataAuthority]#
- Returns all authorities that provided metadata on the given object. 
 - raw_extrinsic_metadata_get_by_ids(ids: List[bytes]) List[RawExtrinsicMetadata]#
- Retrieve list of raw_extrinsic_metadata entries of the given id (unlike raw_extrinsic_metadata_get, which returns metadata entries targeting the id) - Parameters:
- ids – list of hashes of RawExtrinsicMetadata objects 
 
 - release_add(releases: List[Release]) Dict[str, int]#
- Add releases to the storage - Parameters:
- releases (List[dict]) – - iterable of dictionaries representing the individual releases to add. Each dict has the following keys: - id (sha1_git): id of the release to add
- revision (sha1_git): id of the revision the release points to
- date (dict): the date the release was made
- name (bytes): the name of the release
- comment (bytes): the comment associated with the release
- author (Dict[str, bytes]): dictionary with keys: name, fullname, email
 
 - The date dictionary has the form defined in swh.model. - Returns:
- Summary dict of keys with associated count as values - release:add: New objects actually stored in db 
 
 - release_get(releases: List[bytes], ignore_displayname: bool = False) List[Release | None]#
- Given a list of sha1s, return the releases’ information - Parameters:
- releases – list of sha1s 
- ignore_displayname – return the original author’s full name even if it’s masked by a displayname. 
 
- Returns:
- List of releases matching the identifiers or None if the release does not exist. 
 
 - release_get_partition(partition_id: int, nb_partitions: int, page_token: str | None = None, limit: int = 1000) PagedResult[Release, str]#
- Splits releases into nb_partitions, and returns one of these based on partition_id (which must be in [0, nb_partitions-1]) - There is no guarantee on how the partitioning is done, or the result order. - Parameters:
- partition_id – index of the partition to fetch 
- nb_partitions – total number of partitions to split into 
 
- Returns:
- Page of Release model objects within the partition. 
 
 - release_missing(releases: List[bytes]) Iterable[bytes]#
- List missing release ids from storage - Parameters:
- releases – release ids 
- Yields:
- a list of missing release ids 
 
 - revision_add(revisions: List[Revision]) Dict[str, int]#
- Add revisions to the storage - Parameters:
- revisions (List[dict]) – - iterable of dictionaries representing the individual revisions to add. Each dict has the following keys: - id (sha1_git): id of the revision to add
- date (dict): date the revision was written
- committer_date (dict): date the revision got added to the origin
- type (one of ‘git’, ‘tar’): type of the revision added 
- directory (sha1_git): the directory the revision points at
- message (bytes): the message associated with the revision
- author (Dict[str, bytes]): dictionary with keys: name, fullname, email
- committer (Dict[str, bytes]): dictionary with keys: name, fullname, email
- metadata (jsonb): extra information as dictionary
- synthetic (bool): revision’s nature (tarball, directory creates synthetic revision)
- parents (list[sha1_git]): the parents of this revision
 
 - Date dictionaries have the form defined in swh.model. - Returns:
- Summary dict of keys with associated count as values - revision:add: New objects actually stored in db 
 
 - revision_get(revision_ids: List[bytes], ignore_displayname: bool = False) List[Revision | None]#
- Get revisions from storage - Parameters:
- revision_ids – revision ids 
- ignore_displayname – return the original author/committer’s full name even if it’s masked by a displayname. 
 
- Returns:
- list of Revision objects, with None in place of each revision that does not exist 
 
 - revision_get_partition(partition_id: int, nb_partitions: int, page_token: str | None = None, limit: int = 1000) PagedResult[Revision, str]#
- Splits revisions into nb_partitions, and returns one of these based on partition_id (which must be in [0, nb_partitions-1]) - There is no guarantee on how the partitioning is done, or the result order. - Parameters:
- partition_id – index of the partition to fetch 
- nb_partitions – total number of partitions to split into 
 
- Returns:
- Page of Revision model objects within the partition. 
 
 - revision_log(revisions: List[bytes], ignore_displayname: bool = False, limit: int | None = None) Iterable[Dict[str, Any] | None]#
- Fetch revision entry from the given root revisions. - Parameters:
- revisions – array of root revisions to lookup 
- ignore_displayname – return the original author/committer’s full name even if it’s masked by a displayname. 
- limit – limit on the output result. Defaults to None. 
 
- Yields:
- revision log entries from the given root revisions 
 
 - revision_missing(revisions: List[bytes]) Iterable[bytes]#
- List revisions missing from storage - Parameters:
- revisions – revision ids 
- Yields:
- missing revision ids 
 
 - revision_shortlog(revisions: List[bytes], limit: int | None = None) Iterable[Tuple[bytes, Tuple[bytes, ...]] | None]#
- Fetch the shortlog for the given revisions - Parameters:
- revisions – list of root revisions to look up 
- limit – depth limitation for the output 
 
- Yields:
- a list of (id, parents) tuples 
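
The (id, parents) tuples are easy to index on the client side. This is a small sketch on made-up data; parents_by_id is a hypothetical helper, and skipping None entries follows from the return type annotation rather than from documented behaviour.

```python
from typing import Dict, Iterable, Optional, Tuple

ShortlogEntry = Optional[Tuple[bytes, Tuple[bytes, ...]]]


def parents_by_id(shortlog: Iterable[ShortlogEntry]) -> Dict[bytes, Tuple[bytes, ...]]:
    """Index (id, parents) tuples by revision id, skipping None entries."""
    return {entry[0]: entry[1] for entry in shortlog if entry is not None}


# Made-up three-revision history: c -> b -> a (a has no parents).
entries = [(b"c", (b"b",)), (b"b", (b"a",)), None, (b"a", ())]
index = parents_by_id(entries)

# Revisions without parents are the roots of the history.
roots = [rev_id for rev_id, parents in index.items() if not parents]
```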
 
 - skipped_content_add(content: List[SkippedContent]) Dict[str, int]#
- Add contents to the skipped_content list, which contains (partial) information about content missing from the archive. - Parameters:
- contents (iterable) – - iterable of dictionaries representing individual pieces of content to add. Each dictionary has the following keys: - length (Optional[int]): content length (default: -1) 
- one key for each checksum algorithm in - swh.model.hashutil.ALGORITHMS, mapped to the corresponding checksum; each is optional
- status (str): must be “absent” 
- reason (str): the reason why the content is absent 
- origin (int): if status = absent, the origin we saw the content in 
 
- Raises:
- HashCollision – in case of collision 
- Any other exception raised by the backend 
- In case of errors, some content may have been stored in the DB and in the objstorage. Since additions to both are idempotent, that should not be a problem. 
 
- Returns:
- Summary dict with the following key and associated value - skipped_content:add: New skipped contents (no data) added 
 
 - skipped_content_find(content: HashDict) List[SkippedContent]#
- Find skipped content for the given hashes - Parameters:
- content – a dictionary representing one content hash, mapping checksum algorithm names (see swh.model.hashutil.ALGORITHMS) to checksum values 
- Raises:
- ValueError – if a key of the dictionary is not one of sha1, sha1_git, or sha256. 
- Returns:
- a list of SkippedContent objects matching the search criteria if the skipped content exists. Empty list otherwise. 
 
 - skipped_content_missing(contents: List[Dict[str, Any]]) Iterable[Dict[str, Any]]#
- List skipped contents missing from storage. - Parameters:
- contents – iterable of dictionaries containing the data for each checksum algorithm. 
- Returns:
- Iterable of missing skipped contents as dict 
 
 - snapshot_add(snapshots: List[Snapshot]) Dict[str, int]#
- Add snapshots to the storage. - Parameters:
- snapshot ([dict]) – - the snapshots to add, containing the following keys: 
- Raises:
- ValueError – if the origin or visit id does not exist. 
- Returns:
- Summary dict of keys with associated count as values - snapshot:add: Count of objects actually stored in db 
 
 - snapshot_branch_get_by_name(snapshot_id: bytes, branch_name: bytes, follow_alias_chain: bool = True, max_alias_chain_length: int = 100) SnapshotBranchByNameResponse | None#
- Get a snapshot branch by its name - Parameters:
- snapshot_id – Snapshot identifier 
- branch_name – Branch name to look for 
- follow_alias_chain – If True, find the first non-alias branch. Otherwise, return the first branch (alias or non-alias) 
- max_alias_chain_length – Maximum number of alias chains to be followed before treating the branch as dangling. This has no significance when follow_alias_chain is False. 
 
- Returns:
- A SnapshotBranchByNameResponse object 
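
To illustrate how follow_alias_chain and max_alias_chain_length interact, here is a minimal sketch over a plain dict. It is not the server's implementation: resolve_branch and the (target_type, target) tuples are hypothetical, and returning None for an over-long chain is this sketch's way of modelling "dangling".

```python
from typing import Dict, Optional, Tuple

# Made-up representation: branch name -> (target_type, target).
Branch = Tuple[str, bytes]


def resolve_branch(
    branches: Dict[bytes, Branch],
    name: bytes,
    follow_alias_chain: bool = True,
    max_alias_chain_length: int = 100,
) -> Optional[Branch]:
    """Return the branch for `name`, following aliases up to a bounded depth."""
    branch = branches.get(name)
    if branch is None or not follow_alias_chain:
        return branch
    hops = 0
    while branch is not None and branch[0] == "alias":
        if hops >= max_alias_chain_length:
            return None  # chain too long: treated as dangling in this sketch
        branch = branches.get(branch[1])
        hops += 1
    return branch


snapshot = {
    b"HEAD": ("alias", b"refs/heads/main"),
    b"refs/heads/main": ("revision", b"\xaa" * 20),
    b"loop": ("alias", b"loop"),  # self-referencing alias
}
```

The bound is what keeps a cyclic alias such as loop from being followed forever.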
 
 - snapshot_count_branches(snapshot_id: bytes, branch_name_exclude_prefix: bytes | None = None) Dict[str | None, int] | None#
- Count the number of branches in the snapshot with the given id - Parameters:
- snapshot_id – snapshot identifier 
- branch_name_exclude_prefix – if provided, do not count branches whose name starts with given prefix 
 
- Returns:
- A dict whose keys are the target types of branches and whose values are the corresponding branch counts 
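
The shape of the result can be sketched with a Counter over made-up data. count_branches below is a hypothetical local function, not the storage method, and using None as the key for branches without a target is an assumption based on the return type annotation.

```python
from collections import Counter
from typing import Dict, Optional


def count_branches(
    branches: Dict[bytes, Optional[str]],
    branch_name_exclude_prefix: Optional[bytes] = None,
) -> Dict[Optional[str], int]:
    """Count branches per target type, optionally skipping a name prefix."""
    counts: Counter = Counter()
    for name, target_type in branches.items():
        if branch_name_exclude_prefix and name.startswith(branch_name_exclude_prefix):
            continue
        counts[target_type] += 1
    return dict(counts)


# Made-up snapshot: branch name -> target type (None: no target).
branches = {
    b"HEAD": "alias",
    b"refs/heads/main": "revision",
    b"refs/heads/dev": "revision",
    b"refs/pull/1/head": "revision",
    b"refs/tags/v1": "release",
    b"refs/heads/gone": None,
}
all_counts = count_branches(branches)
without_pulls = count_branches(branches, branch_name_exclude_prefix=b"refs/pull/")
```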
 
 - snapshot_get(snapshot_id: bytes) Dict[str, Any] | None#
- Get the content, possibly partial, of a snapshot with the given id - The branches of the snapshot are iterated in the lexicographical order of their names. - Warning - At most 1000 branches contained in the snapshot will be returned for performance reasons. In order to browse the whole set of branches, the method snapshot_get_branches() should be used instead. - Parameters:
- snapshot_id – snapshot identifier 
- Returns:
- a dict with three keys:
- id: identifier of the snapshot 
- branches: a dict of branches contained in the snapshot whose keys are the branches’ names. 
- next_branch: the name of the first branch not returned, or None if the snapshot has fewer than 1000 branches.
 
 
 
 - snapshot_get_branches(snapshot_id: bytes, branches_from: bytes = b'', branches_count: int = 1000, target_types: List[str] | None = None, branch_name_include_substring: bytes | None = None, branch_name_exclude_prefix: bytes | None = None) PartialBranches | None#
- Get the content, possibly partial, of a snapshot with the given id - The branches of the snapshot are iterated in the lexicographical order of their names. - Parameters:
- snapshot_id – identifier of the snapshot 
- branches_from – optional parameter used to skip branches whose name sorts before this value 
- branches_count – optional parameter used to limit the number of returned branches 
- target_types – optional parameter used to filter the target types of branch to return (possible values that can be contained in that list are ‘content’, ‘directory’, ‘revision’, ‘release’, ‘snapshot’, ‘alias’) 
- branch_name_include_substring – if provided, only return branches whose name contains given substring 
- branch_name_exclude_prefix – if provided, do not return branches whose name starts with the given prefix 
 
- Returns:
- a PartialBranches object listing a limited amount of branches matching the given criteria or None if the snapshot does not exist. 
 - See also - swh.storage.algos.snapshot.snapshot_get_all_branches() will get all branches for a given snapshot.
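
Paging through a large snapshot can be sketched as a loop that feeds next_branch back into branches_from. This assumes the returned object has the id, branches and next_branch keys described for snapshot_get(); FakeStorage and iter_branches are hypothetical, and the stub stores target types as plain strings.

```python
from typing import Dict, Iterator, Optional, Tuple


class FakeStorage:
    """Hypothetical stub returning dicts with id, branches and next_branch."""

    def __init__(self, branches: Dict[bytes, str]) -> None:
        self._branches = branches

    def snapshot_get_branches(self, snapshot_id, branches_from=b"", branches_count=1000):
        # Branch names are served in lexicographical order, starting at branches_from.
        names = sorted(n for n in self._branches if n >= branches_from)
        page, rest = names[:branches_count], names[branches_count:]
        return {
            "id": snapshot_id,
            "branches": {n: self._branches[n] for n in page},
            "next_branch": rest[0] if rest else None,
        }


def iter_branches(storage, snapshot_id: bytes,
                  page_size: int = 1000) -> Iterator[Tuple[bytes, str]]:
    """Yield (name, branch) pairs page by page until next_branch is None."""
    branches_from: Optional[bytes] = b""
    while branches_from is not None:
        part = storage.snapshot_get_branches(
            snapshot_id, branches_from=branches_from, branches_count=page_size
        )
        if part is None:
            return  # snapshot does not exist
        yield from part["branches"].items()
        branches_from = part["next_branch"]


storage = FakeStorage({b"e": "release", b"a": "revision", b"c": "revision",
                       b"b": "alias", b"d": "revision"})
all_branches = list(iter_branches(storage, b"\x00" * 20, page_size=2))
```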
 - snapshot_get_id_partition(partition_id: int, nb_partitions: int, page_token: str | None = None, limit: int = 1000) PagedResult[bytes, str]#
- Splits snapshots into nb_partitions, and returns all the ids in one of these based on partition_id (which must be in [0, nb_partitions-1]). This does not return the snapshot branches themselves; they should be retrieved using snapshot_get_branches() instead. - There is no guarantee on how the partitioning is done, or the result order. - Parameters:
- partition_id – index of the partition to fetch 
- nb_partitions – total number of partitions to split into 
 
- Returns:
- Page of the snapshots’ sha1_git hashes