swh.indexer.storage.in_memory module
- swh.indexer.storage.in_memory.check_id_types(data: List[Dict[str, Any]])
- Checks all elements of the list have an ‘id’ whose type is ‘bytes’. 
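The check described above can be sketched without the library. This is a minimal stand-in, not the module's implementation: the name `check_id_types_sketch`, the use of `ValueError`, and the error message are assumptions made for illustration.

```python
from typing import Any, Dict, List


def check_id_types_sketch(data: List[Dict[str, Any]]) -> None:
    """Raise if any element lacks an 'id' key whose value is of type bytes."""
    for item in data:
        # item.get("id") is None when the key is absent, which also fails the check.
        if not isinstance(item.get("id"), bytes):
            # Assumption: the real function may raise a different exception type.
            raise ValueError("identifiers must be bytes")


# A well-formed list passes silently.
check_id_types_sketch([{"id": b"\x01" * 20}])

# A str identifier is rejected.
try:
    check_id_types_sketch([{"id": "not-bytes"}])
except ValueError as exc:
    rejected = str(exc)
```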
- class swh.indexer.storage.in_memory.SubStorage(row_class: Type[TValue], tools, journal_writer)
- Bases: Generic[TValue]
- Implements common missing/get/add logic for each indexer type.
- missing(keys: Iterable[Dict]) → List[bytes]
- List data missing from storage.
- Parameters:
- keys (iterable) – dictionaries with keys:
- id (bytes): sha1 identifier
- indexer_configuration_id (int): tool used to compute the results
 
- Returns:
- list of missing sha1s
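To illustrate the lookup that `missing` describes, here is a minimal sketch over a plain dictionary. The `stored` mapping and the name `missing_sketch` are hypothetical and only model the idea that a result is identified by the pair (id, indexer_configuration_id).

```python
from typing import Dict, Iterable, List, Set

# Hypothetical stand-in for the storage state:
# sha1 -> set of indexer_configuration_id values that already have results.
stored: Dict[bytes, Set[int]] = {b"\xaa" * 20: {1}}


def missing_sketch(keys: Iterable[Dict]) -> List[bytes]:
    """Return the sha1s for which no result exists for the requested tool."""
    result = []
    for key in keys:
        known_tools = stored.get(key["id"], set())
        if key["indexer_configuration_id"] not in known_tools:
            result.append(key["id"])
    return result


queries = [
    {"id": b"\xaa" * 20, "indexer_configuration_id": 1},  # already indexed
    {"id": b"\xaa" * 20, "indexer_configuration_id": 2},  # same id, other tool
    {"id": b"\xbb" * 20, "indexer_configuration_id": 1},  # unknown id
]
missing_ids = missing_sketch(queries)
```

In this sketch the same sha1 counts as missing when it was indexed by a different tool, which is why each query carries both fields.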
 
- get(ids: Iterable[bytes]) → List[TValue]
- Retrieve data per id.
- Parameters:
- ids (iterable) – sha1 checksums
- Returns:
- list of rows with the following keys:
- id (bytes)
- tool (dict): tool used to compute metadata
- arbitrary data (as provided to add)
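The shape of the rows described above can be sketched with plain dictionaries. Everything here is illustrative: `rows_by_id`, `get_sketch`, the tool fields, and the `mimetype` field are assumptions, and the real method returns typed `TValue` rows rather than dicts. Silently skipping unknown ids is also an assumption of this sketch.

```python
from typing import Any, Dict, Iterable, List

# Hypothetical stored rows, grouped by sha1. Each row carries the id, the tool
# that produced it, and arbitrary data (here an illustrative "mimetype" field).
rows_by_id: Dict[bytes, List[Dict[str, Any]]] = {
    b"\x01" * 20: [
        {
            "id": b"\x01" * 20,
            "tool": {"id": 1, "name": "example-tool"},
            "mimetype": "text/plain",
        }
    ],
}


def get_sketch(ids: Iterable[bytes]) -> List[Dict[str, Any]]:
    """Collect every stored row for the requested ids, skipping unknown ids."""
    results: List[Dict[str, Any]] = []
    for id_ in ids:
        results.extend(rows_by_id.get(id_, []))
    return results


found = get_sketch([b"\x01" * 20, b"\x02" * 20])
```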
 
 
- get_partition(indexer_configuration_id: int, partition_id: int, nb_partitions: int, page_token: str | None = None, limit: int = 1000) → PagedResult[bytes, str]
- Retrieve ids of content with indexer_type within partition partition_id, bounded by limit.
- Parameters:
- indexer_type – Type of data content to index (mimetype, etc…)
- indexer_configuration_id – The tool used to index data
- partition_id – index of the partition to fetch
- nb_partitions – total number of partitions to split into
- page_token – opaque token used for pagination
- limit – Maximum number of results (defaults to 1000)
- with_textual_data (bool) – Deal with only textual content (True) or all content (False, the default)
 
- Raises:
- IndexerStorageArgumentException – if limit is None or a wrong indexer_type is provided
 
- Returns:
- PagedResult of Sha1. If next_page_token is None, there is no more data to fetch 
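The pagination contract above (keep fetching until next_page_token is None) can be sketched as follows. This is a self-contained model, not the library's code: `PagedResultSketch`, `get_partition_sketch`, `iter_partition`, the fake ids, the equal-width split of the sha1 space, and the hex-encoded token are all assumptions chosen for illustration, and `ValueError` stands in for the documented exception.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional


@dataclass
class PagedResultSketch:
    results: List[bytes]
    next_page_token: Optional[str] = None


# 16 fake 20-byte identifiers spread evenly over the sha1 space.
ALL_IDS = sorted(bytes([i]) * 20 for i in range(0, 256, 16))


def get_partition_sketch(partition_id: int, nb_partitions: int,
                         page_token: Optional[str] = None,
                         limit: Optional[int] = 1000) -> PagedResultSketch:
    if limit is None:
        raise ValueError("limit should not be None")
    # Assumption: a partition is an equal-width slice of the 160-bit id space.
    space = 1 << 160
    in_partition = [
        i for i in ALL_IDS
        if int.from_bytes(i, "big") * nb_partitions // space == partition_id
    ]
    # Assumption: the token is the hex form of the first id of the next page.
    start = bytes.fromhex(page_token) if page_token else b""
    candidates = [i for i in in_partition if i >= start]
    page, rest = candidates[:limit], candidates[limit:]
    return PagedResultSketch(page, rest[0].hex() if rest else None)


def iter_partition(partition_id: int, nb_partitions: int,
                   limit: int = 1000) -> Iterator[bytes]:
    """Follow next_page_token until it is None, yielding every id once."""
    token = None
    while True:
        page = get_partition_sketch(partition_id, nb_partitions, token, limit)
        yield from page.results
        token = page.next_page_token
        if token is None:
            break


# Walk all four partitions with a small page size to force several pages each.
collected = [list(iter_partition(p, 4, limit=3)) for p in range(4)]
```

Callers should treat the token as opaque, as the parameter description says; the hex encoding here exists only to make the sketch runnable.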
 
 
- class swh.indexer.storage.in_memory.IndexerStorage(journal_writer=None)
- Bases: object
- In-memory SWH indexer storage.
- content_mimetype_get_partition(indexer_configuration_id: int, partition_id: int, nb_partitions: int, page_token: str | None = None, limit: int = 1000) → PagedResult[bytes, str]
- content_fossology_license_get_partition(indexer_configuration_id: int, partition_id: int, nb_partitions: int, page_token: str | None = None, limit: int = 1000) → PagedResult[bytes, str]
- directory_intrinsic_metadata_add(metadata: List[DirectoryIntrinsicMetadataRow]) → Dict[str, int]
- origin_intrinsic_metadata_search_fulltext(conjunction: List[str], limit: int = 100) → List[OriginIntrinsicMetadataRow]