swh.indexer.storage package
Subpackages
Submodules
- swh.indexer.storage.converters module
- swh.indexer.storage.db module
- execute_values_generator()
- Db
- Db.content_mimetype_hash_keys
- Db.content_mimetype_missing_from_list()
- Db.content_mimetype_cols
- Db.mktemp_content_mimetype()
- Db.content_mimetype_add_from_temp()
- Db.content_indexer_names
- Db.content_get_range()
- Db.content_mimetype_get_from_list()
- Db.content_fossology_license_cols
- Db.mktemp_content_fossology_license()
- Db.content_fossology_license_add_from_temp()
- Db.content_fossology_license_get_from_list()
- Db.content_metadata_hash_keys
- Db.content_metadata_missing_from_list()
- Db.content_metadata_cols
- Db.mktemp_content_metadata()
- Db.content_metadata_add_from_temp()
- Db.content_metadata_get_from_list()
- Db.directory_intrinsic_metadata_hash_keys
- Db.directory_intrinsic_metadata_missing_from_list()
- Db.directory_intrinsic_metadata_cols
- Db.mktemp_directory_intrinsic_metadata()
- Db.directory_intrinsic_metadata_add_from_temp()
- Db.directory_intrinsic_metadata_get_from_list()
- Db.origin_intrinsic_metadata_cols
- Db.origin_intrinsic_metadata_regconfig
- Db.mktemp_origin_intrinsic_metadata()
- Db.origin_intrinsic_metadata_add_from_temp()
- Db.origin_intrinsic_metadata_get_from_list()
- Db.origin_intrinsic_metadata_search_fulltext()
- Db.origin_intrinsic_metadata_search_by_producer()
- Db.origin_extrinsic_metadata_cols
- Db.mktemp_origin_extrinsic_metadata()
- Db.origin_extrinsic_metadata_add_from_temp()
- Db.origin_extrinsic_metadata_get_from_list()
- Db.indexer_configuration_cols
- Db.mktemp_indexer_configuration()
- Db.indexer_configuration_add_from_temp()
- Db.indexer_configuration_get()
- Db.indexer_configuration_get_from_id()
 
 
- swh.indexer.storage.exc module
- swh.indexer.storage.in_memory module
- check_id_types()
- SubStorage
- IndexerStorage
- IndexerStorage.check_config()
- IndexerStorage.content_mimetype_missing()
- IndexerStorage.content_mimetype_get_partition()
- IndexerStorage.content_mimetype_add()
- IndexerStorage.content_mimetype_get()
- IndexerStorage.content_fossology_license_get()
- IndexerStorage.content_fossology_license_add()
- IndexerStorage.content_fossology_license_get_partition()
- IndexerStorage.content_metadata_missing()
- IndexerStorage.content_metadata_get()
- IndexerStorage.content_metadata_add()
- IndexerStorage.directory_intrinsic_metadata_missing()
- IndexerStorage.directory_intrinsic_metadata_get()
- IndexerStorage.directory_intrinsic_metadata_add()
- IndexerStorage.origin_intrinsic_metadata_get()
- IndexerStorage.origin_intrinsic_metadata_add()
- IndexerStorage.origin_intrinsic_metadata_search_fulltext()
- IndexerStorage.origin_intrinsic_metadata_search_by_producer()
- IndexerStorage.origin_intrinsic_metadata_stats()
- IndexerStorage.origin_extrinsic_metadata_get()
- IndexerStorage.origin_extrinsic_metadata_add()
- IndexerStorage.indexer_configuration_add()
- IndexerStorage.indexer_configuration_get()
 
 
- swh.indexer.storage.interface module
- IndexerStorageInterface
- IndexerStorageInterface.check_config()
- IndexerStorageInterface.content_mimetype_missing()
- IndexerStorageInterface.content_mimetype_get_partition()
- IndexerStorageInterface.content_mimetype_add()
- IndexerStorageInterface.content_mimetype_get()
- IndexerStorageInterface.content_fossology_license_get()
- IndexerStorageInterface.content_fossology_license_add()
- IndexerStorageInterface.content_fossology_license_get_partition()
- IndexerStorageInterface.content_metadata_missing()
- IndexerStorageInterface.content_metadata_get()
- IndexerStorageInterface.content_metadata_add()
- IndexerStorageInterface.directory_intrinsic_metadata_missing()
- IndexerStorageInterface.directory_intrinsic_metadata_get()
- IndexerStorageInterface.directory_intrinsic_metadata_add()
- IndexerStorageInterface.origin_intrinsic_metadata_get()
- IndexerStorageInterface.origin_intrinsic_metadata_add()
- IndexerStorageInterface.origin_intrinsic_metadata_search_fulltext()
- IndexerStorageInterface.origin_intrinsic_metadata_search_by_producer()
- IndexerStorageInterface.origin_intrinsic_metadata_stats()
- IndexerStorageInterface.origin_extrinsic_metadata_get()
- IndexerStorageInterface.origin_extrinsic_metadata_add()
- IndexerStorageInterface.indexer_configuration_add()
- IndexerStorageInterface.indexer_configuration_get()
 
 
- swh.indexer.storage.metrics module
- swh.indexer.storage.model module
- swh.indexer.storage.writer module
Module contents
- swh.indexer.storage.sanitize_json(doc)
- Recursively replaces NUL characters, as PostgreSQL does not allow them in text fields.
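The recursive walk described above can be sketched as follows. This is a minimal illustration, not the package's implementation: `strip_nul` is a hypothetical name, and dropping the NUL character (rather than substituting something else for it) is an assumption.

```python
from typing import Any


def strip_nul(doc: Any) -> Any:
    """Return a copy of ``doc`` with NUL characters removed from every string.

    Hypothetical stand-in for sanitize_json; removing the character is an
    assumption about what "replaces" means.
    """
    if isinstance(doc, str):
        return doc.replace("\x00", "")
    if isinstance(doc, dict):
        # Keys may be strings as well, so both sides are cleaned.
        return {strip_nul(k): strip_nul(v) for k, v in doc.items()}
    if isinstance(doc, (list, tuple)):
        return [strip_nul(item) for item in doc]
    # Numbers, booleans and None cannot contain NUL characters.
    return doc


metadata = {"name": "foo\x00bar", "keywords": ["a\x00", "b"], "stars": 3}
cleaned = strip_nul(metadata)
print(cleaned)
```

Cleaning the document before insertion keeps a single stray `\x00` in harvested metadata from causing the whole row to be rejected by the database.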
- swh.indexer.storage.get_indexer_storage(cls: str, **kwargs) → IndexerStorageInterface
- Instantiate an indexer storage implementation of class cls with arguments kwargs.
- Parameters:
- cls – indexer storage class (local, remote or memory) 
- kwargs – dictionary of arguments passed to the indexer storage class constructor 
 
- Returns:
- an indexer storage instance (an implementation of IndexerStorageInterface)
- Raises:
- ValueError – if passed an unknown storage class.
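The dispatch-by-name behaviour described above might look roughly like the sketch below. Everything in it is illustrative: `MemoryStorage`, `LocalStorage`, `STORAGE_CLASSES` and `get_storage` are hypothetical names, and the real function may resolve classes differently.

```python
from typing import Any, Callable, Dict


class MemoryStorage:
    """Hypothetical stand-in for an in-memory indexer storage."""

    def __init__(self, **kwargs: Any) -> None:
        self.kwargs = kwargs


class LocalStorage:
    """Hypothetical stand-in for a database-backed indexer storage."""

    def __init__(self, db: str, **kwargs: Any) -> None:
        self.db = db
        self.kwargs = kwargs


# Registry mapping the `cls` string to a constructor (illustrative only).
STORAGE_CLASSES: Dict[str, Callable[..., Any]] = {
    "memory": MemoryStorage,
    "local": LocalStorage,
}


def get_storage(cls: str, **kwargs: Any) -> Any:
    """Instantiate the storage registered under ``cls`` with ``kwargs``."""
    try:
        factory = STORAGE_CLASSES[cls]
    except KeyError:
        # Unknown names are reported as ValueError, as documented above.
        raise ValueError(f"Unknown indexer storage class `{cls}`") from None
    return factory(**kwargs)


storage = get_storage("local", db="dbname=example")
print(type(storage).__name__, storage.db)
```

Selecting the backend by a string keeps the choice in configuration rather than in code, so the same caller can switch between a test backend and a production one.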
 
- swh.indexer.storage.check_id_duplicates(data)
- If any two row models in data have the same unique key, raises a ValueError.
- Values associated with the key must be hashable.
- Parameters:
- data (List[dict]) – List of dictionaries to be inserted
 
>>> tool1 = {"name": "foo", "version": "1.2.3", "configuration": {}}
>>> tool2 = {"name": "foo", "version": "1.2.4", "configuration": {}}
>>> check_id_duplicates([
...     ContentLicenseRow(id=b'foo', tool=tool1, license="GPL"),
...     ContentLicenseRow(id=b'foo', tool=tool2, license="GPL"),
... ])
>>> check_id_duplicates([
...     ContentLicenseRow(id=b'foo', tool=tool1, license="AGPL"),
...     ContentLicenseRow(id=b'foo', tool=tool1, license="AGPL"),
... ])
Traceback (most recent call last):
  ...
swh.indexer.storage.exc.DuplicateId: [{'id': b'foo', 'license': 'AGPL', 'tool_configuration': '{}', 'tool_name': 'foo', 'tool_version': '1.2.3'}]
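One way to picture the check is to count unique keys and reject any key seen more than once, which is also why the key values have to be hashable. The sketch below is an assumption-laden illustration: `LicenseRow`, `unique_key` and `check_duplicates` are hypothetical names, and the real row models and exception class differ.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class LicenseRow:
    """Hypothetical row model; not the package's ContentLicenseRow."""

    id: bytes
    tool_name: str
    tool_version: str
    license: str

    def unique_key(self) -> Tuple[bytes, str, str, str]:
        # In this sketch, all fields together identify a row.
        return (self.id, self.tool_name, self.tool_version, self.license)


def check_duplicates(rows: List[LicenseRow]) -> None:
    """Raise ValueError if two rows share the same unique key."""
    # Counter needs hashable keys, hence the tuple of hashable values.
    counts = Counter(row.unique_key() for row in rows)
    duplicates = [key for key, seen in counts.items() if seen > 1]
    if duplicates:
        raise ValueError(f"duplicate keys: {duplicates}")


# Same id, different tool version: accepted.
check_duplicates([
    LicenseRow(b"foo", "foo", "1.2.3", "GPL"),
    LicenseRow(b"foo", "foo", "1.2.4", "GPL"),
])

# Same id and same tool: rejected.
try:
    check_duplicates([
        LicenseRow(b"foo", "foo", "1.2.3", "AGPL"),
        LicenseRow(b"foo", "foo", "1.2.3", "AGPL"),
    ])
except ValueError as exc:
    print(exc)
```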
- class swh.indexer.storage.IndexerStorage(db, min_pool_conns=1, max_pool_conns=10, journal_writer=None)
- Bases: object
- SWH Indexer Storage Datastore
- Parameters:
- db – either a libpq connection string, or a psycopg connection 
- journal_writer – configuration passed to swh.journal.writer.get_journal_writer 
 
 - current_version = 137
 - get_partition(indexer_type: str, indexer_configuration_id: int, partition_id: int, nb_partitions: int, page_token: str | None = None, limit: int = 1000, with_textual_data=False) → PagedResult[bytes, str]
- Retrieve ids of content with indexer_type within partition partition_id, bound by limit.
- Parameters:
- **indexer_type** – Type of data content to index (mimetype, etc…) 
- **indexer_configuration_id** – The tool used to index data 
- **partition_id** – index of the partition to fetch 
- **nb_partitions** – total number of partitions to split into 
- **page_token** – opaque token used for pagination 
- **limit** – Maximum number of results to return (defaults to 1000) 
- **with_textual_data** (bool) – If True, only consider textual content; if False (the default), consider all content 
 
- Raises:
- IndexerStorageArgumentException – if limit is None or a wrong indexer_type is provided
 
- Returns:
- PagedResult of Sha1. If next_page_token is None, there is no more data to fetch 
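The pagination contract above (keep fetching until next_page_token is None) can be sketched with a small loop. `Page`, `iter_partition` and `fake_fetch` are hypothetical stand-ins so that the example runs on its own; in particular, the `results` attribute name is an assumption, and only `next_page_token` is taken from the description above.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, List, Optional


@dataclass
class Page:
    """Hypothetical stand-in for PagedResult[bytes, str]."""

    results: List[bytes]
    next_page_token: Optional[str]


def iter_partition(fetch: Callable[[Optional[str]], Page]) -> Iterator[bytes]:
    """Yield every id of a partition by following next_page_token."""
    page_token: Optional[str] = None
    while True:
        page = fetch(page_token)
        yield from page.results
        if page.next_page_token is None:
            # No more data to fetch.
            break
        page_token = page.next_page_token


# Fake backend: three ids served two at a time; the token is a list offset.
IDS = [b"\x01", b"\x02", b"\x03"]


def fake_fetch(page_token: Optional[str], limit: int = 2) -> Page:
    start = int(page_token) if page_token is not None else 0
    end = start + limit
    return Page(IDS[start:end], str(end) if end < len(IDS) else None)


all_ids = list(iter_partition(fake_fetch))
print(all_ids)
```

Against a real storage, `fetch` would presumably wrap a call to get_partition with fixed indexer_type, indexer_configuration_id, partition_id and nb_partitions, passing the token through as page_token. The token should be treated as opaque; the integer offset here is only a property of the fake backend.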
 
 - content_mimetype_get_partition(indexer_configuration_id: int, partition_id: int, nb_partitions: int, page_token: str | None = None, limit: int = 1000) → PagedResult[bytes, str]
 - content_fossology_license_get_partition(indexer_configuration_id: int, partition_id: int, nb_partitions: int, page_token: str | None = None, limit: int = 1000) → PagedResult[bytes, str]
 - directory_intrinsic_metadata_add(metadata: List[DirectoryIntrinsicMetadataRow]) → Dict[str, int]
 - origin_intrinsic_metadata_search_fulltext(conjunction: List[str], limit: int = 100) → List[OriginIntrinsicMetadataRow]