swh.indexer.storage.db module
- swh.indexer.storage.db.execute_values_generator(cur: Cursor, query: str, values: Iterable[Any]) → Iterator[Any]
- class swh.indexer.storage.db.Db(conn: Connection[Any], pool: ConnectionPool | None = None)
- Bases: BaseDb
- Proxy to the SWH Indexer DB, with wrappers around stored procedures.
- Create a DB proxy.
- Parameters:
- conn – psycopg connection to the SWH DB
- pool – psycopg pool of connections
 
 - content_mimetype_hash_keys = ['id', 'indexer_configuration_id']
 - content_mimetype_missing_from_list(mimetypes: Iterable[Dict], cur=None) → Iterator[bytes]
- List missing mimetypes. 
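The behavior suggested by this signature can be sketched in plain Python. This is an assumption, not the module's implementation: each input dict is taken to carry the keys listed in content_mimetype_hash_keys, and an id is reported when no row matches that key pair. The function `missing_from_list` and the in-memory set `indexed_rows` are hypothetical stand-ins for the database query.

```python
from typing import Dict, Iterable, Iterator, Set, Tuple

# Same keys as Db.content_mimetype_hash_keys in this listing.
HASH_KEYS = ["id", "indexer_configuration_id"]


def missing_from_list(
    mimetypes: Iterable[Dict], indexed: Set[Tuple[bytes, int]]
) -> Iterator[bytes]:
    """Yield the id of each entry whose key pair is absent from `indexed`.

    Hypothetical in-memory model: `indexed` stands in for the table contents.
    """
    for entry in mimetypes:
        key = tuple(entry[k] for k in HASH_KEYS)
        if key not in indexed:
            yield entry["id"]


# Hypothetical data: two rows already indexed with configuration 1.
indexed_rows = {(b"\x01", 1), (b"\x02", 1)}
queries = [
    {"id": b"\x01", "indexer_configuration_id": 1},  # present
    {"id": b"\x02", "indexer_configuration_id": 2},  # other configuration
    {"id": b"\x03", "indexer_configuration_id": 1},  # never indexed
]
missing = list(missing_from_list(queries, indexed_rows))
```

Note that in this model an id counts as missing per indexer configuration, so the same content can be present for one tool and missing for another.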
 - content_mimetype_cols = ['id', 'mimetype', 'encoding', 'tool_id', 'tool_name', 'tool_version', 'tool_configuration']
 - content_indexer_names = {'fossology_license': 'content_fossology_license', 'mimetype': 'content_mimetype'}
 - content_get_range(content_type, start, end, indexer_configuration_id, limit=1000, with_textual_data=False, cur=None)
- Retrieve contents of type content_type within the range [start, end], returning at most limit results associated with the given indexer configuration id. - When asked to work on textual content (with_textual_data), the results are additionally filtered against the mimetype table to keep only contents whose mimetype is not binary.
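The range-and-limit semantics described above can be illustrated with a small in-memory model. This is a sketch under assumptions: `get_range` and the sorted list `ids` are hypothetical, both bounds are treated as inclusive as the `[start, end]` notation suggests, and the textual-content filter and the indexer configuration filter are left out.

```python
from bisect import bisect_left
from typing import List, Sequence


def get_range(
    sorted_ids: Sequence[bytes], start: bytes, end: bytes, limit: int = 1000
) -> List[bytes]:
    """Return at most `limit` ids with start <= id <= end, in ascending order.

    Hypothetical model of the range query; `sorted_ids` must be sorted.
    """
    result: List[bytes] = []
    # Jump to the first id that is >= start, then walk forward.
    for cid in sorted_ids[bisect_left(sorted_ids, start):]:
        if cid > end or len(result) >= limit:
            break
        result.append(cid)
    return result


ids = [b"\x00", b"\x01", b"\x02", b"\x03", b"\x04"]
first_page = get_range(ids, b"\x01", b"\x03", limit=2)
whole_range = get_range(ids, b"\x01", b"\x03")
```

With a small `limit`, a caller would need to issue another call starting past the last id returned in order to cover the rest of the range.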
 - content_fossology_license_cols = ['id', 'tool_id', 'tool_name', 'tool_version', 'tool_configuration', 'license']
 - content_metadata_hash_keys = ['id', 'indexer_configuration_id']
 - content_metadata_cols = ['id', 'metadata', 'tool_id', 'tool_name', 'tool_version', 'tool_configuration']
 - directory_intrinsic_metadata_hash_keys = ['id', 'indexer_configuration_id']
 - directory_intrinsic_metadata_cols = ['id', 'metadata', 'mappings', 'tool_id', 'tool_name', 'tool_version', 'tool_configuration']
 - origin_intrinsic_metadata_cols = ['id', 'metadata', 'from_directory', 'mappings', 'tool_id', 'tool_name', 'tool_version', 'tool_configuration']
 - origin_intrinsic_metadata_regconfig = 'pg_catalog.simple'
- The dictionary used to normalize ‘metadata’ and queries. ‘pg_catalog.simple’ has no stopwords, so it should be suitable for proper names and non-English content. When updating this value, make sure to add a new index on origin_intrinsic_metadata.metadata.
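Why the absence of stopwords matters for proper names can be shown with a toy normalizer. This is only an illustration in plain Python, not PostgreSQL's text search: `normalize` and `ILLUSTRATIVE_STOPWORDS` are hypothetical, and the stopword set is a tiny sample rather than any real configuration's list.

```python
import re
from typing import FrozenSet, List

# Tiny illustrative stopword set; real configurations ship much longer lists.
ILLUSTRATIVE_STOPWORDS: FrozenSet[str] = frozenset({"the", "a", "of", "and"})


def normalize(text: str, stopwords: FrozenSet[str] = frozenset()) -> List[str]:
    """Lowercase and tokenize `text`, dropping tokens listed in `stopwords`."""
    tokens = re.findall(r"\w+", text.lower())
    return [t for t in tokens if t not in stopwords]


name = "The The"  # a proper name made entirely of common words
simple_like = normalize(name)  # no stopwords: every token is kept
stopword_like = normalize(name, ILLUSTRATIVE_STOPWORDS)  # nothing survives
```

A name that normalizes to nothing can never match a query, which is the failure a stopword-free configuration avoids.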
 - origin_intrinsic_metadata_search_by_producer(last, limit, ids_only, mappings, tool_ids, cur)
 - origin_extrinsic_metadata_cols = ['id', 'metadata', 'from_remd_id', 'mappings', 'tool_id', 'tool_name', 'tool_version', 'tool_configuration']
 - indexer_configuration_cols = ['id', 'tool_name', 'tool_version', 'tool_configuration']
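One plausible use of the `*_cols` lists above is to keep a SELECT column order and the returned row tuples in sync. The sketch below assumes that pattern. The helpers `select_statement` and `row_to_dict`, the table name, and the sample row are hypothetical and are not taken from the module.

```python
from typing import Any, Dict, Sequence

# Copied from Db.indexer_configuration_cols in this listing.
indexer_configuration_cols = ["id", "tool_name", "tool_version", "tool_configuration"]


def select_statement(cols: Sequence[str], table: str) -> str:
    """Build a SELECT that lists `cols` in order (identifiers assumed trusted)."""
    return f"SELECT {', '.join(cols)} FROM {table}"


def row_to_dict(cols: Sequence[str], row: Sequence[Any]) -> Dict[str, Any]:
    """Pair each column name with the value at the same position in `row`."""
    if len(cols) != len(row):
        raise ValueError("row length does not match column list")
    return dict(zip(cols, row))


# Hypothetical table name and row values, for illustration only.
query = select_statement(indexer_configuration_cols, "indexer_configuration")
tool = row_to_dict(indexer_configuration_cols, (1, "file", "5.41", {"command": "file"}))
```

Keeping the column list in one place means the query text and the row decoding cannot drift apart when a column is added.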