swh.storage.backfill module
Storage backfiller.
The backfiller's goal is to send part or all of the objects from a storage back to the journal topics.
The current implementation consists of the JournalBackfiller class.
It simply reads the objects from the storage and sends every object identifier back to the journal.
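The flow described above (split the identifier space into ranges, read each range from the storage, send every object to the journal) can be sketched in plain Python. This is a toy model, not the JournalBackfiller API: `ToyBackfiller`, `in_range`, the dict-based storage and the `send` callback are hypothetical stand-ins for the real storage database and journal writer.

```python
from typing import Any, Callable, Dict, Iterable, Optional, Tuple

# A (start, end) pair of object ids; None means "unbounded on that side".
Range = Tuple[Optional[bytes], Optional[bytes]]


def in_range(obj_id: bytes, start: Optional[bytes], end: Optional[bytes]) -> bool:
    """Return True if obj_id lies in the half-open interval [start, end)."""
    return (start is None or obj_id >= start) and (end is None or obj_id < end)


class ToyBackfiller:
    """Hypothetical stand-in for JournalBackfiller (in-memory storage and journal)."""

    def __init__(self, storage: Dict[bytes, Any], send: Callable[[str, Any], None]):
        self.storage = storage  # object id -> object
        self.send = send  # callback playing the role of the journal writer

    def run(self, obj_type: str, ranges: Iterable[Range]) -> int:
        """Send every stored object whose id falls in one of the ranges; return the count."""
        sent = 0
        for start, end in ranges:
            # Process one slice of the id space at a time, in id order.
            for obj_id in sorted(k for k in self.storage if in_range(k, start, end)):
                self.send(obj_type, self.storage[obj_id])
                sent += 1
        return sent
```

Working range by range keeps each read bounded and, through start_object/end_object, lets a run cover only part of the identifier space.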
- swh.storage.backfill.directory_converter(db: BaseDb, directory_d: Dict[str, Any]) → Directory
- Convert a directory from the flat representation to an object compatible with swh.model.
- swh.storage.backfill.raw_extrinsic_metadata_converter(db: BaseDb, metadata: Dict[str, Any]) → RawExtrinsicMetadata
- Convert raw extrinsic metadata from the flat representation to an object compatible with swh.model.
- swh.storage.backfill.extid_converter(db: BaseDb, extid: Dict[str, Any]) → ExtID
- Convert an extid from the flat representation to an object compatible with swh.model.
- swh.storage.backfill.revision_converter(db: BaseDb, revision_d: Dict[str, Any]) → Revision
- Convert a revision from the flat representation to an object compatible with swh.model.
- swh.storage.backfill.release_converter(db: BaseDb, release_d: Dict[str, Any]) → Release
- Convert a release from the flat representation to an object compatible with swh.model.
- swh.storage.backfill.snapshot_converter(db: BaseDb, snapshot_d: Dict[str, Any]) → Snapshot
- Convert a snapshot from the flat representation to an object compatible with swh.model.
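The converters above turn a "flat" database row into a nested structure before building the model object. Assuming, hypothetically, that nested fields such as a revision's author are stored as prefixed columns (`author_name`, `author_email`, ...), the un-flattening step could look like the sketch below; the column names and the `unflatten` helper are illustrative, not swh.storage's actual schema or code.

```python
from typing import Any, Dict, Iterable


def unflatten(row: Dict[str, Any], prefixes: Iterable[str]) -> Dict[str, Any]:
    """Group "<prefix>_<field>" columns of a flat row into nested dicts (hypothetical)."""
    out: Dict[str, Any] = {}
    nested: Dict[str, Dict[str, Any]] = {p: {} for p in prefixes}
    for key, value in row.items():
        for prefix in nested:
            if key.startswith(prefix + "_"):
                nested[prefix][key[len(prefix) + 1:]] = value
                break
        else:
            # Not a prefixed column: keep it at the top level.
            out[key] = value
    for prefix, fields in nested.items():
        # A missing or all-NULL group (e.g. an absent committer) becomes None.
        out[prefix] = fields if any(v is not None for v in fields.values()) else None
    return out
```

The resulting dictionary would then be handed to the corresponding swh.model class.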
- swh.storage.backfill.object_to_offset(object_id, numbits)
- Compute the index of the range containing object_id when the identifier space is divided into 2^numbits ranges.
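Assuming object_id is a hex string of d digits whose integer value is v, with 4d ≥ n (n stands for numbits), this index amounts to keeping the top n bits of the identifier:

```latex
\operatorname{offset}(v, n) = \left\lfloor \frac{v}{2^{4d - n}} \right\rfloor,
\qquad 0 \le \operatorname{offset}(v, n) < 2^{n}
```

For example, with n = 12 and a 40-digit SHA1 identifier starting with `a3f`, the divisor is 2^148 and the offset is 0xa3f = 2623. The upper bound holds because v < 2^{4d}.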
 
- swh.storage.backfill.byte_ranges(numbits: int, start_object: str | None = None, end_object: str | None = None) → Iterator[Tuple[bytes | None, bytes | None]]
- Generate start/end pairs of bytes spanning numbits bits, constrained by the optional start_object and end_object.
 - Parameters:
- numbits – Number of bits used to divide the input space
- start_object – Hex object id contained in the first range returned 
- end_object – Hex object id contained in the last range returned 
 
- Yields:
- 2^numbits pairs of bytes (fewer when start_object or end_object restricts the span)
 
- swh.storage.backfill.raw_extrinsic_metadata_target_ranges(start_object: str | None = None, end_object: str | None = None) → Iterator[Tuple[str | None, str | None]]
- Generate ranges of values for the target attribute of raw_extrinsic_metadata objects.
- This generates one range for all values before the first SWHID (which would correspond to raw origin URLs), then a number of hex-based ranges for each known type of SWHID (2**12 ranges for directories, 2**8 ranges for all other types). Finally, it generates one extra range for values above all possible SWHIDs.
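The range layout described above can be sketched with plain string bounds. The sketch assumes that the known SWHID types are those of the extended SWHID scheme (cnt, dir, emd, ori, rel, rev, snp), that targets compare as plain strings, and that "swh:2" sorts above every version-1 SWHID; the start_object/end_object filtering is an assumed interpretation, and `target_ranges` is a hypothetical name.

```python
from typing import Iterator, List, Optional, Tuple

# Assumed list of SWHID object types, in lexicographic order.
SWHID_TYPES = ["cnt", "dir", "emd", "ori", "rel", "rev", "snp"]
# Assumed upper bound: sorts above every "swh:1:..." string.
ABOVE_ALL_SWHIDS = "swh:2"


def target_ranges(
    start_object: Optional[str] = None, end_object: Optional[str] = None
) -> Iterator[Tuple[Optional[str], Optional[str]]]:
    bounds: List[str] = []
    for swhid_type in SWHID_TYPES:
        bits = 12 if swhid_type == "dir" else 8  # 2**12 ranges for dir, 2**8 otherwise
        digits = bits // 4
        prefix = f"swh:1:{swhid_type}:"
        # The first bound is the bare prefix, so nothing between types is skipped.
        bounds.append(prefix)
        bounds.extend(f"{prefix}{i:0{digits}x}" for i in range(1, 2**bits))
    bounds.append(ABOVE_ALL_SWHIDS)

    # One range below all SWHIDs, consecutive ranges between bounds, one range above.
    edges: List[Optional[str]] = [None, *bounds, None]
    for start, end in zip(edges, edges[1:]):
        if start_object is not None and end is not None and end <= start_object:
            continue  # range lies entirely below start_object
        if end_object is not None and start is not None and start > end_object:
            continue  # range lies entirely above end_object
        yield start, end
```

Each yielded pair is meant as a half-open interval [start, end); consecutive pairs share a bound, so the whole string space is covered without gaps.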
- swh.storage.backfill.integer_ranges(start: str, end: str, block_size: int = 1000) → Iterator[Tuple[int | None, int | None]]
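integer_ranges carries no description. Judging only from its signature, it presumably splits an integer interval given as strings into blocks of block_size; the sketch below shows that reading. The half-open [start, end) semantics and the clipping of the last block are assumptions, and unlike the documented return type this sketch never yields None.

```python
from typing import Iterator, Optional, Tuple


def integer_ranges(
    start: str, end: str, block_size: int = 1000
) -> Iterator[Tuple[Optional[int], Optional[int]]]:
    """Hypothetical reading: cover [int(start), int(end)) with blocks of block_size."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    low, high = int(start), int(end)  # bounds are given as strings
    for block_start in range(low, high, block_size):
        # Clip the final block so it never goes past the upper bound.
        yield block_start, min(block_start + block_size, high)
```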
- swh.storage.backfill.fetch(db, obj_type, start, end)
- Fetch all obj_type’s identifiers from db.
- This opens one connection, streams the objects and, when done, closes the connection.
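The "one connection per call, streamed results" pattern can be illustrated with the standard-library sqlite3 module. The single-column table layout, the ALLOWED_TYPES set and the `connect` callable are assumptions made for the example; the documented function receives a db object instead.

```python
import sqlite3
from contextlib import closing
from typing import Callable, Iterator, List, Optional

# Hypothetical whitelist: table names cannot be bound as SQL parameters,
# so only known object types are interpolated into the query.
ALLOWED_TYPES = {"content", "directory", "revision", "release", "snapshot"}


def fetch(
    connect: Callable[[], sqlite3.Connection],
    obj_type: str,
    start: Optional[bytes],
    end: Optional[bytes],
) -> Iterator[bytes]:
    """Yield the ids of obj_type in [start, end), using one connection per call."""
    if obj_type not in ALLOWED_TYPES:
        raise ValueError(f"unknown object type: {obj_type}")
    clauses: List[str] = []
    params: List[bytes] = []
    if start is not None:
        clauses.append("id >= ?")
        params.append(start)
    if end is not None:
        clauses.append("id < ?")
        params.append(end)
    where = f" WHERE {' AND '.join(clauses)}" if clauses else ""
    query = f"SELECT id FROM {obj_type}{where} ORDER BY id"
    # sqlite3's own context manager only commits or rolls back; closing()
    # also closes the connection once the generator is exhausted or closed.
    with closing(connect()) as conn:
        for (obj_id,) in conn.execute(query, params):
            yield obj_id
```

Iterating the cursor streams rows one at a time rather than loading the whole range into memory.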
- class swh.storage.backfill.JournalBackfiller(config=None)
- Bases: object
- Class in charge of reading the storage’s objects and sending them back to the journal’s topics.
- This is designed to be run periodically.
- property db