swh.scrubber.fixer module
Reads all known corrupt objects from the swh-scrubber database, and tries to recover them.
Currently, only recovery from Git origins is implemented.
- swh.scrubber.fixer.get_object_from_clone(clone_path: Path, swhid: CoreSWHID) -> None | bytes | ShaFile
- Reads the original object matching swhid from the given clone if it exists, and returns a Dulwich object if possible, or the raw manifest otherwise.
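To make the idea of checking a recovered object concrete: for contents, the hash embedded in a SWHID is the SHA-1 that Git computes over a short header followed by the object's bytes. The following is a minimal sketch, using only the standard library, of how bytes read from a clone could be compared against a SWHID. `git_object_id` and `matches_swhid` are hypothetical helpers that are not part of swh.scrubber, and the SWHID is handled as a plain string rather than a CoreSWHID instance.

```python
import hashlib


def git_object_id(git_type: str, payload: bytes) -> str:
    """Return the hex SHA-1 that Git assigns to an object of the given type."""
    # Git hashes "<type> <payload length>\0" followed by the payload itself.
    header = f"{git_type} {len(payload)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + payload).hexdigest()


def matches_swhid(swhid: str, git_type: str, payload: bytes) -> bool:
    """Check that the payload hashes to the ID embedded in the SWHID string."""
    # A core SWHID has the form "swh:1:<type>:<40 hex digits>".
    expected_hash = swhid.split(":")[3]
    return git_object_id(git_type, payload) == expected_hash


# The empty blob has a well-known Git object ID.
EMPTY_BLOB = "swh:1:cnt:e69de29bb2d1d6434b8b29ae775ad8c2e48c5391"
print(matches_swhid(EMPTY_BLOB, "blob", b""))
```

A check of this kind is one plausible way to decide whether an object found in a clone really is the original; the actual module may rely on Dulwich and swh.model for this instead.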
- swh.scrubber.fixer.get_fixed_object_from_clone(clone_path: Path, corrupt_object: CorruptObject) -> FixedObject | None
- Reads the original object matching the corrupt_object from the given clone if it exists, and returns a FixedObject instance ready to be inserted in the database.
- class swh.scrubber.fixer.Fixer(db: ScrubberDb, start_object: CoreSWHID = CoreSWHID.from_string('swh:1:cnt:0000000000000000000000000000000000000000'), end_object: CoreSWHID = CoreSWHID.from_string('swh:1:snp:ffffffffffffffffffffffffffffffffffffffff'))
- Bases: object
- Reads a chunk of corrupt objects in the swh-scrubber database, tries to recover them through various means (brute-forcing fields and re-downloading from the origin), recomputes checksums, and writes them back to the swh-scrubber database if successful.
 - db: ScrubberDb
- Database to read from and write to. 
 - start_object: CoreSWHID = CoreSWHID.from_string('swh:1:cnt:0000000000000000000000000000000000000000')
- Minimum SWHID to check (in alphabetical order).
 - end_object: CoreSWHID = CoreSWHID.from_string('swh:1:snp:ffffffffffffffffffffffffffffffffffffffff')
- Maximum SWHID to check (in alphabetical order).
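The "alphabetical order" bounds can be illustrated with plain string comparison. The sketch below assumes SWHIDs are compared by their string form; `in_range` is a hypothetical helper, not part of swh.scrubber, and the real Fixer may perform the comparison differently (for example, inside the database). It shows why the default bounds cover every object type: the type codes cnt, dir, rel, rev, and snp sort alphabetically between the two defaults.

```python
# Default bounds, written as plain strings for illustration.
START = "swh:1:cnt:" + "0" * 40
END = "swh:1:snp:" + "f" * 40


def in_range(swhid: str, start: str = START, end: str = END) -> bool:
    """Return True if the SWHID string sorts between start and end, inclusive."""
    return start <= swhid <= end


# Every object type falls inside the default range.
for object_type in ("cnt", "dir", "rel", "rev", "snp"):
    print(object_type, in_range(f"swh:1:{object_type}:" + "a" * 40))

# Narrower bounds restrict the range to a single object type, here revisions.
REV_START = "swh:1:rev:" + "0" * 40
REV_END = "swh:1:rev:" + "f" * 40
print(in_range("swh:1:dir:" + "a" * 40, REV_START, REV_END))
```

Under this assumption, choosing bounds that share an object type prefix is a simple way to split the work into per-type chunks.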
 - recover_objects_from_origin(origin_url)
- Clones an origin, and cherry-picks original objects that are known to be corrupt in the database. 
 - recover_corrupt_object(corrupt_object: CorruptObject, cur: Cursor, clone_path: Path) -> None