Command-line interface#
swh scrubber#
Main command group of the datastore scrubber.
Expected config format:
scrubber:
    cls: postgresql
    db: "service=..."    # libpq DSN
# for storage checkers + origin locator only:
storage:
    cls: postgresql     # cannot be remote for checkers, as they need direct
                        # access to the pg DB
    db: "service=..."   # libpq DSN
    objstorage:
        cls: memory
# for journal checkers only:
journal:
    # see https://docs.softwareheritage.org/devel/apidoc/swh.journal.client.html
    # for the full list of options
    sasl.mechanism: SCRAM-SHA-512
    security.protocol: SASL_SSL
    sasl.username: ...
    sasl.password: ...
    group_id: ...
    privileged: True
    message.max.bytes: 524288000
    brokers:
      - "broker1.journal.softwareheritage.org:9093"
      - "broker2.journal.softwareheritage.org:9093"
      - "broker3.journal.softwareheritage.org:9093"
      - "broker4.journal.softwareheritage.org:9093"
      - "broker5.journal.softwareheritage.org:9093"
    object_types: [directory, revision, snapshot, release]
    auto_offset_reset: earliest
swh scrubber [OPTIONS] COMMAND [ARGS]...
Options
- -C, --config-file <config_file>#
- Configuration file. 
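The section requirements noted in the comments of the expected config format can be checked before running a command. The following is a minimal sketch, not part of swh-scrubber: `config_problems` and `REQUIRED_SECTIONS` are hypothetical names, the configuration is assumed to be already loaded into a dict, and the objstorage backend is left out because its requirements are not spelled out above.

```python
# Sections each backend needs, as read from the expected config format.
# Hypothetical helper data: the objstorage backend is not covered here.
REQUIRED_SECTIONS = {
    "storage": ("scrubber", "storage"),
    "journal": ("scrubber", "journal"),
}


def config_problems(config, backend):
    """Return a list of human-readable problems found in a loaded config."""
    problems = [
        f"missing section: {section}"
        for section in REQUIRED_SECTIONS[backend]
        if section not in config
    ]
    # Storage checkers need direct access to the PostgreSQL database,
    # so a remote storage class is reported as a problem.
    storage = config.get("storage")
    if backend == "storage" and storage is not None:
        if storage.get("cls") != "postgresql":
            problems.append("storage.cls must be postgresql for checkers")
    return problems


# Illustrative config with a remote storage and no journal section.
example = {
    "scrubber": {"cls": "postgresql", "db": "service=..."},
    "storage": {"cls": "remote"},
}
print(config_problems(example, "storage"))
print(config_problems(example, "journal"))
```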
check#
Group of commands which read from data stores and report errors.
swh scrubber check [OPTIONS] COMMAND [ARGS]...
init#
Initialise a scrubber check configuration for the datastore defined in the configuration file and given object_type.
A checker configuration consists simply of a set of:
- backend: the datastore type being scrubbed (storage, objstorage or journal), 
- object-type: the type of object being checked, 
- nb-partitions: the number of partitions the hash space is divided into; must be a power of 2, 
- name: a unique name for easier reference, 
- check-hashes: flag (defaults to True) to select the hash validation step for this scrubbing configuration, 
- check-references: flag (defaults to True for storage and False for the journal backend) to select the reference validation step for this scrubbing configuration. 
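One reason a power of 2 is convenient for nb-partitions is that a partition can then be identified by the leading bits of an object hash. The sketch below illustrates that idea only; it is an assumption made for illustration, not the partitioning code used by the scrubber, and `partition_index` is a hypothetical name.

```python
def partition_index(object_id: bytes, nb_partitions: int) -> int:
    """Map a hash to a partition number using its leading bits.

    Illustrative only: assumes nb_partitions is a power of 2 no larger
    than 2 ** (number of bits in object_id).
    """
    if nb_partitions < 1 or nb_partitions & (nb_partitions - 1):
        raise ValueError("nb_partitions must be a power of 2")
    bits = nb_partitions.bit_length() - 1  # log2(nb_partitions)
    total_bits = len(object_id) * 8
    # Read the hash as a big-endian integer and keep its top `bits` bits.
    return int.from_bytes(object_id, "big") >> (total_bits - bits)


lowest = bytes(20)               # 20-byte hash, all bits cleared
highest = bytes([0xFF]) * 20     # 20-byte hash, all bits set
print(partition_index(lowest, 16), partition_index(highest, 16))
```

With 16 partitions the first 4 bits of the hash select the partition, so every hash falls into exactly one of the partitions numbered 0 to 15.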
swh scrubber check init [OPTIONS] {storage|journal|objstorage}
Options
- --object-type <object_type>#
- Options:
- snapshot | revision | release | directory | content 
 
- --nb-partitions <nb_partitions>#
- --name <name>#
- --check-hashes, --no-check-hashes#
- --check-references, --no-check-references#
Arguments
- BACKEND#
- Required argument 
list#
List the known configurations
swh scrubber check list [OPTIONS]
run#
Run the scrubber checker configured as NAME and report corrupt objects to the scrubber DB.
This runs a single thread; parallelism is achieved by running this command multiple times.
This command references an existing scrubbing configuration (either by name or by id); the configuration holds the object type, number of partitions and the storage configuration this scrubbing session will check on.
swh scrubber check run [OPTIONS] [NAME]
Options
- --config-id <config_id>#
- Config ID (if the config name is not given as argument) 
- --use-journal#
- Flag only relevant when running an object storage scrubber: if set, content IDs are consumed from a Kafka topic of the SWH journal instead of being read from a storage 
- --limit <limit>#
Arguments
- NAME#
- Optional argument 
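Since each invocation of the run command is single-threaded, several checkers can be started side by side for the same configuration. A minimal sketch of that pattern follows, using only the `-C` option and the NAME argument documented here; `run_command` and `launch_workers` are hypothetical helpers, and the `swh` executable is assumed to be on the PATH.

```python
import subprocess


def run_command(name, config_file=None):
    """Build the argv for one `swh scrubber check run` invocation."""
    argv = ["swh", "scrubber"]
    if config_file is not None:
        # -C belongs to the main command group, so it goes before `check`.
        argv += ["-C", config_file]
    return argv + ["check", "run", name]


def launch_workers(name, nb_workers, config_file=None):
    """Start nb_workers checker processes and wait for all of them.

    Returns the list of exit codes, one per worker.
    """
    command = run_command(name, config_file)
    workers = [subprocess.Popen(command) for _ in range(nb_workers)]
    return [worker.wait() for worker in workers]
```

For instance, `launch_workers("my-check", 4, "scrubber.yml")` would start four checkers for a configuration named `my-check`; both names are placeholders.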
running#
List partitions being checked for the check session <name>
swh scrubber check running [OPTIONS] [NAME]
Options
- --config-id <config_id>#
Arguments
- NAME#
- Optional argument 
stalled#
List the stuck partitions for a given config
swh scrubber check stalled [OPTIONS] [NAME]
Options
- --config-id <config_id>#
- --for <delay>#
- Delay for a partition to be considered as stuck; in seconds or ‘auto’ 
- --reset#
- Reset the stalled partition so it can be grabbed by a scrubber worker 
Arguments
- NAME#
- Optional argument 
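The `--for` option takes either a number of seconds or the literal `auto`. The sketch below builds a `stalled` command line and rejects other values before the CLI is called. It is a wrapper-side assumption, not the scrubber's own validation: `stalled_command` is a hypothetical helper, and whole non-negative seconds are assumed.

```python
def stalled_command(name, delay=None, reset=False):
    """Build the argv for `swh scrubber check stalled`.

    delay: None (option omitted), "auto", or a number of seconds.
    """
    argv = ["swh", "scrubber", "check", "stalled"]
    if delay is not None:
        if delay != "auto":
            seconds = int(delay)  # raises ValueError on non-numeric text
            if seconds < 0:
                raise ValueError("delay must be non-negative")
            delay = str(seconds)
        argv += ["--for", delay]
    if reset:
        # Make the stalled partitions available to scrubber workers again.
        argv.append("--reset")
    return argv + [name]


print(stalled_command("my-check", delay=3600, reset=True))
```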
stats#
Display statistics for the check session <name>
swh scrubber check stats [OPTIONS] [NAME]
Options
- --config-id <config_id>#
- -j, --json#
Arguments
- NAME#
- Optional argument 
fix#
For each known corrupt object reported in the scrubber DB, look up origins that may contain this object and record them, so they can be used later for recovery.
swh scrubber fix [OPTIONS]
Options
- --start-object <start_object>#
- --end-object <end_object>#
locate#
For each known corrupt object reported in the scrubber DB, look up origins that may contain this object and record them, so they can be used later for recovery.
swh scrubber locate [OPTIONS]
Options
- --start-object <start_object>#
- --end-object <end_object>#