Command-line interface#
swh storage#
Software Heritage Storage tools.
swh storage [OPTIONS] COMMAND [ARGS]...
Options
- -C, --config-file <config_file>#
- Configuration file. 
- --check-config <check_config>#
- Check the configuration of the storage at startup for read or write access; if set, this overrides the value present in the configuration file, if any. Defaults to ‘read’ for the ‘backfill’ command, and ‘write’ for the ‘rpc-serve’ and ‘replay’ commands. - Options:
- no | read | write 
 
backfill#
Run the backfiller
The backfiller lists objects from a Storage and produces journal entries from them.
Typically used to rebuild a journal or to compensate for missing objects in a journal (e.g. due to downtime of the latter).
The configuration file requires the following entries (a minimal sketch is given after this list):
- brokers: a list of kafka endpoints (the journal) in which entries will be added. 
- storage_dbconn: URL to connect to the storage DB. 
- prefix: the prefix of the topics (topics will be <prefix>.<object_type>). 
- client_id: the kafka client ID. 
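For instance, a minimal backfiller configuration might look like the following sketch; the broker address, topic prefix, client id and database connection string are placeholders to adapt to your deployment:
brokers:
  - kafka1.example.org:9092
prefix: swh.journal.objects
client_id: swh.storage.backfiller
storage_dbconn: postgresql://swh@db.example.org:5432/swh-storage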
swh storage backfill [OPTIONS] OBJECT_TYPE
Options
- --start-object <start_object>#
- --end-object <end_object>#
- --dry-run#
Arguments
- OBJECT_TYPE#
- Required argument 
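For example, assuming the configuration sketched above is saved as backfill.yml, a run limited to a range of object ids might look like this (the object type and hexadecimal id bounds are purely illustrative):
swh storage -C backfill.yml backfill --start-object 000000 --end-object 0fffff directory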
blocking#
Configure blocking of origins, preventing them from being archived
These tools require read/write access to the blocking database. An entry must be added to the configuration file as follows:
storage:
  …
blocking_admin:
  cls: postgresql
  db: "service=swh-blocking-admin"
swh storage blocking [OPTIONS] COMMAND [ARGS]...
clear-request#
Remove all blocking states for the given request
swh storage blocking clear-request [OPTIONS] SLUG
Options
- -m, --message <message>#
- an explanation for this change 
Arguments
- SLUG#
- Required argument 
history#
Get the history for a request
swh storage blocking history [OPTIONS] SLUG
Arguments
- SLUG#
- Required argument 
list-requests#
List blocking requests
swh storage blocking list-requests [OPTIONS]
Options
- -a, --include-cleared-requests, --exclude-cleared-requests#
- Show requests without any blocking state 
new-request#
Create a new request to block objects
SLUG is a human-readable unique identifier for the request. It is an internal identifier that will be used in subsequent commands to address this newly recorded request.
A reason for the request must be specified, either using the -m option or via the provided editor.
swh storage blocking new-request [OPTIONS] SLUG
Options
- -m, --message <REASON>#
- why the request was made 
Arguments
- SLUG#
- Required argument 
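For example (the slug and message below are placeholders):
swh storage blocking new-request -m "Origins blocked following a takedown notice" takedown-2024-001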
origin-state#
Get the blocking state for a set of Origins
If an object given in the arguments is not listed in the output, it means no blocking state is set for it in any request.
swh storage blocking origin-state [OPTIONS] ORIGIN
Arguments
- ORIGIN#
- Optional argument(s) 
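For example, to check a single origin (the URL is a placeholder):
swh storage blocking origin-state https://example.org/user/project.git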
status#
Get the blocking states defined by a request
swh storage blocking status [OPTIONS] SLUG
Arguments
- SLUG#
- Required argument 
update-objects#
Update the blocking state of given objects
The blocked state of the provided Origins will be updated to NEW_STATE for the request SLUG.
NEW_STATE must be one of “blocked”, “decision-pending” or “non_blocked”.
Origins must be provided one per line, either via standard input or via a file specified with the -f option; - is synonymous with standard input.
An explanation for this change must be added to the request history. It can either be specified by the -m option or via the provided editor.
swh storage blocking update-objects [OPTIONS] SLUG NEW_STATE
Options
- -m, --message <message>#
- an explanation for this change 
- -f, --file <file>#
- a file with one Origin per line 
Arguments
- SLUG#
- Required argument 
- NEW_STATE#
- Required argument 
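For example, to mark the origins listed in a file as blocked for a previously created request (the file name, message and slug are placeholders):
swh storage blocking update-objects -f origins.txt -m "confirmed takedown" takedown-2024-001 blocked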
cassandra#
swh storage cassandra [OPTIONS] COMMAND [ARGS]...
init#
Creates a Cassandra keyspace with table definitions suitable for use by swh-storage’s Cassandra backend
swh storage cassandra init [OPTIONS]
list-migrations#
Lists the known Cassandra schema migrations and their status
swh storage cassandra list-migrations [OPTIONS]
mark-upgraded#
Marks a migration as run
Exit codes:
- 0: ok 
- 1: unexpected crash 
- 2: (unassigned) 
- 3: nothing to do 
swh storage cassandra mark-upgraded [OPTIONS]
Options
- --migration <migration_ids>#
upgrade#
Applies all pending migrations that can run automatically
Exit codes:
- 0: migrations applied 
- 1: unexpected crash 
- 2: (unassigned) 
- 3: no migrations to run 
- 4: some required migrations need to be manually applied 
- 5: some optional migrations need to be manually applied 
- 6: some required (and optional) migrations could not be applied because a dependency is missing (only if --migration was passed) 
- 7: some optional migrations could not be applied because a dependency is missing (only if --migration was passed) 
swh storage cassandra upgrade [OPTIONS]
Options
- --migration <migration_ids>#
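For example, a deployment script might apply pending migrations and then inspect the exit code against the list above (the configuration file path is a placeholder):
swh storage -C storage.yml cassandra upgrade
echo $?  # 0: migrations applied, 3: nothing to do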
create-object-reference-partitions#
Create object_reference partitions covering the dates from START to END
swh storage create-object-reference-partitions [OPTIONS] START END
Arguments
- START#
- Required argument 
- END#
- Required argument 
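For example, assuming START and END are ISO dates, the following would create the partitions covering the first quarter of 2024 (the dates are placeholders):
swh storage create-object-reference-partitions 2024-01-01 2024-04-01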
masking#
Configure masking on archived objects
These tools require read/write access to the masking database. An entry must be added to the configuration file as follows:
storage:
  …
masking_admin:
  cls: postgresql
  db: "service=swh-masking-admin"
swh storage masking [OPTIONS] COMMAND [ARGS]...
clear-request#
Remove all masking states for the given request
swh storage masking clear-request [OPTIONS] SLUG
Options
- -m, --message <message>#
- an explanation for this change 
Arguments
- SLUG#
- Required argument 
history#
Get the history for a request
swh storage masking history [OPTIONS] SLUG
Arguments
- SLUG#
- Required argument 
list-requests#
List masking requests
swh storage masking list-requests [OPTIONS]
Options
- -a, --include-cleared-requests, --exclude-cleared-requests#
- Show requests without any masking state 
new-request#
Create a new request to mask objects
SLUG is a human-readable unique identifier for the request. It is an internal identifier that will be used in subsequent commands to address this newly recorded request.
A reason for the request must be specified, either using the -m option or via the provided editor.
swh storage masking new-request [OPTIONS] SLUG
Options
- -m, --message <REASON>#
- why the request was made 
Arguments
- SLUG#
- Required argument 
object-state#
Get the masking state for a set of SWHIDs
If an object given in the arguments is not listed in the output, it means no masking state is set for it in any request.
swh storage masking object-state [OPTIONS] SWHID
Arguments
- SWHID#
- Optional argument(s) 
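For example, to query a single object (the SWHID is a placeholder):
swh storage masking object-state swh:1:cnt:0000000000000000000000000000000000000000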
patching#
Tools to manage the patching of objects
swh storage masking patching [OPTIONS] COMMAND [ARGS]...
set#
Set display names (patching entries)
swh storage masking patching set [OPTIONS] INPUT
Options
- --clear, --keep#
- Clear the display names table before inserting new entries 
Arguments
- INPUT#
- Required argument 
status#
Get the masking states defined by a request
swh storage masking status [OPTIONS] SLUG
Arguments
- SLUG#
- Required argument 
update-objects#
Update the state of given objects
The masked state of the provided SWHIDs will be updated to NEW_STATE for the request SLUG.
NEW_STATE must be one of “visible”, “decision-pending” or “restricted”.
SWHIDs must be provided one per line, either via standard input or via a file specified with the -f option; - is synonymous with standard input.
An explanation for this change must be added to the request history. It can either be specified by the -m option or via the provided editor.
swh storage masking update-objects [OPTIONS] SLUG NEW_STATE
Options
- -m, --message <message>#
- an explanation for this change 
- -f, --file <file>#
- a file with one SWHID per line 
Arguments
- SLUG#
- Required argument 
- NEW_STATE#
- Required argument 
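For example, reading SWHIDs from standard input (the SWHID, message and slug are placeholders):
echo swh:1:dir:0000000000000000000000000000000000000000 | swh storage masking update-objects -f - -m "pending review" takedown-2024-002 decision-pending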
remove-old-object-reference-partitions#
Remove object_reference partitions for values older than BEFORE
swh storage remove-old-object-reference-partitions [OPTIONS] BEFORE
Options
- --force#
- do not ask for confirmation before removing tables 
Arguments
- BEFORE#
- Required argument 
replay#
Fill a Storage by reading a Journal.
This is typically used for a mirror configuration, reading the Software Heritage kafka journal to retrieve objects from the Software Heritage main storage and feed a replication storage. There can be several ‘replayers’ filling a Storage as long as they use the same group-id.
The expected configuration file should have 2 sections:
- storage: the configuration of the storage in which to add objects received from the kafka journal, 
- journal_client: the configuration of access to the kafka journal; see the documentation of swh.journal (https://docs.softwareheritage.org/devel/apidoc/swh.journal.client.html) for more details on the possible configuration entries in this section. 
In addition to these 2 mandatory config sections, a third section, ‘replayer’, may be specified with an ‘error_reporter’ config entry, which allows specifying redis connection parameters used to report non-recoverable mirroring errors, e.g.:
storage:
  [...]
journal_client:
  [...]
replayer:
  error_reporter:
    host: redis.local
    port: 6379
    db: 1
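As an illustration, the two elided sections above might be filled in along the following lines; the storage backend, URL, broker address and group id are placeholders to adapt to the mirror deployment, and the swh.journal documentation linked above lists the complete set of journal_client settings:
storage:
  cls: remote
  url: http://storage.internal.example.org:5002/
journal_client:
  brokers:
    - kafka1.example.org:9092
  group_id: my-mirror-replayer
  prefix: swh.journal.objects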
swh storage replay [OPTIONS]
Options
- -n, --stop-after-objects <stop_after_objects>#
- Stop after processing this many objects. Default is to run forever. 
- -t, --type <object_types>#
- Object types to replay - Options:
- origin | origin_visit | origin_visit_status | snapshot | revision | release | directory | content | skipped_content | metadata_authority | metadata_fetcher | raw_extrinsic_metadata | extid 
 
- -X, --known-mismatched-hashes <invalid_hashes_file>#
- File of SWHIDs of objects that are known to have invalid hashes but still need to be replayed. 
rpc-serve#
Software Heritage Storage RPC server.
Do NOT use this in a production environment.
swh storage rpc-serve [OPTIONS]
Options
- --host <IP>#
- Host IP address to bind the server on - Default:
- '0.0.0.0'
 
- --port <PORT>#
- Binding port of the server - Default:
- 5002
 
- --debug, --no-debug#
- Indicates if the server should run in debug mode
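For example, to run a development server on the loopback interface (the configuration file path is a placeholder):
swh storage -C storage.yml rpc-serve --host 127.0.0.1 --port 5002 --debug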