swh.storage.proxies.blocking.db module
- class swh.storage.proxies.blocking.db.BlockingState(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)
- Bases: Enum
- Value recording “how much” a url associated with a blocking request is blocked
 - NON_BLOCKED = 1
- The origin url can be ingested/updated 
 - DECISION_PENDING = 2
- Ingestion from origin url is temporarily blocked until the request is reviewed 
 - BLOCKED = 3
- Ingestion from origin url is permanently blocked 
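
As an illustration of the three states listed above, here is a minimal sketch that mirrors the enum with the standard library. The member names and values come from this page; the helper `ingestion_allowed` is hypothetical and not part of `swh.storage`.

```python
from enum import Enum


class BlockingState(Enum):
    """Local mirror of the documented enum, for illustration only."""

    NON_BLOCKED = 1
    DECISION_PENDING = 2
    BLOCKED = 3


def ingestion_allowed(state: BlockingState) -> bool:
    # Hypothetical helper: per the descriptions above, only NON_BLOCKED
    # lets the origin url be ingested or updated.
    return state is BlockingState.NON_BLOCKED
```

Both `DECISION_PENDING` and `BLOCKED` prevent ingestion; they differ only in whether the block is temporary or permanent.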
 
- class swh.storage.proxies.blocking.db.BlockingStatus(state: BlockingState, request: UUID)
- Bases: object
- Return value when requesting whether ingestion of an origin url is blocked
- Method generated by attrs for class BlockingStatus.
- class swh.storage.proxies.blocking.db.BlockingRequest(id: UUID, slug: str, date: datetime, reason: str)
- Bases: object
- A request for blocking a set of origins from being ingested
- Method generated by attrs for class BlockingRequest.
 - id
- Unique id for the request (will be returned to requesting clients) 
 - slug
- Unique, human-readable id for the request (for administrative interactions) 
 - date
- Date the request was received 
 - reason
- Why the request was made 
 
- class swh.storage.proxies.blocking.db.RequestHistory(request: UUID, date: datetime, message: str)
- Bases: object
- Method generated by attrs for class RequestHistory.
 - request
- id of the blocking request 
 - date
- Date the history entry was added 
 - message
- Free-form history information (e.g. “policy decision made”) 
 
- class swh.storage.proxies.blocking.db.BlockingLogEntry(url: str, url_match: str, request: UUID, date: datetime, state: BlockingState)
- Bases: object
- Method generated by attrs for class BlockingLogEntry.
 - url
- origin url that has been blocked 
 - url_match
- url matching pattern that caused the blocking of the origin url 
 - request
- id of the blocking request 
 - date
- Date the blocking event occurred 
 - state
- Blocking state responsible for the blocking event 
 
- class swh.storage.proxies.blocking.db.BlockedOrigin(request_slug: str, url_pattern: str, state: BlockingState)
- Bases: object
- Method generated by attrs for class BlockedOrigin.
- class swh.storage.proxies.blocking.db.BlockingDb(*args, **kwargs)
- Bases: BaseDb
- Create a DB proxy
- Parameters:
- conn – psycopg connection to the SWH DB 
- pool – psycopg pool of connections 
 
 - current_version = 1
 
- swh.storage.proxies.blocking.db.get_urls_to_check(url: str) → Tuple[List[str], List[str]]
- Get the entries to check in the database for the given url, in order.
- Exact matching is done on the following strings, in order:
- the url with any trailing slashes removed (the so-called “trimmed url”); 
- the url passed exactly; 
- if the trimmed url ends with a dot and one of the KNOWN_SUFFIXES, the url with this suffix stripped.
 
- The prefix matching is done by splitting the path part of the URL on slashes, and successively removing the last elements.
- Returns:
- A tuple with a list of exact matches, and a list of prefix matches 
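
The candidate generation described above can be sketched as follows. This is not the library's implementation: the function name `urls_to_check_sketch` is hypothetical, the contents of `KNOWN_SUFFIXES` are placeholder values (the real tuple is not listed on this page), and two details are assumptions, namely that the exact url is skipped when it equals the trimmed url, and that prefixes stop at the first path element rather than including the bare host.

```python
from typing import List, Tuple
from urllib.parse import urlsplit

# Placeholder values: the real KNOWN_SUFFIXES is not shown in this page.
KNOWN_SUFFIXES = ("git", "hg", "svn")


def urls_to_check_sketch(url: str) -> Tuple[List[str], List[str]]:
    """Return (exact candidates, prefix candidates) for url, in order."""
    trimmed = url.rstrip("/")

    # 1. trimmed url, 2. url as passed, 3. trimmed url without ".<suffix>"
    exact = [trimmed]
    if url != trimmed:  # assumption: avoid listing the same string twice
        exact.append(url)
    for suffix in KNOWN_SUFFIXES:
        if trimmed.endswith("." + suffix):
            exact.append(trimmed[: -len(suffix) - 1])
            break

    # Split the path on slashes and drop the last element repeatedly.
    parts = urlsplit(trimmed)
    base = f"{parts.scheme}://{parts.netloc}"
    segments = [seg for seg in parts.path.split("/") if seg]
    prefixes = [
        base + "/" + "/".join(segments[:n])
        for n in range(len(segments) - 1, 0, -1)
    ]  # assumption: the bare host is not included
    return exact, prefixes


exact, prefixes = urls_to_check_sketch("https://example.com/user/repo.git/")
```

With this input, `exact` holds the trimmed url, the url as passed, and the url without `.git`, in that order, while `prefixes` holds only `https://example.com/user`.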
 
- class swh.storage.proxies.blocking.db.BlockingAdmin(*args, **kwargs)
- Bases: BlockingDb
- Create a DB proxy
- Parameters:
- conn – psycopg connection to the SWH DB 
- pool – psycopg pool of connections 
 
 - create_request(slug: str, reason: str) → BlockingRequest
- Record a new blocking request
- Parameters:
- slug – human-readable unique identifier for the request 
- reason – free-form text recording why the request was made 
 
- Raises:
- DuplicateRequest – when the slug already exists 
 
 - find_request(slug: str) → BlockingRequest | None
- Find a blocking request using its slug
- Returns: None if a request with the given slug doesn’t exist
 - find_request_by_id(id: UUID) → BlockingRequest | None
- Find a blocking request using its id
- Returns: None if a request with the given id doesn’t exist
 - get_requests(include_cleared_requests: bool = False) → List[Tuple[BlockingRequest, int]]
- Get known requests
- Parameters:
- include_cleared_requests – also include requests with no associated blocking states 
 
 
 - set_origins_state(request_id: UUID, new_state: BlockingState, urls: List[str])
- Within the request with the given id, record the state of the given origin urls as new_state.
- This creates entries or updates them as appropriate.
- Raises: RequestNotFound if the request is not found.
 - get_states_for_request(request_id: UUID) → Dict[str, BlockingState]
- Get the state of urls associated with the given request.
- Raises: RequestNotFound if the request is not found.
 - find_blocking_states(urls: List[str]) → List[BlockedOrigin]
- Look up the blocking state and associated requests for the given urls (exact match). 
 - delete_blocking_states(request_id: UUID) → None
- Remove all blocking states for the given request.
- Raises: RequestNotFound if the request is not found.
 - record_history(request_id: UUID, message: str) → RequestHistory
- Add an entry to the history of the given request.
- Raises: RequestNotFound if the request is not found.
 - get_history(request_id: UUID) → List[RequestHistory]
- Get the history of a given request.
- Raises: RequestNotFound if the request is not found.
 
- class swh.storage.proxies.blocking.db.BlockingQuery(*args, **kwargs)
- Bases: BlockingDb
- Create a DB proxy
- Parameters:
- conn – psycopg connection to the SWH DB 
- pool – psycopg pool of connections 
 
 - origins_are_blocked(urls: List[str], all_statuses=False) → Dict[str, BlockingStatus]
- Return the blocking status for each origin url given in urls.
- If all_statuses is False, do not return urls whose blocking status is NON_BLOCKED (so only actually blocked urls are returned). Otherwise, return all matching blocking statuses. 
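
The effect of `all_statuses` can be illustrated with an in-memory sketch. This does not query a database and is not the library's code: `filter_statuses` is a hypothetical helper, the example urls are made up, and `BlockingState` and `BlockingStatus` are local stand-ins (a plain dataclass instead of the attrs class) with the fields documented above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict
from uuid import UUID, uuid4


class BlockingState(Enum):
    NON_BLOCKED = 1
    DECISION_PENDING = 2
    BLOCKED = 3


@dataclass(frozen=True)
class BlockingStatus:
    # Stand-in for the documented attrs class (same two fields).
    state: BlockingState
    request: UUID


def filter_statuses(
    matched: Dict[str, BlockingStatus], all_statuses: bool = False
) -> Dict[str, BlockingStatus]:
    """Hypothetical helper applying the documented all_statuses rule."""
    if all_statuses:
        return dict(matched)
    # Drop urls whose matching rule says NON_BLOCKED.
    return {
        url: status
        for url, status in matched.items()
        if status.state is not BlockingState.NON_BLOCKED
    }


request_id = uuid4()
matched = {
    "https://example.com/allowed": BlockingStatus(BlockingState.NON_BLOCKED, request_id),
    "https://example.com/pending": BlockingStatus(BlockingState.DECISION_PENDING, request_id),
    "https://example.com/blocked": BlockingStatus(BlockingState.BLOCKED, request_id),
}
blocked_only = filter_statuses(matched)
everything = filter_statuses(matched, all_statuses=True)
```

Here `blocked_only` keeps the pending and blocked urls, while `everything` keeps all three entries.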
 - origin_is_blocked(url: str) → BlockingStatus | None
- Check whether the origin URL should be blocked.
- If the given url matches a set of registered blocking rules, return the most appropriate one. Otherwise, return None.
- The blocking event is logged in the database (only matching events are logged).