swh.scheduler.cli.origin_utils module
Defines the utility functions for the <swh scheduler origin send-origins-from-file-to-celery> CLI command. The command reads a list of origins from standard input or a file, massages them into scheduler tasks, and sends those tasks directly to a Celery queue (selected according to the specified task type).
The list of origins is extracted beforehand by other means (e.g. a Sentry extract, a combination of various shell scripts, …). A human operator then provides the list to the CLI so that it is consumed by the standard swh queues (i.e. the backend configured for the scheduler).
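A minimal sketch of the flow described above, under stated assumptions: origins arrive one per line, each one becomes a task whose only argument is url, and the actual sending is delegated to a send callback. The names read_origins, send_origins and send are hypothetical and are not part of this module; in a real setup, send would publish the task to the Celery queue selected by the task type.

```python
import io
from typing import Callable, Dict, Iterable, Iterator, TextIO


def read_origins(stream: TextIO) -> Iterator[str]:
    """Yield one origin URL per non-empty line of stream (a file or sys.stdin)."""
    for line in stream:
        line = line.strip()
        if line:
            yield line


def send_origins(
    origins: Iterable[str],
    task_name: str,
    queue: str,
    send: Callable[[str, Dict[str, str], str], None],
) -> int:
    """Wrap each origin into a task payload and hand it to send.

    send(task_name, kwargs, queue) is a hypothetical callback standing in
    for whatever actually publishes the task to Celery. Returns the number
    of tasks handed over.
    """
    count = 0
    for url in origins:
        send(task_name, {"url": url}, queue)  # assumed payload shape
        count += 1
    return count


# Usage: collect the "sent" tasks in a list instead of talking to Celery.
sent = []
n = send_origins(
    read_origins(io.StringIO("https://example.org/a.git\n\n  https://example.org/b.git \n")),
    task_name="example-task",  # hypothetical task name
    queue="example-queue",  # hypothetical queue name
    send=lambda name, kwargs, queue: sent.append((name, kwargs, queue)),
)
```

Keeping the transport behind a callback lets the parsing be exercised without a broker; the stream argument can be sys.stdin or any open text file.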
- swh.scheduler.cli.origin_utils.get_scheduler_task_type(scheduler: SchedulerInterface, task_type_name: str) → TaskType
- Retrieve a TaskType instance for a task type name from the scheduler.
- Parameters:
- scheduler – Scheduler instance to look up data from
- task_type_name – The task type name to look up
 
- Raises:
- ValueError – if neither task_type_name nor its fallback can be found.
- Returns:
- Information about the task type 
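The lookup-with-fallback behaviour documented above can be sketched as follows. This is not the module's code: FakeScheduler and the simplified TaskType are in-memory stand-ins, and since the actual fallback rule is not documented here, it is passed in as a hypothetical fallback_name callable (prefix_load below is an invented example rule). The sketch also assumes ValueError is raised only when both the requested name and its fallback are missing.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class TaskType:
    """Simplified stand-in for the scheduler's TaskType model (illustrative fields)."""

    type: str
    backend_name: str


class FakeScheduler:
    """In-memory stand-in for a scheduler exposing get_task_type(name)."""

    def __init__(self, task_types: Dict[str, TaskType]) -> None:
        self._task_types = task_types

    def get_task_type(self, task_type_name: str) -> Optional[TaskType]:
        # Return None for unknown names instead of raising.
        return self._task_types.get(task_type_name)


def lookup_task_type(
    scheduler: FakeScheduler,
    task_type_name: str,
    fallback_name: Callable[[str], str],
) -> TaskType:
    """Look up task_type_name, then its fallback; raise ValueError if neither exists."""
    task_type = scheduler.get_task_type(task_type_name)
    if task_type is None:
        # Hypothetical fallback rule supplied by the caller.
        task_type = scheduler.get_task_type(fallback_name(task_type_name))
    if task_type is None:
        raise ValueError(f"Unknown task type: {task_type_name}")
    return task_type


def prefix_load(name: str) -> str:
    """Hypothetical fallback rule: prefix the requested name with "load-"."""
    return f"load-{name}"


# Usage: "git" is not registered, but its fallback "load-git" is.
scheduler = FakeScheduler({"load-git": TaskType("load-git", "example.tasks.LoadGit")})
git_type = lookup_task_type(scheduler, "git", prefix_load)
```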
 
- swh.scheduler.cli.origin_utils.lines_to_task_args(lines: Iterable[str], columns: List[str] = ['url'], postprocess: Callable[[Dict[str, Any]], Dict[str, Any]] | None = None, **kwargs) → Iterator[Dict[str, Any]]
- Iterate over the lines and convert them into Celery tasks ready to be sent.
- Parameters:
- lines – Lines read from a file or stdin
- columns – Structure of each line to be read, as column names (usually only the url column)
- postprocess – An optional callable used to enrich each task
- **kwargs – Extra static arguments added to each task
 
- Yields:
- A task ready to be sent to Celery
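The conversion performed by lines_to_task_args can be approximated as below. This sketch assumes fields are whitespace-separated, blank lines are skipped, and static kwargs are merged after the column values; none of this is confirmed by the signature alone. The names lines_to_task_args_sketch and strip_trailing_slash, and the visit_type argument, are hypothetical.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List, Optional


def lines_to_task_args_sketch(
    lines: Iterable[str],
    columns: Optional[List[str]] = None,
    postprocess: Optional[Callable[[Dict[str, Any]], Dict[str, Any]]] = None,
    **kwargs: Any,
) -> Iterator[Dict[str, Any]]:
    """Map each line's whitespace-separated fields onto columns.

    Static kwargs are merged into every task, then postprocess (if any)
    gets the last word. Blank lines are skipped.
    """
    # None instead of a mutable list default; behaves like columns=["url"].
    if columns is None:
        columns = ["url"]
    for line in lines:
        values = line.split()
        if not values:
            continue
        # zip() stops at the shorter sequence: missing trailing fields are omitted.
        task: Dict[str, Any] = dict(zip(columns, values))
        task.update(kwargs)
        if postprocess is not None:
            task = postprocess(task)
        yield task


def strip_trailing_slash(task: Dict[str, Any]) -> Dict[str, Any]:
    """Example postprocess hook: normalize the url column."""
    return {**task, "url": task["url"].rstrip("/")}


tasks = list(
    lines_to_task_args_sketch(
        ["https://example.org/a.git/ main", "", "https://example.org/b.git dev"],
        columns=["url", "branch"],
        postprocess=strip_trailing_slash,
        visit_type="git",  # hypothetical static argument
    )
)
```

The sketch uses None as the default for columns rather than the documented ['url'] list, which avoids sharing one mutable default across calls while behaving the same.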