swh.graph.luigi.subdataset module
- class swh.graph.luigi.subdataset.SelectTopGithubOrigins(*args, **kwargs)
 - Bases: Task
 - Writes a list of origins selected from popular GitHub repositories.
 - local_export_path = <luigi.parameter.PathParameter object>
 - num_origins = <luigi.parameter.IntParameter object>
 - query = <luigi.parameter.Parameter object>
 
- class swh.graph.luigi.subdataset.ListSwhidsForSubdataset(*args, **kwargs)
 - Bases: Task
 - Lists all SWHIDs reachable from a set of origins.
 - select_task = <luigi.parameter.ChoiceParameter object>
 - local_export_path = <luigi.parameter.PathParameter object>
 - grpc_api = <luigi.parameter.Parameter object>
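
To make "reachable from a set of origins" concrete, here is a minimal sketch of a breadth-first traversal over a toy in-memory graph. It does not call the gRPC API that `grpc_api` refers to and is not the task's implementation. The function name `reachable_swhids`, the `TOY_GRAPH` adjacency map, and the short node labels (stand-ins for real SWHIDs) are all assumptions made for illustration.

```python
from collections import deque
from typing import Dict, Iterable, List, Set

# Hypothetical toy graph: each node maps to the nodes it points to.
# Labels such as "ori-A" are placeholders, not real SWHIDs.
TOY_GRAPH: Dict[str, List[str]] = {
    "ori-A": ["snp-A"],
    "snp-A": ["rev-1"],
    "rev-1": ["dir-1"],
    "dir-1": ["cnt-1"],
    "ori-B": ["snp-B"],
    "snp-B": ["rev-1", "rev-2"],
    "rev-2": ["dir-2"],
    "dir-2": ["cnt-1", "cnt-2"],
}


def reachable_swhids(
    successors: Dict[str, List[str]], origins: Iterable[str]
) -> Set[str]:
    """Return every node reachable from `origins`, origins included."""
    seen: Set[str] = set()
    queue: deque = deque()
    # Seed the traversal with the selected origins.
    for origin in origins:
        if origin not in seen:
            seen.add(origin)
            queue.append(origin)
    # Standard breadth-first search: visit each node once.
    while queue:
        node = queue.popleft()
        for successor in successors.get(node, []):
            if successor not in seen:
                seen.add(successor)
                queue.append(successor)
    return seen


if __name__ == "__main__":
    for node in sorted(reachable_swhids(TOY_GRAPH, ["ori-A"])):
        print(node)
```

Objects shared between origins (here `rev-1`, `dir-1`, `cnt-1`) appear only once in the result, because the traversal tracks visited nodes in a set.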
 
- class swh.graph.luigi.subdataset.CreateSubdatasetOnAthena(*args, **kwargs)
 - Bases: Task
 - Generates an ORC export from an existing ORC export, filtering out SWHIDs not in the given list.
 - s3_athena_output_location = <swh.export.luigi.S3PathParameter object>
 - athena_db_name = <luigi.parameter.Parameter object>
 - athena_parent_db_name = <luigi.parameter.Parameter object>
 - object_types = <luigi.parameter.EnumListParameter object>
 - requires() -> Dict[str, Task]
  - Returns an instance of ListSwhidsForSubdataset and one of CreateAthena
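
The filtering semantics described for CreateSubdatasetOnAthena ("filtering out SWHIDs not in the given list") can be illustrated with a small plain-Python sketch. This is not how the task works internally: it runs against Athena tables, while the code below only keeps in-memory rows whose identifier is in an allowed set. The helper `filter_rows_by_swhid`, the `swhid` column name, and the sample rows are hypothetical.

```python
from typing import Dict, Iterable, List, Set


def filter_rows_by_swhid(
    rows: Iterable[Dict[str, str]], allowed: Set[str], key: str = "swhid"
) -> List[Dict[str, str]]:
    """Keep only the rows whose identifier is in `allowed`, preserving order."""
    return [row for row in rows if row[key] in allowed]


# Hypothetical rows standing in for one table of a parent export.
# Labels such as "rev-1" are placeholders, not real SWHIDs.
ROWS = [
    {"swhid": "rev-1", "message": "initial commit"},
    {"swhid": "rev-2", "message": "add README"},
    {"swhid": "rev-3", "message": "unrelated history"},
]

# Identifiers that belong to the subdataset.
ALLOWED = {"rev-1", "rev-2"}

if __name__ == "__main__":
    for row in filter_rows_by_swhid(ROWS, ALLOWED):
        print(row["swhid"], row["message"])
```

Using a set for the allowed identifiers keeps each membership check constant-time on average, which matters when the list of identifiers is large.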