swh.graph.luigi package
Submodules
- swh.graph.luigi.compressed_graph module
  - Luigi tasks for compression
  - ObjectTypesParameter
  - ExtractNodes
  - ExtractLabels
  - NodeStats
  - EdgeStats
  - LabelStats
  - Mph
  - Bv
  - BvEf
  - BfsRoots
  - Bfs
  - PermuteAndSimplifyBfs
  - BfsEf
  - BfsDcf
  - Llp
  - PermuteLlp
  - Offsets
  - Ef
  - ComposeOrders
  - Transpose
  - TransposeOffsets
  - TransposeEf
  - Maps
  - ExtractPersons
  - PersonsStats
  - MphPersons
  - NodeProperties
  - PthashLabels
  - LabelsOrder
  - FclLabels
  - EdgeLabels
  - EdgeLabelsTranspose
  - EdgeLabelsEf
  - EdgeLabelsTransposeEf
  - Stats
  - CompressGraph
  - UploadGraphToS3
  - DownloadGraphFromS3
  - LocalGraph
- swh.graph.luigi.subdataset module
  - SelectTopGithubOrigins
  - ListSwhidsForSubdataset
  - CreateSubdatasetOnAthena
    - CreateSubdatasetOnAthena.local_export_path
    - CreateSubdatasetOnAthena.s3_parent_export_path
    - CreateSubdatasetOnAthena.s3_export_path
    - CreateSubdatasetOnAthena.s3_athena_output_location
    - CreateSubdatasetOnAthena.athena_db_name
    - CreateSubdatasetOnAthena.athena_parent_db_name
    - CreateSubdatasetOnAthena.object_types
    - CreateSubdatasetOnAthena.requires()
    - CreateSubdatasetOnAthena.output()
    - CreateSubdatasetOnAthena.run()
- swh.graph.luigi.topology module
  - Luigi tasks to analyze, and produce datasets related to, graph topology
  - TopoSort
  - ComputeGenerations
  - UploadGenerationsToS3
    - UploadGenerationsToS3.local_graph_path
    - UploadGenerationsToS3.topological_order_dir
    - UploadGenerationsToS3.dataset_name
    - UploadGenerationsToS3.graph_name
    - UploadGenerationsToS3.object_types
    - UploadGenerationsToS3.direction
    - UploadGenerationsToS3.requires()
    - UploadGenerationsToS3.output()
    - UploadGenerationsToS3.run()
  - CountPaths
  - PathCountsParquetToS3
- swh.graph.luigi.utils module
Module contents
Luigi tasks
This package contains Luigi tasks. These come in two kinds:
- in swh.graph.luigi.compressed_graph: an alternative to the ‘swh graph compress’ CLI that can be composed with other tasks, such as swh-export’s
- in other submodules: tasks driving the creation of specific datasets that are generated using the compressed graph
The overall directory structure is:
    base_dir/
        <date>[_<flavor>]/
            edges/
                ...
            orc/
                ...
            compressed/
                graph.graph
                graph.mph
                ...
                meta/
                    export.json
                    compression.json
            datasets/
                contribution_graph.csv.zst
            topology/
                topological_order_dfs.csv.zst
And optionally:
    sensitive_base_dir/
        <date>[_<flavor>]/
            persons_sha256_to_name.csv.zst
            datasets/
                contribution_graph.deanonymized.csv.zst
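
As an illustration of the layout above, the following sketch builds the expected paths for a given date and optional flavor. It is not part of swh.graph: the function names `dataset_dir` and `layout`, the dictionary keys, and the example values are hypothetical, and the placement of meta/ under compressed/ is an assumption read off the listing.

```python
from pathlib import PurePosixPath
from typing import Dict, Optional


def dataset_dir(base_dir: str, date: str, flavor: Optional[str] = None) -> PurePosixPath:
    """Return base_dir/<date>[_<flavor>] (hypothetical helper, not a swh.graph API)."""
    name = date if flavor is None else f"{date}_{flavor}"
    return PurePosixPath(base_dir) / name


def layout(base_dir: str, date: str, flavor: Optional[str] = None) -> Dict[str, PurePosixPath]:
    """Map illustrative keys to the paths named in the directory listing."""
    root = dataset_dir(base_dir, date, flavor)
    compressed = root / "compressed"
    return {
        "edges": root / "edges",
        "orc": root / "orc",
        "compressed": compressed,
        "graph": compressed / "graph.graph",
        "mph": compressed / "graph.mph",
        # Assumption: meta/ is nested inside compressed/.
        "export_meta": compressed / "meta" / "export.json",
        "compression_meta": compressed / "meta" / "compression.json",
        "contribution_graph": root / "datasets" / "contribution_graph.csv.zst",
        "topological_order": root / "topology" / "topological_order_dfs.csv.zst",
    }


if __name__ == "__main__":
    # Example values are made up for illustration.
    for key, path in layout("/srv/graph", "2022-04-25", "example-flavor").items():
        print(f"{key}: {path}")
```

The same `dataset_dir` helper could be pointed at sensitive_base_dir to locate persons_sha256_to_name.csv.zst and the deanonymized datasets, since both trees share the <date>[_<flavor>] naming.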