swh.graph.webgraph module
WebGraph driver
- exception swh.graph.webgraph.CompressionSubprocessError(message: str, log_path: Path)
- Bases: Exception
- class swh.graph.webgraph.CompressionStep(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)
- Bases: Enum
 - EXTRACT_NODES = -20
 - EXTRACT_LABELS = -10
 - NODE_STATS = 0
 - EDGE_STATS = 3
 - LABEL_STATS = 6
 - MPH = 10
 - BV = 30
 - BV_EF = 40
 - BFS_ROOTS = 50
 - BFS = 60
 - PERMUTE_AND_SIMPLIFY_BFS = 70
 - BFS_EF = 80
 - BFS_DCF = 90
 - LLP = 100
 - COMPOSE_ORDERS = 110
 - PERMUTE_LLP = 120
 - OFFSETS = 130
 - EF = 140
 - TRANSPOSE = 160
 - TRANSPOSE_OFFSETS = 170
 - TRANSPOSE_EF = 175
 - MAPS = 180
 - EXTRACT_PERSONS = 190
 - PERSONS_STATS = 195
 - MPH_PERSONS = 200
 - NODE_PROPERTIES = 210
 - MPH_LABELS = 220
 - LABELS_ORDER = 225
 - FCL_LABELS = 230
 - EDGE_LABELS = 240
 - EDGE_LABELS_TRANSPOSE = 250
 - EDGE_LABELS_EF = 270
 - EDGE_LABELS_TRANSPOSE_EF = 280
 - STATS = 290
 - E2E_TEST = 295
 - CLEAN_TMP = 300
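
The numeric values give the members a natural order. As a minimal sketch, assuming those values mirror the order in which the pipeline runs its steps (the listing itself does not state this), a contiguous subset of steps can be selected by value range. The `Step` enum below is a hypothetical stand-in that copies a few members from the listing so the example runs without swh.graph installed; `steps_between` and `in_value_order` are illustrative helpers, not part of the module.

```python
from enum import Enum
from typing import List, Set


class Step(Enum):
    """Hypothetical stand-in copying a few CompressionStep members."""

    EXTRACT_NODES = -20
    MPH = 10
    BV = 30
    BFS = 60
    LLP = 100
    TRANSPOSE = 160
    CLEAN_TMP = 300


def steps_between(first: Step, last: Step) -> Set[Step]:
    """Return every step whose value lies in [first.value, last.value]."""
    return {s for s in Step if first.value <= s.value <= last.value}


def in_value_order(steps: Set[Step]) -> List[Step]:
    """Sort steps by numeric value (assumed here to mirror pipeline order)."""
    return sorted(steps, key=lambda s: s.value)


selected = steps_between(Step.MPH, Step.LLP)
print([s.name for s in in_value_order(selected)])
```

With the real enum, a set built this way could be passed as the `steps` argument of `compress()` to rerun only part of the pipeline, provided the outputs of the earlier steps are already on disk.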
 
- swh.graph.webgraph.compress(graph_name: str, in_dir: str, out_dir: str, test_flavor: str | None, steps: typing.Set[swh.graph.webgraph.CompressionStep] = {CompressionStep.BFS, CompressionStep.BFS_DCF, CompressionStep.BFS_EF, CompressionStep.BFS_ROOTS, CompressionStep.BV, CompressionStep.BV_EF, CompressionStep.CLEAN_TMP, CompressionStep.COMPOSE_ORDERS, CompressionStep.E2E_TEST, CompressionStep.EDGE_LABELS, CompressionStep.EDGE_LABELS_EF, CompressionStep.EDGE_LABELS_TRANSPOSE, CompressionStep.EDGE_LABELS_TRANSPOSE_EF, CompressionStep.EDGE_STATS, CompressionStep.EF, CompressionStep.EXTRACT_LABELS, CompressionStep.EXTRACT_NODES, CompressionStep.EXTRACT_PERSONS, CompressionStep.FCL_LABELS, CompressionStep.LABELS_ORDER, CompressionStep.LABEL_STATS, CompressionStep.LLP, CompressionStep.MAPS, CompressionStep.MPH, CompressionStep.MPH_LABELS, CompressionStep.MPH_PERSONS, CompressionStep.NODE_PROPERTIES, CompressionStep.NODE_STATS, CompressionStep.OFFSETS, CompressionStep.PERMUTE_AND_SIMPLIFY_BFS, CompressionStep.PERMUTE_LLP, CompressionStep.PERSONS_STATS, CompressionStep.STATS, CompressionStep.TRANSPOSE, CompressionStep.TRANSPOSE_EF, CompressionStep.TRANSPOSE_OFFSETS}, conf: typing.Dict[str, str] = {}, progress_cb: typing.Callable[[int, swh.graph.webgraph.CompressionStep], None] = <function <lambda>>)
- Graph compression pipeline driver: turns nodes/edges files into a compressed on-disk representation.
- Parameters:
- graph_name – graph base name, relative to in_dir 
- in_dir – input directory, where the uncompressed graph can be found 
- out_dir – output directory, where the compressed graph will be stored 
- test_flavor – which flavor of tests to run 
- steps – compression steps to run (default: all steps) 
- conf – compression configuration, supporting the following keys (all are optional, so an empty configuration is fine and is the default):
 - batch_size: batch size for WebGraph transformations; defaults to 1 billion
 - tmp_dir: temporary directory; defaults to the “tmp” subdir of out_dir
 - object_types: comma-separated list of object types to extract (e.g. ori,snp,rel,rev); defaults to *
 
- progress_cb – a callable taking a percentage and a step as arguments, which is called every time a step starts.
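
A call might be prepared as sketched below. The sketch builds a `conf` dictionary from the keys documented above (with string values, since `conf` is typed `Dict[str, str]`) and a `progress_cb` matching the documented (percentage, step) signature. `build_conf`, `make_progress_cb` and `run_compression` are hypothetical helpers, and the graph name `"graph"` is a placeholder. `run_compression` is only defined, not called, because it needs swh.graph installed and real input data.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple


def build_conf(
    batch_size: Optional[int] = None,
    tmp_dir: Optional[str] = None,
    object_types: Optional[List[str]] = None,
) -> Dict[str, str]:
    """Build a conf dict; omitted keys fall back to the documented defaults."""
    conf: Dict[str, str] = {}
    if batch_size is not None:
        conf["batch_size"] = str(batch_size)  # values are strings
    if tmp_dir is not None:
        conf["tmp_dir"] = tmp_dir
    if object_types is not None:
        conf["object_types"] = ",".join(object_types)  # comma-separated list
    return conf


def make_progress_cb(log: List[Tuple[int, str]]) -> Callable[[int, Any], None]:
    """Return a callback that records (percentage, step) when a step starts."""

    def progress_cb(percentage: int, step: Any) -> None:
        log.append((percentage, str(step)))

    return progress_cb


def run_compression(in_dir: str, out_dir: str) -> List[Tuple[int, str]]:
    """Hypothetical wrapper around compress(); requires swh.graph."""
    from swh.graph.webgraph import compress  # imported lazily on purpose

    events: List[Tuple[int, str]] = []
    compress(
        "graph",  # placeholder graph base name, relative to in_dir
        in_dir,
        out_dir,
        None,  # test_flavor
        conf=build_conf(object_types=["ori", "snp", "rel", "rev"]),
        progress_cb=make_progress_cb(events),
    )
    return events


print(build_conf(batch_size=1000, object_types=["ori", "snp"]))
```

Leaving `steps` unset runs every step, and an empty `conf` is valid, so `build_conf()` with no arguments corresponds to the documented defaults.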