swh.storage.algos.diff module
- swh.storage.algos.diff.diff_directories(storage: StorageInterface, from_dir: bytes | None, to_dir: bytes, track_renaming: bool = False) → List[Dict[str, Any]]
Compute the differential between two directories, i.e. the list of file changes (insertion / deletion / modification / renaming) between them.
- Parameters:
storage – instance of a swh storage, either local or remote (a local storage is recommended for best performance)
from_dir – the swh identifier of the directory to compare from
to_dir – the swh identifier of the directory to compare to
track_renaming – whether to track file renames
- Returns:
A list of dicts representing the changes between the two directories. Each dict contains the following entries:
type: a string describing the type of change (insert/delete/modify/rename)
from: a dict containing the directory entry metadata in the from directory (None in case of an insertion)
from_path: bytes string corresponding to the absolute path of the from directory entry (None in case of an insertion)
to: a dict containing the directory entry metadata in the to directory (None in case of a deletion)
to_path: bytes string corresponding to the absolute path of the to directory entry (None in case of a deletion)
The returned list is sorted in lexicographic depth-first order according to the value of the to_path field.
Warning
The algorithm used to track file renames is quite naive (it compares hashes between deleted and inserted files) and might fail to detect some renames in edge cases.
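The warning above describes rename tracking as a hash comparison between deleted and inserted files. The following is a minimal sketch of that idea, not the module's actual implementation. The function name `pair_renames` and the `"target"` key used as the content hash are assumptions made for illustration; only the `type`, `from`, `from_path`, `to` and `to_path` entries come from the documentation above.

```python
from typing import Any, Dict, List

Change = Dict[str, Any]


def pair_renames(changes: List[Change], hash_key: str = "target") -> List[Change]:
    """Turn delete/insert pairs that share a content hash into renames.

    Illustration only: `hash_key` is an assumed field name inside the
    entry metadata, not a documented one.
    """
    # Index deleted entries by their (assumed) content hash.
    deleted_by_hash: Dict[bytes, Change] = {}
    for change in changes:
        if change["type"] == "delete":
            deleted_by_hash.setdefault(change["from"][hash_key], change)

    result: List[Change] = []
    consumed = set()  # ids of deletions that were paired with an insertion
    for change in changes:
        if change["type"] == "insert":
            match = deleted_by_hash.pop(change["to"][hash_key], None)
            if match is not None:
                consumed.add(id(match))
                result.append({
                    "type": "rename",
                    "from": match["from"],
                    "from_path": match["from_path"],
                    "to": change["to"],
                    "to_path": change["to_path"],
                })
                continue
        result.append(change)
    # Drop the deletions that became the "from" side of a rename.
    return [c for c in result if id(c) not in consumed]


# Hand-written sample data in the documented shape.
changes = [
    {"type": "delete", "from": {"target": b"\x01"}, "from_path": b"old/a.txt",
     "to": None, "to_path": None},
    {"type": "insert", "from": None, "from_path": None,
     "to": {"target": b"\x01"}, "to_path": b"new/a.txt"},
    # Moved *and* edited: the hashes differ, so no rename is detected.
    {"type": "delete", "from": {"target": b"\x02"}, "from_path": b"b.txt",
     "to": None, "to_path": None},
    {"type": "insert", "from": None, "from_path": None,
     "to": {"target": b"\x03"}, "to_path": b"c.txt"},
]
paired = pair_renames(changes)
```

In this sample, `old/a.txt` and `new/a.txt` share a hash and are merged into one rename, while `b.txt` and `c.txt` stay as a separate deletion and insertion. A file that is moved and modified in the same change is one kind of edge case that a pure hash comparison cannot recognize.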
- swh.storage.algos.diff.diff_revisions(storage: StorageInterface, from_rev: bytes | None, to_rev: bytes, track_renaming: bool = False) → List[Dict[str, Any]]
Compute the differential between two revisions, i.e. the list of file changes between the two associated directories.
- Parameters:
storage – instance of a swh storage, either local or remote (a local storage is recommended for best performance)
from_rev – the identifier of the revision to compare from
to_rev – the identifier of the revision to compare to
track_renaming – whether to track file renames
- Returns:
A list of dicts describing the introduced file changes (see swh.storage.algos.diff.diff_directories()).
Warning
The algorithm used to track file renames is quite naive (it compares hashes between deleted and inserted files) and might fail to detect some renames in edge cases.
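As a rough illustration of how the returned list might be consumed, the sketch below groups changed paths by change type. The helper `group_paths_by_type` and the sample data are hypothetical. In real use the list would come from a call such as `diff_revisions(storage, from_rev, to_rev, track_renaming=True)`, which needs a storage instance and is therefore only shown in a comment.

```python
from collections import defaultdict
from typing import Any, Dict, List


def group_paths_by_type(changes: List[Dict[str, Any]]) -> Dict[str, List[str]]:
    """Group changed paths by change type (hypothetical helper)."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for change in changes:
        # Deletions have no to_path, so fall back to from_path.
        path = change["to_path"]
        if path is None:
            path = change["from_path"]
        # Paths are byte strings; decode leniently for display.
        grouped[change["type"]].append(path.decode("utf-8", errors="replace"))
    return dict(grouped)


# Real usage (assumes `storage`, `from_rev` and `to_rev` already exist):
#   changes = diff_revisions(storage, from_rev, to_rev, track_renaming=True)
# Hand-written sample in the documented shape; metadata dicts left empty.
changes = [
    {"type": "insert", "from": None, "from_path": None,
     "to": {}, "to_path": b"README.md"},
    {"type": "rename", "from": {}, "from_path": b"docs/a.rst",
     "to": {}, "to_path": b"docs/b.rst"},
    {"type": "delete", "from": {}, "from_path": b"old.txt",
     "to": None, "to_path": None},
    {"type": "modify", "from": {}, "from_path": b"src/main.py",
     "to": {}, "to_path": b"src/main.py"},
]
grouped = group_paths_by_type(changes)
```

Renames are reported under their destination path here. Whether the source or the destination is the more useful key depends on the caller.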
- swh.storage.algos.diff.diff_revision(storage: StorageInterface, revision: bytes, track_renaming: bool = False) → List[Dict[str, Any]]
Compute the differential between a revision and its first parent. If the revision has no parents, the directory to compare from is considered empty. In other words, it computes the file changes introduced in a specific revision.
- Parameters:
storage – instance of a swh storage, either local or remote (a local storage is recommended for best performance)
revision – the identifier of the revision from which to compute the introduced changes
track_renaming – whether to track file renames
- Returns:
A list of dicts describing the introduced file changes (see swh.storage.algos.diff.diff_directories()).
Warning
The algorithm used to track file renames is quite naive (it compares hashes between deleted and inserted files) and might fail to detect some renames in edge cases.
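The first-parent rule described for diff_revision can be pictured as a thin wrapper around diff_revisions. The sketch below is an assumption about how the two relate, not the module's code. The wrapper name `diff_against_first_parent` is hypothetical, the list of parents is passed in explicitly instead of being fetched from storage, and the diff function is injected so that the example runs without a storage instance.

```python
from typing import Any, Callable, Dict, List, Optional, Sequence


def diff_against_first_parent(
    diff_revisions_fn: Callable[..., List[Dict[str, Any]]],
    storage: Any,
    revision_id: bytes,
    parents: Sequence[bytes],
    track_renaming: bool = False,
) -> List[Dict[str, Any]]:
    """Diff a revision against its first parent (hypothetical wrapper)."""
    # No parents: pass None as the from side.
    from_rev: Optional[bytes] = parents[0] if parents else None
    return diff_revisions_fn(
        storage, from_rev, revision_id, track_renaming=track_renaming
    )


# Stand-in with the same signature as diff_revisions; it only records calls.
calls = []


def fake_diff_revisions(storage, from_rev, to_rev, track_renaming=False):
    calls.append((from_rev, to_rev, track_renaming))
    return []


# A merge revision: only the first parent is used as the from side.
diff_against_first_parent(fake_diff_revisions, None, b"rev", [b"p1", b"p2"])
# A root revision: no parents, so the from side is None.
diff_against_first_parent(fake_diff_revisions, None, b"root", [], track_renaming=True)
```

With the real library, `diff_revisions_fn` would be `swh.storage.algos.diff.diff_revisions` and `storage` an actual storage instance. In practice, calling diff_revision directly is simpler.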