swh.model.hashutil module
Module in charge of hashing function definitions. This is the base module used to compute swh’s hashes.
Only a subset of hashing algorithms is supported, as defined in the ALGORITHMS set. Passing any algorithm that is not in that set raises a ValueError explaining the error.
This module defines a MultiHash class to ease the computation of the Software Heritage hashing algorithms. It computes hashes from a file object, a path, or raw data, through an interface similar to the one the standard hashlib module provides.
Basic usage examples:
- file object: MultiHash.from_file(file_object, hash_names=DEFAULT_ALGORITHMS).digest()
- path (filepath): MultiHash.from_path(b'foo').hexdigest()
- data (bytes): MultiHash.from_data(b'foo').bytehexdigest()
“Complex” usage, defining a swh hashlib instance first:
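- To also compute the length, add 'length' to the set of algorithms to compute, for example:

      from swh.model.hashutil import DEFAULT_ALGORITHMS, HASH_BLOCK_SIZE, MultiHash

      h = MultiHash(hash_names=set({'length'}).union(DEFAULT_ALGORITHMS))
      with open(filepath, 'rb') as f:
          # read block by block so the whole file is hashed, not only its first block
          while chunk := f.read(HASH_BLOCK_SIZE):
              h.update(chunk)
      hashes = h.digest()  # returns a dict of {hash_algo_name: hash_in_bytes}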
- To write data to disk while computing its hashes (from a stream), for example:

      h = MultiHash(length=length)
      with open(filepath, 'wb') as f:
          for chunk in r.iter_content():  # r is a stream of some sort
              h.update(chunk)
              f.write(chunk)
      hashes = h.hexdigest()  # returns a dict of {hash_algo_name: hash_in_hex}
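To illustrate the idea behind the usage examples above, here is a minimal standard-library sketch of feeding the same chunks to several hashlib objects in a single pass. It is not the MultiHash implementation: the class name `SketchMultiHash` and the set `SKETCH_ALGORITHMS` are hypothetical, and only algorithm names that hashlib is guaranteed to provide are used.

```python
import hashlib

# Hypothetical subset for this sketch; the module's own ALGORITHMS set is larger.
SKETCH_ALGORITHMS = {"sha1", "sha256", "sha512"}


class SketchMultiHash:
    """Feed the same bytes to several hashlib objects in one pass."""

    def __init__(self, hash_names=("sha1", "sha256")):
        names = set(hash_names)
        # Mirror the documented convention: 'length' is requested like an algorithm.
        self._track_length = "length" in names
        names.discard("length")
        unknown = names - SKETCH_ALGORITHMS
        if unknown:
            raise ValueError(f"Unsupported algorithms: {sorted(unknown)}")
        self._hashers = {name: hashlib.new(name) for name in names}
        self._length = 0

    def update(self, chunk):
        # Every hasher sees every chunk, so the input is read only once.
        self._length += len(chunk)
        for hasher in self._hashers.values():
            hasher.update(chunk)

    def digest(self):
        result = {name: h.digest() for name, h in self._hashers.items()}
        if self._track_length:
            result["length"] = self._length
        return result

    def hexdigest(self):
        result = {name: h.hexdigest() for name, h in self._hashers.items()}
        if self._track_length:
            result["length"] = self._length
        return result
```

Calling `update()` once per chunk and then `digest()` or `hexdigest()` yields one dict with an entry per requested algorithm, which is the shape the examples above describe.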
- swh.model.hashutil.ALGORITHMS = {'blake2b512', 'blake2s256', 'md5', 'sha1', 'sha1_git', 'sha256', 'sha512'}
- Hashing algorithms supported by this module
- swh.model.hashutil.DEFAULT_ALGORITHMS = {'blake2s256', 'sha1', 'sha1_git', 'sha256'}
- Algorithms computed by default when calling the functions from this module. Subset of ALGORITHMS.
- swh.model.hashutil.HASH_BLOCK_SIZE = 32768
- Block size for streaming hash computations made in this module
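As a rough illustration of what a block size is for in streaming hash computations, the sketch below reads a file object in fixed-size blocks and updates a single hashlib object, so memory use does not grow with the input size. The function name `sha256_of_stream` and the constant `BLOCK_SIZE` are hypothetical; only the value 32768 comes from the text above.

```python
import hashlib
import io

BLOCK_SIZE = 32768  # same value as HASH_BLOCK_SIZE; the name is local to this sketch


def sha256_of_stream(fobj, block_size=BLOCK_SIZE):
    """Hash a binary file object block by block and return the hex digest."""
    hasher = hashlib.sha256()
    while True:
        chunk = fobj.read(block_size)
        if not chunk:  # empty bytes means end of stream
            break
        hasher.update(chunk)
    return hasher.hexdigest()


# Usage: an in-memory stream spanning several blocks.
payload = b"a" * 100_000
streamed = sha256_of_stream(io.BytesIO(payload))
```

The streamed digest equals the digest of the whole payload hashed in one call, since hashlib objects accumulate state across `update()` calls.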
- class swh.model.hashutil.MultiHash(hash_names={'blake2s256', 'sha1', 'sha1_git', 'sha256'}, length=None)
- Bases: object
- Hashutil class to support the computation of multiple hashes.
- Parameters: hash_names, length
 - If 'length' is provided as an algorithm, the length is also computed and returned.
- swh.model.hashutil.git_object_header(git_type: str, length: int) -> bytes
- Returns the header for a git object of the given type and length.
- The header of a git object consists of:
  - The type of the object (encoded in ASCII)
  - One ASCII space ( )
  - The length of the object (decimal encoded in ASCII)
  - One NUL byte

 - Parameters:
  - git_type – the type of the git object (supposedly one of 'blob', 'commit', 'tag', 'tree')
  - length – the length of the git object you’re encoding

- Returns:
  - the header of the git object, as bytes
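The header layout described above (type, one space, decimal length, one NUL byte) can be sketched with the standard library alone. The function name `sketch_git_object_header` is hypothetical and is not the module's own implementation.

```python
def sketch_git_object_header(git_type: str, length: int) -> bytes:
    """Build b'<type> <length>\\x00' following the four-part layout."""
    # Type and decimal length are ASCII, separated by one space, ended by NUL.
    return f"{git_type} {length}".encode("ascii") + b"\x00"


# Usage: header for a 3-byte blob.
header = sketch_git_object_header("blob", 3)
```

Here `header` is `b"blob 3\x00"`: the ASCII type, one space, the decimal length, and a terminating NUL byte.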
 
- swh.model.hashutil.hash_git_data(data, git_type, base_algo='sha1')
- Hash the given data as a git object of type git_type.
- Parameters:
  - data – a bytes object
  - git_type – the git object type
  - base_algo – the base hashing algorithm used (default: sha1)

 - Returns: a dict mapping each algorithm to a bytes digest
 - Raises:
  - ValueError – if the git_type is unexpected.
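As a hedged illustration of hashing data "as a git object", the sketch below prepends a git object header to the data and hashes the result with hashlib. The names `sketch_hash_git_data` and `SKETCH_GIT_TYPES` are hypothetical, the accepted types are assumed to be the four git types named in this page, and for simplicity the sketch returns a single raw digest for the chosen base algorithm.

```python
import hashlib

# Assumption for this sketch: only the four git object types named in this page.
SKETCH_GIT_TYPES = {"blob", "commit", "tag", "tree"}


def sketch_hash_git_data(data: bytes, git_type: str, base_algo: str = "sha1") -> bytes:
    """Hash header + data, where header is b'<type> <len(data)>\\x00'."""
    if git_type not in SKETCH_GIT_TYPES:
        raise ValueError(f"Unexpected git object type: {git_type!r}")
    header = f"{git_type} {len(data)}".encode("ascii") + b"\x00"
    hasher = hashlib.new(base_algo)
    hasher.update(header)
    hasher.update(data)
    return hasher.digest()


# Usage: the empty blob, whose SHA-1 is git's well-known empty-blob id.
empty_blob_id = sketch_hash_git_data(b"", "blob").hex()
```

With the default `sha1`, `empty_blob_id` is `e69de29bb2d1d6434b8b29ae775ad8c2e48c5391`, the identifier git assigns to an empty blob.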
 
- swh.model.hashutil.hash_to_hex(hash: str | bytes) -> str
- Converts a hash (in hex or bytes form) to its hexadecimal ASCII form
- swh.model.hashutil.hash_to_bytehex(hash: bytes) -> bytes
- Converts a hash to its hexadecimal bytes representation
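The two conversions above can be approximated with the standard library. This is a sketch under the assumption that a str input is already in hexadecimal form and is returned unchanged; the names `sketch_hash_to_hex` and `sketch_hash_to_bytehex` are hypothetical.

```python
import binascii
import hashlib


def sketch_hash_to_hex(value):
    """Return the hexadecimal str form of a hash given as hex str or raw bytes."""
    if isinstance(value, str):
        return value  # assumed to be hexadecimal already
    return value.hex()


def sketch_hash_to_bytehex(value: bytes) -> bytes:
    """Return the hexadecimal form of a raw hash, as ASCII bytes."""
    return binascii.hexlify(value)


# Usage: convert a raw SHA-1 digest both ways.
raw = hashlib.sha1(b"foo").digest()
as_text = sketch_hash_to_hex(raw)
as_bytes = sketch_hash_to_bytehex(raw)
```

Both results carry the same hexadecimal characters; the first is a `str` and the second is `bytes`, matching the two return types in the signatures above.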