swh.lister.maven.lister module
- class swh.lister.maven.lister.MavenListerState(last_seen_doc: int = -1, last_seen_pom: int = -1)
- Bases: object
- State of the MavenLister.
- class swh.lister.maven.lister.MavenLister(scheduler: SchedulerInterface, url: str, index_url: str, instance: str | None = None, credentials: Dict[str, Dict[str, List[Dict[str, str]]]] | None = None, max_origins_per_page: int | None = None, max_pages: int | None = None, enable_origins: bool = True, incremental: bool = True)
- Bases: Lister[MavenListerState, Dict[str, Any]]
- List origins from a Maven repository.
- Maven Central provides artifacts for Java builds. It includes POM files and source archives, which we download to get the source code of artifacts and links to their SCM repository.
- This lister yields origins of types git/svn/hg, or whatever repository type the artifacts use, plus maven types for the maven loader (tgz, jar).
- Lister class for Maven repositories.
- Parameters:
- url – main URL of the Maven repository, i.e. the URL of the base index used to fetch Maven artifacts. For Maven Central, use https://repo1.maven.org/maven2/
- index_url – the URL to download the exported text indexes from. This is typically a local host running the export Docker image. See README.md in this directory for more information.
- instance – name of the Maven instance. Defaults to url’s network location if unset.
- incremental – bool, defaults to True. Whether incremental listing is enabled.
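The default for `instance` described above can be illustrated with the standard library. This is only a sketch of what "network location" means for a URL; `default_instance_name` is a hypothetical helper, not part of the lister's API, and the lister may compute its default differently.

```python
from urllib.parse import urlparse


def default_instance_name(url: str) -> str:
    """Hypothetical helper: return the network location of ``url``."""
    return urlparse(url).netloc


# For the Maven Central URL given above, the network location is the host name.
print(default_instance_name("https://repo1.maven.org/maven2/"))  # repo1.maven.org
```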
 
 - state_from_dict(d: Dict[str, Any]) → MavenListerState
- Convert the state stored in the scheduler backend (as a dict), to the concrete StateType for this lister. 
 - state_to_dict(state: MavenListerState) → Dict[str, Any]
- Convert the StateType for this lister to its serialization as dict for storage in the scheduler.
- Values must be JSON-compatible as that’s what the backend database expects.
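The round trip between the state object and its dict form might look like the following. This is a minimal sketch, assuming the state is a plain dataclass with the two integer fields shown in the class signature; the stand-in class and both functions here are illustrative and are not the lister's actual code.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any, Dict


@dataclass
class MavenListerState:
    """Stand-in mirroring the documented signature (assumption)."""

    last_seen_doc: int = -1
    last_seen_pom: int = -1


def state_to_dict(state: MavenListerState) -> Dict[str, Any]:
    # Plain ints only, so the result is JSON-compatible.
    return asdict(state)


def state_from_dict(d: Dict[str, Any]) -> MavenListerState:
    return MavenListerState(**d)


# Simulate storage in a JSON backend and reload the state.
stored = json.dumps(state_to_dict(MavenListerState(last_seen_doc=42)))
restored = state_from_dict(json.loads(stored))
```

Passing the dict through `json.dumps` and `json.loads` is a quick way to check the JSON-compatibility requirement stated above.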
 - get_pages() → Iterator[Dict[str, Any]]
- Retrieve and parse exported maven indexes to identify all pom files and src archives. 
 - get_scm(page: Dict[str, Any]) → ListedOrigin | None
- Retrieve scm origin out of the page information. Only called when type of the page is scm.
- Try to detect an scm/vcs repository. Note that the official format is scm:{type}:git://example.org/{user}/{repo}.git, but some projects put the repository URL directly (without the “scm:type” prefix), so we have to check the content to extract the type and URL properly.
- Raises:
- AssertionError when the type of the page is not ‘scm’
- Returns:
- ListedOrigin with proper canonical scm url (for github) if any is found, None otherwise.
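The two input shapes described above (the official `scm:{type}:{url}` form and a bare repository URL) could be told apart roughly as follows. This is a sketch under stated assumptions: `split_scm` and `SCM_RE` are hypothetical names, and the fallback heuristic for bare URLs is an illustration only, not the rule the lister actually applies.

```python
import re
from typing import Optional, Tuple

# Official form: "scm:{type}:{url}"
SCM_RE = re.compile(r"^scm:(?P<type>[^:]+):(?P<url>.+)$")


def split_scm(value: str) -> Optional[Tuple[str, str]]:
    """Hypothetical helper: return (type, url), or None if undetermined."""
    value = value.strip()
    match = SCM_RE.match(value)
    if match:
        return match.group("type"), match.group("url")
    # Assumed fallback for bare URLs: treat obvious git URLs as git.
    if value.startswith("git://") or value.endswith(".git"):
        return "git", value
    return None


official = split_scm("scm:git:git://example.org/user/repo.git")
bare = split_scm("https://example.org/user/repo.git")
```

Returning `None` for anything unrecognized mirrors the documented contract of returning no origin when no repository is found.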
 
 - get_origins_from_page(page: Dict[str, Any]) → Iterator[ListedOrigin]
- Convert a page of Maven repositories into a list of ListedOrigins.