swh.lister.rubygems package#
Submodules#
- swh.lister.rubygems.lister module
  - RubyGemsLister
    - RubyGemsLister.LISTER_NAME
    - RubyGemsLister.VISIT_TYPE
    - RubyGemsLister.INSTANCE
    - RubyGemsLister.RUBY_GEMS_POSTGRES_DUMP_BASE_URL
    - RubyGemsLister.RUBY_GEMS_POSTGRES_DUMP_LIST_URL
    - RubyGemsLister.RUBY_GEM_DOWNLOAD_URL_PATTERN
    - RubyGemsLister.RUBY_GEM_ORIGIN_URL_PATTERN
    - RubyGemsLister.RUBY_GEM_EXTRINSIC_METADATA_URL_PATTERN
    - RubyGemsLister.DB_NAME
    - RubyGemsLister.DUMP_SQL_PATH
    - RubyGemsLister.get_latest_dump_file()
    - RubyGemsLister.create_rubygems_db()
    - RubyGemsLister.populate_rubygems_db()
    - RubyGemsLister.get_pages()
    - RubyGemsLister.get_origins_from_page()
- swh.lister.rubygems.tasks module
Module contents#
RubyGems lister#
The RubyGems lister lists origins from RubyGems.org, the Ruby community’s gem hosting service.
As of September 2022, RubyGems.org lists 173384 package names.
Origins retrieving strategy#
To get the list of all package names, the lister calls an HTTP endpoint that returns the list of gems as plain text.
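As a rough illustration of turning such a text listing into package names, here is a minimal sketch. The layout it assumes (one gem per line, with the name as the first whitespace-separated token, optionally followed by version information) is an assumption for illustration and is not confirmed by this page; `parse_gem_names` is a hypothetical helper, not part of `swh.lister.rubygems`.

```python
def parse_gem_names(listing: str) -> list[str]:
    """Extract gem names from a plain-text listing.

    Assumed layout (hypothetical): one gem per line, name first,
    e.g. "rails (7.0.4)". Blank lines are skipped.
    """
    names = []
    for line in listing.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore empty lines
        # Keep only the first token, dropping any trailing version info.
        names.append(line.split()[0])
    return names


if __name__ == "__main__":
    sample = "rails (7.0.4)\n\nrake (13.0.6)\n"
    print(parse_gem_names(sample))
```

If the real listing uses a different layout, only the line-splitting logic would need to change.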
Page listing#
Each page returns an origin URL built from the following pattern:
https://rubygems.org/gems/{pkgname}
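The pattern above can be filled in with plain string formatting. The sketch below is illustrative only: the constant and function names are hypothetical and are not taken from the lister's code, although the pattern string itself is the one quoted above.

```python
# Pattern quoted from the text above; the constant name is hypothetical.
ORIGIN_URL_PATTERN = "https://rubygems.org/gems/{pkgname}"


def origin_url(pkgname: str) -> str:
    """Build the origin URL for a single gem name (illustrative helper)."""
    return ORIGIN_URL_PATTERN.format(pkgname=pkgname)


if __name__ == "__main__":
    for name in ("rails", "net-http"):
        print(origin_url(name))
```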
Origins from page#
The lister yields one origin URL per page.
Running tests#
Activate the virtualenv and run the following from within the swh-lister directory:
pytest -s -vv --log-cli-level=DEBUG swh/lister/rubygems/tests
Testing with Docker#
Change directory to swh/docker, then launch the Docker environment:
docker compose up -d
Then schedule a RubyGems listing task:
docker compose exec swh-scheduler swh scheduler task add -p oneshot list-rubygems
You can follow the lister's execution by displaying the logs of the swh-lister service:
docker compose logs -f swh-lister