Request archival
- POST /api/1/add-forge/request/create/
- Create a new request to add a forge to the list of those crawled regularly by Software Heritage.
- Warning: this endpoint is not publicly available and requires authentication.
- Request Headers:
- Accept – the requested response content type, either application/json (default) or application/yaml
 
- Response Headers:
- Content-Type – this depends on Accept header of request 
 
- Request JSON Object:
- forge_type (string) – the type of forge 
- forge_url (string) – the base URL of the forge 
- forge_contact_email (string) – email of an administrator of the forge to contact 
- forge_contact_name (string) – the name of the administrator 
- forge_contact_comment (string) – to explain how Software Heritage can verify forge administrator info are valid 
 
- Status Codes:
- 201 Created – request successfully created 
- 400 Bad Request – missing or invalid field values 
- 403 Forbidden – user not authenticated 
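
As a minimal sketch of how a client might prepare this call, the Python snippet below builds (but does not send) the POST request. Several points are assumptions rather than documented behaviour: the host is the one used in the curl examples for the bulk save endpoint, the Bearer token scheme is borrowed from those same examples, all five documented fields are treated as required, and the helper name `build_add_forge_request` is hypothetical.

```python
import json
import urllib.request

# Assumption: same host as in the curl examples for the bulk save endpoint.
API_ROOT = "https://archive.softwareheritage.org"

# Fields listed under "Request JSON Object"; treating all of them as
# required is an assumption made for this sketch.
DOCUMENTED_FIELDS = (
    "forge_type",
    "forge_url",
    "forge_contact_email",
    "forge_contact_name",
    "forge_contact_comment",
)


def build_add_forge_request(fields: dict, token: str) -> urllib.request.Request:
    """Build (without sending) a POST request for the add-forge create endpoint."""
    missing = [name for name in DOCUMENTED_FIELDS if not fields.get(name)]
    if missing:
        raise ValueError("missing fields: " + ", ".join(missing))
    body = json.dumps({name: fields[name] for name in DOCUMENTED_FIELDS})
    return urllib.request.Request(
        API_ROOT + "/api/1/add-forge/request/create/",
        data=body.encode("utf-8"),
        method="POST",
        headers={
            "Accept": "application/json",
            "Content-Type": "application/json",
            # Assumption: Bearer token authentication, as in the bulk save examples.
            "Authorization": "Bearer " + token,
        },
    )
```

Passing the returned object to `urllib.request.urlopen` would send it; a 201 status then indicates that the request was created, while 400 and 403 are raised as `urllib.error.HTTPError`.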
 
 
- POST /api/1/add-forge/request/(id)/update/
- Update a request to add a forge to the list of those crawled regularly by Software Heritage.
- Warning: this endpoint is not publicly available and requires authentication.
- Request Headers:
- Accept – the requested response content type, either application/json (default) or application/yaml
 
- Response Headers:
- Content-Type – this depends on Accept header of request 
 
- Request JSON Object:
- text (string) – comment about new request status 
- new_status (string) – the new request status 
 
- Status Codes:
- 200 OK – request successfully updated 
- 400 Bad Request – missing or invalid field values 
- 403 Forbidden – user is not a moderator 
 
 
- GET /api/1/add-forge/request/list/
- List add-forge requests submitted by users.
- Request Headers:
- Accept – the requested response content type, either application/json (default) or application/yaml
 
- Response Headers:
- Content-Type – this depends on Accept header of request 
- Link – indicates that a subsequent result page is available and contains the url pointing to it 
 
- Query Parameters:
- page (int) – optional page number 
- per_page (int) – optional number of elements per page (bounded to 1000) 
 
- Status Codes:
- 200 OK – always 
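
The pagination described above can be followed on the client side roughly as sketched below. The sketch assumes the Link header uses the common `<url>; rel="next"` form with `rel` as the first parameter (the exact header format is not stated here), and the helper names `list_url` and `next_page_url` are hypothetical. The host is the one used in the curl examples for the bulk save endpoint.

```python
import re
from typing import Optional
from urllib.parse import urlencode

# Assumption: same host as in the curl examples for the bulk save endpoint.
API_ROOT = "https://archive.softwareheritage.org"

# Matches one '<url>; rel="next"' entry of a Link header (assumed format).
_NEXT_LINK = re.compile(r'<([^>]*)>\s*;\s*rel="?next"?')


def list_url(page: int = 1, per_page: int = 100) -> str:
    """Build the list URL, keeping per_page within the documented bound of 1000."""
    per_page = max(1, min(per_page, 1000))
    query = urlencode({"page": page, "per_page": per_page})
    return API_ROOT + "/api/1/add-forge/request/list/?" + query


def next_page_url(link_header: Optional[str]) -> Optional[str]:
    """Return the URL of the next result page, or None when there is none."""
    if not link_header:
        return None
    match = _NEXT_LINK.search(link_header)
    return match.group(1) if match else None
```

A client would fetch `list_url()`, read the Link response header, and keep requesting `next_page_url(...)` until it returns `None`.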
 
 
- GET /api/1/add-forge/request/(id)/get/
- Return all details about an add-forge request.
- Request Headers:
- Accept – the requested response content type, either application/json (default) or application/yaml
 
- Response Headers:
- Content-Type – this depends on Accept header of request 
 
- Parameters:
- id (int) – add-forge request identifier 
 
- Status Codes:
- 200 OK – request details successfully returned 
- 400 Bad Request – request identifier does not exist 
 
 
- POST /api/1/origin/save/bulk/
- Request the saving of multiple software origins into the archive.
- This endpoint lets you request the archival of multiple software origins through a POST request whose body contains a list of origin URLs and their visit types.
- The following visit types are supported: bzr, cvs, hg, git, svn and tarball-directory.
- The origins list data can be provided using the following content types:
- text/csv (default)
  When using CSV format, the first column must contain origin URLs and the second column the visit types.

      "https://git.example.org/user/project","git"
      "https://download.example.org/project/source.tar.gz","tarball-directory"

  To post the content of such a file to the endpoint, you can use the following curl command.

      $ curl -X POST -H "Authorization: Bearer ****" \
             -H "Content-Type: text/csv" \
             --data-binary @/path/to/origins.csv \
             https://archive.softwareheritage.org/api/1/origin/save/bulk/

- application/json
  When using JSON format, the following schema must be used.

      [
        {
          "origin_url": "https://git.example.org/user/project",
          "visit_type": "git"
        },
        {
          "origin_url": "https://download.example.org/project/source.tar.gz",
          "visit_type": "tarball-directory"
        }
      ]

  To post the content of such a file to the endpoint, you can use the following curl command.

      $ curl -X POST -H "Authorization: Bearer ****" \
             -H "Content-Type: application/json" \
             --data-binary @/path/to/origins.json \
             https://archive.softwareheritage.org/api/1/origin/save/bulk/

- application/yaml
  When using YAML format, the following schema must be used.

      - origin_url: https://git.example.org/user/project
        visit_type: git
      - origin_url: https://download.example.org/project/source.tar.gz
        visit_type: tarball-directory

  To post the content of such a file to the endpoint, you can use the following curl command.

      $ curl -X POST -H "Authorization: Bearer ****" \
             -H "Content-Type: application/yaml" \
             --data-binary @/path/to/origins.yaml \
             https://archive.softwareheritage.org/api/1/origin/save/bulk/

- Once received, origins data are checked for correctness by validating URLs and verifying that visit types are supported. A request cannot be accepted if at least one origin is not valid. All origins with an invalid format are reported in the rejected request response.
- Warning: this endpoint is not publicly available and requires authentication and a special user permission.
- Request Headers:
- Accept – the requested response content type, either application/json (default) or application/yaml
- Content-Type – the content type of posted data, either text/csv (default), application/json or application/yaml
 
- Response Headers:
- Content-Type – this depends on Accept header of request 
 
- Response JSON Object:
- status (string) – either accepted or rejected
- reason (string) – details about why a request got rejected
- request_id (string) – request identifier (only when the request is accepted)
- rejected_origins (array) – list of rejected origins and details about the reasons (only when the request is rejected) 
 
- Status Codes:
- 200 OK – no error 
- 400 Bad Request – provided origins data are not valid 
- 401 Unauthorized – request is not authenticated 
- 403 Forbidden – user does not have permission to query the endpoint 
- 415 Unsupported Media Type – payload format is not supported 
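
Since a single invalid origin causes the whole request to be rejected, it can help to check visit types locally before posting. The following is a sketch only: it generates the CSV payload in the two-column layout described above and rejects unsupported visit types up front, but it does not reproduce the server-side URL validation, and the function name `origins_to_csv` is hypothetical.

```python
import csv
import io

# Visit types listed as supported by the bulk save endpoint.
SUPPORTED_VISIT_TYPES = {"bzr", "cvs", "hg", "git", "svn", "tarball-directory"}


def origins_to_csv(origins):
    """Serialize (origin_url, visit_type) pairs to the two-column CSV format.

    Raises ValueError if any visit type is not in the supported list.
    """
    unsupported = [
        (url, visit_type)
        for url, visit_type in origins
        if visit_type not in SUPPORTED_VISIT_TYPES
    ]
    if unsupported:
        raise ValueError("unsupported visit types: %r" % unsupported)
    buffer = io.StringIO()
    # Quote every field, matching the layout of the CSV example above.
    writer = csv.writer(buffer, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerows(origins)
    return buffer.getvalue()
```

The returned string can be written to a file and posted with the curl command shown for the text/csv content type.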
 
 
- GET /api/1/origin/save/bulk/requests/
- List previously submitted save bulk requests.
- This endpoint lists the save bulk requests submitted by your user account, along with their info URLs (see GET /api/1/origin/save/bulk/request/(request_id)/). The list is paginated if the number of requests is large.
- Warning: this endpoint is not publicly available and requires authentication and a special user permission.
- Query Parameters:
- page (number) – the page number of submitted requests to retrieve
- per_page (number) – number of submitted requests per page; defaults to 1000, maximum is 10000
 
- Response JSON Array of Objects:
- request_id (string) – UUID identifier of the request 
- request_date (date) – the date the request was submitted 
- request_info_url (string) – URL to get detailed info about the request 
 
- Request Headers:
- Accept – the requested response content type, either application/json (default) or application/yaml
 
- Response Headers:
- Content-Type – this depends on Accept header of request 
- Link – indicates that a subsequent result page is available and contains the url pointing to it 
 
- Status Codes:
- 200 OK – no error 
- 401 Unauthorized – request is not authenticated 
- 403 Forbidden – user does not have permission to query the endpoint 
 
 
- GET /api/1/origin/save/bulk/request/(request_id)/
- Get feedback about the loading statuses of origins submitted through a save bulk request.
- This endpoint tracks the archival statuses of origins submitted through a POST request to the POST /api/1/origin/save/bulk/ endpoint. Info about submitted origins is returned in a paginated way.
- Note: only origin visits whose dates are later than the request date are reported by this endpoint.
- Warning: this endpoint is not publicly available and requires authentication and a special user permission. Staff users are also allowed to query it.
- Warning: only the user that created a save bulk request, or a staff user, can get feedback about it.
- Parameters:
- request_id (string) – UUID identifier of a save bulk request 
 
- Query Parameters:
- page (number) – the page number of submitted origins info to retrieve
- per_page (number) – number of submitted origins info entries per page; defaults to 1000, maximum is 10000
 
- Response JSON Array of Objects:
- origin_url (string) – URL of submitted origin 
- visit_type (string) – visit type for the origin 
- status (string) – submitted origin status, either pending, accepted or rejected
- last_scheduling_date (date) – ISO8601/RFC3339 representation of the last date (in UTC) when the origin was scheduled for loading into the archive, null if the origin got rejected
- last_visit_date (date) – ISO8601/RFC3339 representation of the last date (in UTC) when the origin was visited by Software Heritage, null if the origin got rejected or was not visited yet
- last_visit_status (string) – last visit status for the origin, either successful or failed, null if the origin got rejected or was not visited yet
- last_snapshot_swhid (string) – last produced snapshot SWHID associated to the visit, null if the origin got rejected or was not visited yet
- rejection_reason (string) – if the origin got rejected, gives more details about the rejection
- browse_url (string) – URL to browse the submitted origin if it got accepted and loaded into the archive, null if the origin got rejected or was not visited yet
 
- Request Headers:
- Accept – the requested response content type, either application/json (default) or application/yaml
 
- Response Headers:
- Content-Type – this depends on Accept header of request 
- Link – indicates that a subsequent result page is available and contains the url pointing to it 
 
- Status Codes:
- 200 OK – no error 
- 401 Unauthorized – request is not authenticated 
- 403 Forbidden – user does not have permission to query the endpoint or get feedback about a request they did not submit
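
To illustrate how the response array might be consumed, here is a small sketch that condenses one page of decoded entries into counts per status and a map of rejected origins. It relies only on the fields documented above; the function name `summarize_feedback` and the shape of the returned summary are hypothetical choices for this example.

```python
from collections import Counter


def summarize_feedback(entries):
    """Summarize one page of bulk save feedback entries (already JSON-decoded)."""
    # Count origins per submission status (pending, accepted, rejected).
    statuses = Counter(entry["status"] for entry in entries)
    # Count visit outcomes, skipping origins that were rejected or not visited yet.
    visits = Counter(
        entry["last_visit_status"]
        for entry in entries
        if entry.get("last_visit_status") is not None
    )
    # Map each rejected origin URL to the reason given by the API.
    rejected = {
        entry["origin_url"]: entry.get("rejection_reason")
        for entry in entries
        if entry["status"] == "rejected"
    }
    return {"statuses": dict(statuses), "visits": dict(visits), "rejected": rejected}
```

Because results are paginated, a full summary would need to merge the output for every page reachable through the Link header.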
 
 
- GET /api/1/origin/save/(visit_type)/url/(origin_url)/
- POST /api/1/origin/save/(visit_type)/url/(origin_url)/
- GET /api/1/origin/save/(request_id)/
- Request the saving of a software origin into the archive or check the status of previously created save requests.
- This endpoint creates a saving task for a software origin through a POST request.
- Depending on the provided origin URL, the save request can either be:
- immediately accepted, for well-known code hosting providers such as GitHub or GitLab
- rejected, in case the URL is blacklisted by Software Heritage
- put in pending state until a manual check is done to determine whether it can be loaded
 
- Once a saving request has been accepted, the status of its associated saving task can be checked through a GET request on the same URL. The returned status can be one of:
- not created: no saving task has been created
- pending: saving task has been created and will be scheduled for execution 
- scheduled: the task execution has been scheduled 
- running: the task is currently executed 
- succeeded: the saving task has been successfully executed 
- failed: the saving task has been executed but it failed 
 
- When issuing a POST request, an object is returned, while a GET request returns an array of objects (as multiple save requests might have been submitted for the same origin).
- It is also possible to get info about a specific save request by sending a GET request to the /api/1/origin/save/(request_id)/ endpoint.
- Parameters:
- visit_type (string) – the type of visit to perform (currently the supported types are bzr, cvs, git, hg, and svn) 
- origin_url (string) – the url of the origin to save 
- request_id (number) – a save request identifier 
 
- Request Headers:
- Accept – the requested response content type, either application/json (default) or application/yaml
 
- Response Headers:
- Content-Type – this depends on Accept header of request 
 
- Response JSON Object:
- id (number) – the save request identifier 
- request_url (string) – Web API URL to follow up on that request 
- origin_url (string) – the url of the origin to save 
- visit_type (string) – the type of visit to perform 
- save_request_date (string) – the date (in iso format) the save request was issued 
- save_request_status (string) – the status of the save request, either accepted, rejected or pending 
- save_task_status (string) – the status of the origin saving task, either not created, pending, scheduled, running, succeeded or failed 
- visit_date (string) – the date (in iso format) of the visit if a visit occurred, null otherwise. 
- visit_status (string) – the status of the visit, either full, partial, not_found or failed if a visit occurred, null otherwise. 
- note (string) – optional note giving details about the save request, for instance why it has been rejected 
- snapshot_swhid (string) – SWHID of snapshot associated to the visit (null if it is missing or unknown) 
- snapshot_url (string) – Web API URL to retrieve snapshot data 
- from_webhook (boolean) – indicates if the save request was created from a popular forge webhook receiver (see POST /api/1/origin/save/webhook/github/ for instance)
- webhook_origin (string) – indicates which forge type sent the webhook; currently the supported types are: bitbucket, gitea, github, gitlab, and sourceforge
 
- Status Codes:
- 200 OK – no error 
- 400 Bad Request – an invalid visit type or origin url has been provided 
- 403 Forbidden – the provided origin url is blacklisted 
- 404 Not Found – no save requests have been found for a given origin 
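
When polling the GET endpoint, a client needs to decide when to stop. The sketch below encodes one reasonable reading of the statuses listed above: a save request is considered settled once it has been rejected or its saving task has succeeded or failed. Treating exactly these states as final is an assumption of this example, and the helper names `is_settled` and `unsettled_requests` are hypothetical.

```python
# Assumption: these two task statuses are final; the others may still change.
FINAL_TASK_STATUSES = {"succeeded", "failed"}


def is_settled(save_request: dict) -> bool:
    """Return True when no further status change is expected for a save request."""
    if save_request["save_request_status"] == "rejected":
        return True
    return save_request["save_task_status"] in FINAL_TASK_STATUSES


def unsettled_requests(save_requests: list) -> list:
    """Filter the array returned by a GET request down to requests worth polling."""
    return [request for request in save_requests if not is_settled(request)]
```

A polling loop would repeat the GET request, with a delay between attempts, until `unsettled_requests` returns an empty list.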
 
 
- POST /api/1/origin/save/webhook/bitbucket/
- Webhook receiver for Bitbucket to request or update the archival of a repository when new commits are pushed to it.
- To add such a webhook to one of your git repositories hosted on Bitbucket, please follow Bitbucket’s webhooks guide.
- The content type of the webhook payload must be application/json.
- Please note that, to avoid abuse of the archival service offered by Software Heritage, at most one request per hour is created, so the effective loading of the repository into the archive might be delayed.
- Response JSON Object:
- id (number) – the save request identifier 
- request_url (string) – Web API URL to follow up on that request 
- origin_url (string) – the url of the origin to save 
- visit_type (string) – the type of visit to perform 
- save_request_date (string) – the date (in iso format) the save request was issued 
- save_request_status (string) – the status of the save request, either accepted, rejected or pending 
- save_task_status (string) – the status of the origin saving task, either not created, pending, scheduled, running, succeeded or failed 
- save_task_next_run (string) – the date and time from which the request is executed 
 
- Status Codes:
- 200 OK – save request for repository has been successfully created from the webhook payload. 
- 400 Bad Request – no save request has been created due to invalid POST request or missing data in webhook payload 
 
 
- POST /api/1/origin/save/webhook/gitea/
- Webhook receiver for Gitea to request or update the archival of a repository when new commits are pushed to it.
- To add such a webhook to one of your git repositories hosted on Gitea, please follow Gitea’s webhooks guide.
- The content type of the webhook payload must be application/json.
- Please note that, to avoid abuse of the archival service offered by Software Heritage, at most one request per hour is created, so the effective loading of the repository into the archive might be delayed.
- Response JSON Object:
- id (number) – the save request identifier 
- request_url (string) – Web API URL to follow up on that request 
- origin_url (string) – the url of the origin to save 
- visit_type (string) – the type of visit to perform 
- save_request_date (string) – the date (in iso format) the save request was issued 
- save_request_status (string) – the status of the save request, either accepted, rejected or pending 
- save_task_status (string) – the status of the origin saving task, either not created, pending, scheduled, running, succeeded or failed 
- save_task_next_run (string) – the date and time from which the request is executed 
 
- Status Codes:
- 200 OK – save request for repository has been successfully created from the webhook payload. 
- 400 Bad Request – no save request has been created due to invalid POST request or missing data in webhook payload 
 
 
- POST /api/1/origin/save/webhook/github/
- Webhook receiver for GitHub to request or update the archival of a repository when new commits are pushed to it.
- To add such a webhook to one of your git repositories hosted on GitHub, please follow GitHub’s webhooks guide.
- The content type of the webhook payload must be application/json.
- Please note that, to avoid abuse of the archival service offered by Software Heritage, at most one request per hour is created, so the effective loading of the repository into the archive might be delayed.
- Response JSON Object:
- id (number) – the save request identifier 
- request_url (string) – Web API URL to follow up on that request 
- origin_url (string) – the url of the origin to save 
- visit_type (string) – the type of visit to perform 
- save_request_date (string) – the date (in iso format) the save request was issued 
- save_request_status (string) – the status of the save request, either accepted, rejected or pending 
- save_task_status (string) – the status of the origin saving task, either not created, pending, scheduled, running, succeeded or failed 
- save_task_next_run (string) – the date and time from which the request is executed 
 
- Status Codes:
- 200 OK – save request for repository has been successfully created from the webhook payload. 
- 400 Bad Request – no save request has been created due to invalid POST request or missing data in webhook payload 
 
 
- POST /api/1/origin/save/webhook/gitlab/
- Webhook receiver for GitLab to request or update the archival of a repository when new commits are pushed to it.
- To add such a webhook to one of your git repositories hosted on GitLab, please follow GitLab’s webhooks guide.
- The content type of the webhook payload must be application/json.
- Please note that, to avoid abuse of the archival service offered by Software Heritage, at most one request per hour is created, so the effective loading of the repository into the archive might be delayed.
- Response JSON Object:
- id (number) – the save request identifier 
- request_url (string) – Web API URL to follow up on that request 
- origin_url (string) – the url of the origin to save 
- visit_type (string) – the type of visit to perform 
- save_request_date (string) – the date (in iso format) the save request was issued 
- save_request_status (string) – the status of the save request, either accepted, rejected or pending 
- save_task_status (string) – the status of the origin saving task, either not created, pending, scheduled, running, succeeded or failed 
- save_task_next_run (string) – the date and time from which the request is executed 
 
- Status Codes:
- 200 OK – save request for repository has been successfully created from the webhook payload. 
- 400 Bad Request – no save request has been created due to invalid POST request or missing data in webhook payload 
 
 
- POST /api/1/origin/save/webhook/sourceforge/
- Webhook receiver for SourceForge to request or update the archival of a repository when new commits are pushed to it.
- To add such a webhook to one of your git, hg or svn repositories hosted on SourceForge, please follow SourceForge’s webhooks guide.
- The content type of the webhook payload must be application/json.
- Please note that, to avoid abuse of the archival service offered by Software Heritage, at most one request per hour is created, so the effective loading of the repository into the archive might be delayed.
- Response JSON Object:
- id (number) – the save request identifier 
- request_url (string) – Web API URL to follow up on that request 
- origin_url (string) – the url of the origin to save 
- visit_type (string) – the type of visit to perform 
- save_request_date (string) – the date (in iso format) the save request was issued 
- save_request_status (string) – the status of the save request, either accepted, rejected or pending 
- save_task_status (string) – the status of the origin saving task, either not created, pending, scheduled, running, succeeded or failed 
- save_task_next_run (string) – the date and time from which the request is executed 
 
- Status Codes:
- 200 OK – save request for repository has been successfully created from the webhook payload. 
- 400 Bad Request – no save request has been created due to invalid POST request or missing data in webhook payload