Problem
Zenodo is a major repository for open access research outputs with 5.5M+ records, but is not currently included in our commons quantification project. Adding Zenodo would significantly expand our coverage of Creative Commons licensed content, particularly in academic and research domains.
Description
Implement data collection from Zenodo using its REST API to gather license information for quantifying the commons. This involves:
- Fetching records with structured license metadata
- Classifying Creative Commons and other open licenses
- Generating reports by year, resource type, and language
- Handling API rate limiting and pagination
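The classification step could be sketched as follows. This is a minimal illustration, not the project's implementation: the category names, the `OTHER_OPEN_LICENSES` set, and the prefix rules are all assumptions made for the example, and the real list of identifiers should be taken from Zenodo's license vocabulary.

```python
# Hypothetical examples of non-CC open license identifiers (not a verified list).
OTHER_OPEN_LICENSES = {"mit", "apache-2.0", "gpl-3.0", "bsd-3-clause"}

# Assumed prefixes for CC public-domain tools (CC0 and the Public Domain Mark).
PUBLIC_DOMAIN_PREFIXES = ("cc0", "cc-zero", "cc-pdm")


def classify_license(license_id):
    """Map a license identifier such as "cc-by-4.0" to a coarse category."""
    if not license_id:
        return "unknown"
    normalized = license_id.strip().lower()
    if normalized.startswith(PUBLIC_DOMAIN_PREFIXES):
        return "cc-public-domain"
    if normalized.startswith("cc-"):
        return "cc-license"
    if normalized in OTHER_OPEN_LICENSES:
        return "other-open"
    return "other"
```

Lowercasing before matching keeps the check tolerant of identifiers that differ only in case.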
Zenodo Useful Links
Official Documentation
API Endpoints
- Base URL: https://zenodo.org/api/records
- Records Search: https://zenodo.org/api/records
- Single Record: https://zenodo.org/api/records/{id}
- Communities: https://zenodo.org/api/communities
Technical Details
Query Strategy
GET https://zenodo.org/api/records?q=*&size=100&page=1&sort=bestmatch
Parameters:
q: Query string (use * for all records)
size: Records per page (300 as an implementation choice; the example request above uses 100)
page: Page number for pagination
sort: Sorting method (bestmatch recommended)
API Types Available
- REST API (Recommended)
  - Format: JSON
  - Authentication: None required for public records
  - Structured license data: metadata.license.id
- OAI-PMH (Not recommended)
  - Format: XML Dublin Core
  - Unreliable license parsing from free-text fields (dc:rights)
Key Metadata Fields
- License: metadata.license.id (structured, e.g., "cc-by-4.0")
- Access Rights: metadata.access_right ("open", "restricted", "embargoed")
- Publication Date: metadata.publication_date (ISO format)
- Resource Type: metadata.resource_type.title
- Language: metadata.language (ISO codes)
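To show how these fields could feed the reports by year, resource type, and language, here is a small sketch that flattens one record and tallies rows. The function names `extract_fields` and `count_by` are hypothetical, the record is assumed to be a dict shaped like the field paths above, and any field may be missing in practice.

```python
from collections import Counter


def extract_fields(record):
    """Pull the key metadata fields out of one record, tolerating gaps."""
    metadata = record.get("metadata") or {}
    license_info = metadata.get("license") or {}
    resource_type = metadata.get("resource_type") or {}
    publication_date = metadata.get("publication_date") or ""
    return {
        "license": license_info.get("id"),
        "access_right": metadata.get("access_right"),
        # ISO dates start with the four-digit year.
        "year": publication_date[:4] or None,
        "resource_type": resource_type.get("title"),
        "language": metadata.get("language"),
    }


def count_by(rows, *keys):
    """Count rows grouped by the given keys, e.g. ("year", "license")."""
    return Counter(tuple(row[key] for key in keys) for row in rows)
```

For example, `count_by(rows, "year", "license")` gives per-year license counts, and swapping in `"resource_type"` or `"language"` gives the other two breakdowns.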
Implementation