Move datapackage.json to repo root (Frictionless spec) #12
Pull request overview
This PR delivers the initial release (v0.1.0) of betydata, an R data package providing offline access to public data from the BETYdb (Biofuel Ecophysiological Traits and Yields) database. The package enables reproducible analyses of plant traits and crop yields without requiring database connectivity.
Changes:
- Complete R package structure with 16 datasets (traitsview + 15 support tables) totaling 43,532+ trait and yield records
- Multiple data formats: lazy-loaded .rda files, Parquet alternatives, and Frictionless metadata (datapackage.json)
- Comprehensive documentation: roxygen2 docs for all datasets, 4 vignettes (orientation, sql-analogs, pfts-priors, manuscript), and GitHub issue templates
- Quality controls: excludes checked=-1 records, public data only (access_level >= 4), full test coverage
- CI/CD infrastructure: GitHub Actions R-CMD-check workflow, testthat 3.0 test suite
Reviewed changes
Copilot reviewed 38 out of 71 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| DESCRIPTION | Package metadata and dependencies; minor email format issue |
| CITATION.cff | Citation metadata; email and missing preferred-citation issues |
| LICENSE | BSD-3-Clause license file |
| README.md | Comprehensive package documentation; table formatting issue |
| NEWS.md | Release notes documenting v0.1.0 |
| R/betydata-package.R | Package-level documentation |
| R/data.R | Roxygen2 documentation for all 16 datasets |
| man/*.Rd | Generated documentation files for datasets |
| vignettes/*.Rmd | Four tutorial vignettes; minor issues in manuscript.Rmd and pfts-priors.Rmd |
| tests/testthat/*.R | Test suite for data and metadata validation; deprecated context() calls |
| data-raw/make-data.R | Data build script for generating .rda and Parquet files |
| inst/metadata/datapackage.json | Frictionless Data package metadata |
| inst/extdata/parquet/*.parquet | Sample Parquet data files |
| data/*.rda | Binary R data files (compressed with xz) |
| .github/workflows/*.yaml | GitHub Actions CI configuration |
| .github/ISSUE_TEMPLATE/*.md | Issue templates for data corrections and verifications |
| .gitignore, .Rbuildignore | Build and version control configuration; CSV exclusion concern |
Comments suppressed due to low confidence (2)
tests/testthat/test-metadata.R:3
- The `context()` function on line 3 is deprecated in testthat 3.0.0 and later. According to the DESCRIPTION file, this package uses `testthat (>= 3.0.0)` and has `Config/testthat/edition: 3`. The `context()` calls should be removed, as they are no longer needed and will generate warnings.
tests/testthat/test-data.R:3
- The `context()` function on line 3 is deprecated in testthat 3.0.0 and later. According to the DESCRIPTION file, this package uses `testthat (>= 3.0.0)` and has `Config/testthat/edition: 3`. The `context()` calls should be removed, as they are no longer needed and will generate warnings.
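For illustration, a test file in the edition 3 style might look like the sketch below, with no `context()` call at the top. It assumes testthat (>= 3.0.0) is installed; the `traits` data frame and the test description are hypothetical stand-ins, not taken from the package's actual tests.

```r
# Sketch of an edition 3 style test file: the file name already groups the
# tests, so no context() call is needed.
library(testthat)

# Hypothetical stand-in so the sketch runs without betydata installed.
traits <- data.frame(
  trait = c("SLA", "Vcmax"),
  checked = c(0L, 1L)
)

test_that("no records flagged checked == -1 are present", {
  expect_s3_class(traits, "data.frame")
  expect_false(any(traits$checked == -1L))
})
```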
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
dlebauer left a comment
I've done a quick first review. On a future review I will go through all of the vignettes and explore the tables as they exist.
I am now wondering whether we should (1) store the data in CSV files to allow text-based version control, and (2) reconstruct traitsview on the fly from the component datasets (i.e., traitsview should not be in data-raw).
Pull request overview
Copilot reviewed 50 out of 79 changed files in this pull request and generated 2 comments.
Agent-Logs-Url: https://github.com/PecanProject/betydata/sessions/57bbceb9-4e27-48ae-926f-a76673577fbb Co-authored-by: dlebauer <464871+dlebauer@users.noreply.github.com>
@divine7022 please take a look at the most recent change, which moves datapackage.json to the repo root (c1e3c90); after that this is ready to merge.
@dlebauer we are good to go now, once checks pass.
@copilot please restore the original issue description.
Data Correction
Summary
`datapackage.json` was in `inst/metadata/` (shipped with the R package) but referenced paths under `data-raw/csv/` (excluded from the built package via `.Rbuildignore`). This change moves the descriptor to the repo root per the Frictionless Data spec and excludes it from the installed package.
Affected Record(s)
Current Value
`inst/metadata/datapackage.json`: shipped inside the R package, pointing to `data-raw/csv/<name>.csv` paths that don't exist in the installed package.
Corrected Value
`datapackage.json` at repo root: correct per the Frictionless spec; excluded from the built package via `.Rbuildignore` since it's a repo/data-release artifact, not a package artifact.
Source / Evidence
`data-raw/` is already in `.Rbuildignore`; the descriptor referencing it has no place in `inst/`.
Changes Made
- `data-raw/csv/`
- `source("data-raw/make-data.R")` to rebuild `.rda` files
- `NEWS.md`
Structural changes:
- Moved `inst/metadata/datapackage.json` → `datapackage.json` (repo root)
- Added `^datapackage\.json$` to `.Rbuildignore`
- Removed `dir.create("inst/metadata/")` from `make-data.R`; updated output path to `"datapackage.json"`
- Removed `inst/metadata/` directory
Checklist
- `devtools::check()` passes with no errors
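The reasoning behind the move can be checked with a small base R sketch. It assumes that Frictionless resource paths are resolved relative to the directory holding the descriptor, and that `.Rbuildignore` entries are Perl-like regular expressions matched case-insensitively against paths relative to the package root. The temporary directory tree, the file name `example.csv` (a stand-in for `<name>.csv`), and the helper `resource_exists()` are hypothetical and exist only for illustration.

```r
# Sketch only: a throwaway directory tree stands in for the repository.
repo <- file.path(tempdir(), "betydata-sketch")
unlink(repo, recursive = TRUE)
dir.create(file.path(repo, "data-raw", "csv"), recursive = TRUE)
dir.create(file.path(repo, "inst", "metadata"), recursive = TRUE)
file.create(file.path(repo, "data-raw", "csv", "example.csv"))

# Hypothetical helper: resolve resource paths from the directory that
# holds the descriptor and report whether each target file exists.
resource_exists <- function(descriptor_dir, resource_paths) {
  file.exists(file.path(descriptor_dir, resource_paths))
}

resource_path <- "data-raw/csv/example.csv"

# Descriptor at the repo root: the path resolves to the CSV.
at_root <- resource_exists(repo, resource_path)

# Descriptor in inst/metadata/: the same path points at a missing file.
in_inst <- resource_exists(file.path(repo, "inst", "metadata"), resource_path)

# The anchored pattern matches only the root-level descriptor.
pattern <- "^datapackage\\.json$"
candidates <- c(
  "datapackage.json",
  "inst/metadata/datapackage.json",
  "datapackage.json.bak"
)
ignored <- grepl(pattern, candidates, perl = TRUE, ignore.case = TRUE)
```

Here `at_root` is `TRUE` and `in_inst` is `FALSE`, and only the first entry of `candidates` is flagged in `ignored`, so a descriptor left elsewhere in the tree would still be shipped.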