diff --git a/docs/source/system_reference.rst b/docs/source/system_reference.rst index e0eb9317..4617a278 100644 --- a/docs/source/system_reference.rst +++ b/docs/source/system_reference.rst @@ -23,4 +23,5 @@ It is often referenced from the application-focused Science and Technical tutori system_reference_guide/accessing_bucket_data.ipynb system_reference_guide/ade_custom_extensions.rst system_reference_guide/faq.rst + system_reference_guide/accessing_bucket_data_in_r.ipynb diff --git a/docs/source/system_reference_guide/accessing_bucket_data_in_r.ipynb b/docs/source/system_reference_guide/accessing_bucket_data_in_r.ipynb new file mode 100644 index 00000000..b67eee1d --- /dev/null +++ b/docs/source/system_reference_guide/accessing_bucket_data_in_r.ipynb @@ -0,0 +1,641 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "3e2514cc", + "metadata": {}, + "source": [ + "# Access workspace bucket data from R\n", + "\n", + "This notebook is the R version of the Python workflow that uses `maap.aws.workspace_bucket_credentials()` to retrieve temporary AWS credentials for MAAP workspace and organization S3 buckets.\n", + "\n", + "The access level is controlled by MAAP and is returned in `authorized_s3_paths`. This notebook retrieves those temporary credentials and shows two R access patterns:\n", + "\n", + "1. Update a named AWS profile in `~/.aws/credentials`.\n", + "2. Inject the same temporary credentials directly into the current GDAL/VSI environment for `/vsis3/` reads.\n", + "\n", + "The profile-based method is useful because many GDAL-backed R packages can read from the standard AWS credentials file. The GDAL/VSI environment method is useful for session-only access because it avoids depending on a stored profile.\n" + ] + }, + { + "cell_type": "markdown", + "id": "6f9209d9", + "metadata": {}, + "source": [ + "## 1. 
Retrieve temporary credentials\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "ef0fa8c3", + "metadata": {}, + "outputs": [], + "source": [ + "library(httr2)\n", + "\n", + "profile_name <- \"geotrees\"\n", + "aws_region <- \"us-west-2\"\n", + "\n", + "# Use the MAAP API host, not the MAAP Hub website.\n", + "maap_api_host <- \"https://api.maap-project.org\"" + ] + }, + { + "cell_type": "markdown", + "id": "71b61322", + "metadata": {}, + "source": [ + "The MAAP Hub website, such as `https://hub.maap-project.org`, is the JupyterHub/ADE user interface. The credential request should go to the MAAP API service.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "c3a60a8e", + "metadata": {}, + "outputs": [], + "source": [ + "trim_slash <- function(x) {\n", + " gsub(\"^/+|/+$\", \"\", x)\n", + "}\n", + "\n", + "join_url <- function(base, path) {\n", + " paste0(gsub(\"/+$\", \"\", base), \"/\", trim_slash(path))\n", + "}\n", + "\n", + "get_maap_config <- function(maap_api_host = \"https://api.maap-project.org\") {\n", + " config_url <- join_url(maap_api_host, \"api/environment/config\")\n", + "\n", + " resp <- request(config_url) |>\n", + " req_headers(Accept = \"application/json\") |>\n", + " req_perform()\n", + "\n", + " resp_body_json(resp, simplifyVector = FALSE)\n", + "}\n", + "\n", + "build_maap_endpoint <- function(config, maap_api_host, endpoint_key) {\n", + " api_root <- config$service$maap_api_root\n", + "\n", + " # If the config gives only a path like \"/api\", attach it to the API host.\n", + " # If it gives a full URL, use it as-is.\n", + " if (!grepl(\"^https?://\", api_root)) {\n", + " api_root <- join_url(maap_api_host, api_root)\n", + " }\n", + "\n", + " endpoint_path <- config$maap_endpoint[[endpoint_key]]\n", + "\n", + " if (is.null(endpoint_path)) {\n", + " stop(paste(\"Endpoint key not found in MAAP config:\", endpoint_key))\n", + " }\n", + "\n", + " join_url(api_root, endpoint_path)\n", + "}\n", + 
"\n", + "get_workspace_bucket_credentials <- function(\n", + " maap_api_host = \"https://api.maap-project.org\"\n", + ") {\n", + " config <- get_maap_config(maap_api_host)\n", + "\n", + " token <- config$service$maap_token\n", + "\n", + " if (is.null(token) || token == \"\") {\n", + " stop(\"Could not find maap_token from MAAP environment config.\")\n", + " }\n", + "\n", + " endpoint <- build_maap_endpoint(\n", + " config = config,\n", + " maap_api_host = maap_api_host,\n", + " endpoint_key = \"workspace_bucket_credentials\"\n", + " )\n", + "\n", + " headers <- list(\n", + " Accept = \"application/json\",\n", + " token = token\n", + " )\n", + "\n", + " # Include proxy ticket if the environment provides one.\n", + " maap_pgt <- Sys.getenv(\"MAAP_PGT\")\n", + " if (maap_pgt != \"\") {\n", + " headers[[\"proxy-ticket\"]] <- maap_pgt\n", + " }\n", + "\n", + " resp <- request(endpoint) |>\n", + " req_headers(!!!headers) |>\n", + " req_perform()\n", + "\n", + " resp_body_json(resp, simplifyVector = FALSE)\n", + "}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "8058b13a", + "metadata": {}, + "outputs": [], + "source": [ + "resp <- get_workspace_bucket_credentials(maap_api_host)\n", + "\n", + "str(resp, max.level = 2)" + ] + }, + { + "cell_type": "markdown", + "id": "e4aaff5c", + "metadata": {}, + "source": [ + "The response contains:\n", + "\n", + "- `credentials` \u2014 temporary AWS credentials:\n", + " - `aws_access_key_id`\n", + " - `aws_secret_access_key`\n", + " - `aws_session_token`\n", + " - `expires_at`\n", + "- `authorized_s3_paths` \u2014 S3 paths that the credentials can access:\n", + " - `bucket`\n", + " - `prefix`\n", + " - `uri`\n", + " - `type`\n", + " - `access`\n" + ] + }, + { + "cell_type": "markdown", + "id": "858ac406", + "metadata": {}, + "source": [ + "## 2. 
Create or update an AWS profile from the credentials\n", + "\n", + "The Python workflow creates a `boto3.Session` from the returned credentials.\n", + "\n", + "In R, one practical equivalent for GDAL-backed packages is to write the temporary credentials to the standard AWS credentials file:\n", + "\n", + "```text\n", + "~/.aws/credentials\n", + "```\n", + "\n", + "The credentials are written under the profile name `geotrees`.\n", + "\n", + "If the file already exists, this notebook **does not overwrite the whole file**. It updates only the `[geotrees]` profile block. Other AWS profiles already present in `~/.aws/credentials` are preserved.\n", + "\n", + "If `[geotrees]` already exists, its access key, secret key, and session token are replaced with the latest temporary credentials. This is important because MAAP returns temporary credentials that expire and need to be refreshed.\n", + "\n", + "The directory and file permissions are also set:\n", + "\n", + "- `~/.aws/` directory: `0700`\n", + "- `~/.aws/credentials` file: `0600`\n" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "67661bbb", + "metadata": {}, + "outputs": [], + "source": [ + "get_value <- function(x, names) {\n", + " for (name in names) {\n", + " if (!is.null(x[[name]])) {\n", + " return(x[[name]])\n", + " }\n", + " }\n", + "\n", + " NULL\n", + "}\n", + "\n", + "replace_or_append_profile <- function(existing_lines, profile_name, profile_lines) {\n", + " profile_header <- paste0(\"[\", profile_name, \"]\")\n", + " header_pattern <- \"^\\\\s*\\\\[[^]]+\\\\]\\\\s*$\"\n", + "\n", + " if (length(existing_lines) == 0) {\n", + " return(profile_lines)\n", + " }\n", + "\n", + " header_locations <- grep(header_pattern, existing_lines)\n", + " target_start <- which(trimws(existing_lines) == profile_header)\n", + "\n", + " if (length(target_start) == 0) {\n", + " # Profile does not exist yet. 
Preserve the existing file and append the new profile.\n", + " return(c(existing_lines, \"\", profile_lines))\n", + " }\n", + "\n", + " target_start <- target_start[1]\n", + " later_headers <- header_locations[header_locations > target_start]\n", + " target_end <- if (length(later_headers) > 0) later_headers[1] - 1 else length(existing_lines)\n", + "\n", + " before <- if (target_start > 1) existing_lines[1:(target_start - 1)] else character()\n", + " after <- if (target_end < length(existing_lines)) existing_lines[(target_end + 1):length(existing_lines)] else character()\n", + "\n", + " c(before, profile_lines, after)\n", + "}\n", + "\n", + "extract_aws_credentials <- function(resp) {\n", + " # MAAP may return credentials either at the top level or nested under \"credentials\".\n", + " creds <- resp$credentials\n", + " if (is.null(creds)) {\n", + " creds <- resp\n", + " }\n", + "\n", + " access_key <- get_value(\n", + " creds,\n", + " c(\"aws_access_key_id\", \"accessKeyId\", \"AccessKeyId\")\n", + " )\n", + "\n", + " secret_key <- get_value(\n", + " creds,\n", + " c(\"aws_secret_access_key\", \"secretAccessKey\", \"SecretAccessKey\")\n", + " )\n", + "\n", + " session_token <- get_value(\n", + " creds,\n", + " c(\"aws_session_token\", \"sessionToken\", \"SessionToken\")\n", + " )\n", + "\n", + " if (is.null(access_key) || is.null(secret_key) || is.null(session_token)) {\n", + " message(\"Response fields (names only, values may be secret): \", paste(names(resp), collapse = \", \"))\n", + " stop(\"Could not find AWS credential fields in MAAP response.\")\n", + " }\n", + "\n", + " list(\n", + " access_key = access_key,\n", + " secret_key = secret_key,\n", + " session_token = session_token,\n", + " expires_at = get_value(creds, c(\"expires_at\", \"Expiration\", \"expiration\"))\n", + " )\n", + "}\n", + "\n", + "write_aws_credentials_profile <- function(resp, profile_name = \"geotrees\") {\n", + " creds <- extract_aws_credentials(resp)\n", + "\n", + " aws_dir <- path.expand(\"~/.aws\")\n", + " credentials_file <- file.path(aws_dir, \"credentials\")\n", + "\n",
+ " if (!dir.exists(aws_dir)) {\n", + " dir.create(aws_dir, recursive = TRUE, mode = \"0700\")\n", + " }\n", + "\n", + " profile_lines <- c(\n", + " paste0(\"[\", profile_name, \"]\"),\n", + " paste0(\"aws_access_key_id = \", creds$access_key),\n", + " paste0(\"aws_secret_access_key = \", creds$secret_key),\n", + " paste0(\"aws_session_token = \", creds$session_token)\n", + " )\n", + "\n", + " existing_lines <- if (file.exists(credentials_file)) {\n", + " readLines(credentials_file, warn = FALSE)\n", + " } else {\n", + " character()\n", + " }\n", + "\n", + " updated_lines <- replace_or_append_profile(\n", + " existing_lines = existing_lines,\n", + " profile_name = profile_name,\n", + " profile_lines = profile_lines\n", + " )\n", + "\n", + " writeLines(updated_lines, credentials_file)\n", + " Sys.chmod(aws_dir, mode = \"0700\")\n", + " Sys.chmod(credentials_file, mode = \"0600\")\n", + "\n", + " invisible(credentials_file)\n", + "}\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "43b66a3d", + "metadata": {}, + "outputs": [], + "source": [ + "credentials_file <- write_aws_credentials_profile(\n", + " resp = resp,\n", + " profile_name = profile_name\n", + ")\n", + "\n", + "Sys.setenv(\n", + " AWS_PROFILE = profile_name,\n", + " AWS_DEFAULT_REGION = aws_region,\n", + " AWS_SDK_LOAD_CONFIG = \"1\",\n", + " AWS_NO_SIGN_REQUEST = \"NO\"\n", + ")\n", + "\n", + "cat(\"AWS credentials profile created or updated successfully.\\n\")\n", + "cat(\"Profile:\", profile_name, \"\\n\")\n", + "cat(\"Credentials file:\", credentials_file, \"\\n\")\n", + "\n", + "creds <- extract_aws_credentials(resp)\n", + "if (!is.null(creds$expires_at)) {\n", + " cat(\"Expires at:\", creds$expires_at, \"\\n\")\n", + "}\n" + ] + }, + { + "cell_type": "markdown", + "id": "eaea9fd8", + "metadata": {}, + "source": [ + "Confirm that the profile is available in the R session.\n", + "\n", + "Do not print the full credentials file because it contains temporary secrets. 
The check below only verifies that the selected profile header exists.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6fa1e455", + "metadata": {}, + "outputs": [], + "source": [ + "Sys.getenv(\"AWS_PROFILE\")\n", + "Sys.getenv(\"AWS_DEFAULT_REGION\")\n", + "file.exists(path.expand(\"~/.aws/credentials\"))\n", + "any(readLines(path.expand(\"~/.aws/credentials\"), warn = FALSE) == paste0(\"[\", profile_name, \"]\"))\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 3. Optional: inject temporary credentials into the current GDAL/VSI session\n", + "\n", + "Another possible approach is to pass the temporary credentials directly into the current R session before opening `/vsis3/` paths.\n", + "\n", + "This may work well for GDAL-backed packages because they can read S3 authentication information from environment variables or GDAL configuration values during the current session.\n", + "\n", + "This method is useful when you do not want to depend on a profile file. 
It should still be tested in the target MAAP Hub image with the packages you plan to use.\n" + ] + }, + { + "cell_type": "code", + "metadata": {}, + "execution_count": null, + "outputs": [], + "source": [ + "set_gdal_aws_env <- function(resp, aws_region = \"us-west-2\") {\n", + " creds <- extract_aws_credentials(resp)\n", + "\n", + " Sys.setenv(\n", + " AWS_ACCESS_KEY_ID = creds$access_key,\n", + " AWS_SECRET_ACCESS_KEY = creds$secret_key,\n", + " AWS_SESSION_TOKEN = creds$session_token,\n", + " AWS_DEFAULT_REGION = aws_region,\n", + " AWS_REGION = aws_region,\n", + " AWS_NO_SIGN_REQUEST = \"NO\"\n", + " )\n", + "\n", + " invisible(TRUE)\n", + "}\n", + "\n", + "set_gdal_aws_env(resp, aws_region = aws_region)\n", + "\n", + "cat(\"Temporary AWS credentials were added to the current R session environment.\\n\")\n", + "cat(\"This can be tested with GDAL/VSI reads from terra, lasR, stars, and sf.\\n\")\n" + ] + }, + { + "cell_type": "markdown", + "id": "2f2d67d8", + "metadata": {}, + "source": [ + "## 4. Working with your workspace bucket\n", + "\n", + "The Python example uses the first entry in `authorized_s3_paths` as the workspace bucket. We do the same here.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f469c5ce", + "metadata": {}, + "outputs": [], + "source": [ + "workspace <- resp$authorized_s3_paths[[1]]\n", + "\n", + "workspace_bucket <- workspace$bucket\n", + "workspace_prefix <- workspace$prefix\n", + "workspace_uri <- workspace$uri\n", + "workspace_access <- workspace$access\n", + "\n", + "cat(\"Workspace bucket:\", workspace_bucket, \"\\n\")\n", + "cat(\"Workspace prefix:\", workspace_prefix, \"\\n\")\n", + "cat(\"Workspace URI:\", workspace_uri, \"\\n\")\n", + "cat(\"Access:\", workspace_access, \"\\n\")" + ] + }, + { + "cell_type": "markdown", + "id": "7fd4fa89", + "metadata": {}, + "source": [ + "For GDAL-backed R packages such as `terra` and `lasR`, convert the S3 URI into GDAL's `/vsis3/` format. 
This same `/vsis3/` path may also work with `stars` and `sf`, but those package-specific reads should be confirmed in the MAAP Hub image being documented.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ab2d2a68", + "metadata": {}, + "outputs": [], + "source": [ + "workspace_vsis3 <- sub(\"^s3://\", \"/vsis3/\", workspace_uri)\n", + "\n", + "workspace_vsis3" + ] + }, + { + "cell_type": "markdown", + "id": "4de42f12", + "metadata": {}, + "source": [ + "### Read a raster from the workspace bucket\n", + "\n", + "Use a real object that exists in your authorized workspace path.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "88518541", + "metadata": {}, + "outputs": [], + "source": [ + "library(terra)\n", + "\n", + "# Replace example.tif with a real file inside your authorized workspace path.\n", + "s3_path <- file.path(workspace_uri, \"example.tif\")\n", + "vsis3_path <- sub(\"^s3://\", \"/vsis3/\", s3_path)\n", + "\n", + "# Uncomment after replacing example.tif with a real object.\n", + "# r <- rast(vsis3_path)\n", + "# r" + ] + }, + { + "cell_type": "markdown", + "id": "8fa67099", + "metadata": {}, + "source": [ + "### Upload or write a file to the workspace bucket\n", + "\n", + "Writing only works when the path has `access = \"read_write\"`.\n", + "\n", + "The example below creates a small raster, writes it to the workspace bucket, and reads it back. 
This is the R/GDAL equivalent of using `s3.upload_file()` in the Python example.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "97385d65", + "metadata": {}, + "outputs": [], + "source": [ + "library(terra)\n", + "\n", + "if (workspace_access == \"read_write\") {\n", + " r <- rast(\n", + " nrows = 10,\n", + " ncols = 10,\n", + " xmin = 0,\n", + " xmax = 10,\n", + " ymin = 0,\n", + " ymax = 10\n", + " )\n", + "\n", + " values(r) <- 1:ncell(r)\n", + "\n", + " output_path <- file.path(workspace_vsis3, \"r_credential_test.tif\")\n", + "\n", + " writeRaster(r, output_path, overwrite = TRUE)\n", + "\n", + " r_check <- rast(output_path)\n", + " r_check\n", + "} else {\n", + " cat(\"Workspace path is not writable. Access level:\", workspace_access, \"\\n\")\n", + "}" + ] + }, + { + "cell_type": "markdown", + "id": "f3d2f8f1", + "metadata": {}, + "source": [ + "## 5. Working with organization shared buckets\n", + "\n", + "Additional organization-granted buckets appear as extra entries in `authorized_s3_paths`. 
Each entry tells you whether it is `read_write` or `read_only`.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "59e540a2", + "metadata": {}, + "outputs": [], + "source": [ + "for (path in resp$authorized_s3_paths) {\n", + " cat(path$uri, \"(\", path$access, \")\", \"\\n\")\n", + "}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "44c2caf8", + "metadata": {}, + "outputs": [], + "source": [ + "org_paths <- Filter(\n", + " function(path) {\n", + " !is.null(path$type) && path$type == \"org\"\n", + " },\n", + " resp$authorized_s3_paths\n", + ")\n", + "\n", + "length(org_paths)" + ] + }, + { + "cell_type": "markdown", + "id": "ea879cf0", + "metadata": {}, + "source": [ + "Select the first organization path, if one is available.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "612db61d", + "metadata": {}, + "outputs": [], + "source": [ + "if (length(org_paths) > 0) {\n", + " shared <- org_paths[[1]]\n", + "\n", + " shared_bucket <- shared$bucket\n", + " shared_prefix <- shared$prefix\n", + " shared_uri <- shared$uri\n", + " shared_access <- shared$access\n", + " shared_vsis3 <- sub(\"^s3://\", \"/vsis3/\", shared_uri)\n", + "\n", + " cat(\"Shared bucket:\", shared_bucket, \"\\n\")\n", + " cat(\"Shared prefix:\", shared_prefix, \"\\n\")\n", + " cat(\"Shared URI:\", shared_uri, \"\\n\")\n", + " cat(\"Shared /vsis3/ path:\", shared_vsis3, \"\\n\")\n", + " cat(\"Access:\", shared_access, \"\\n\")\n", + "} else {\n", + " cat(\"No organization shared buckets were returned.\\n\")\n", + "}" + ] + }, + { + "cell_type": "markdown", + "id": "28cc1785", + "metadata": {}, + "source": [ + "To read from a shared bucket, use a real object path inside the shared prefix. 
To write to a shared bucket, first check that `access` is `read_write`.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "id": "7b4c82d1", + "metadata": {}, + "outputs": [], + "source": [ + "if (length(org_paths) > 0) {\n", + " shared_file <- file.path(shared_vsis3, \"shared_dataset.tif\")\n", + "\n", + " # Uncomment after replacing shared_dataset.tif with a real object.\n", + " # shared_raster <- rast(shared_file)\n", + " # shared_raster\n", + "\n", + " if (shared_access == \"read_write\") {\n", + " # Example write path. Uncomment when ready to write.\n", + " # shared_output <- file.path(shared_vsis3, \"my_output.tif\")\n", + " # writeRaster(r, shared_output, overwrite = TRUE)\n", + " }\n", + "}" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "R", + "language": "R", + "name": "ir" + }, + "language_info": { + "codemirror_mode": "r", + "file_extension": ".r", + "mimetype": "text/x-r-source", + "name": "R", + "pygments_lexer": "r", + "version": "4.5.1" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +}