| arrow::write_parquet(reporting_rate_dataelement, file_path) | ||
| log_msg(glue::glue("Exported : {file_path}")) | ||
| file_path <- file.path(output_data_path, paste0(COUNTRY_CODE, "_reporting_rate_dataelement.csv")) |
Let's try to keep this code generic. If we want files saved from within a function, then let's:
- Pass the full path where the file should be saved, instead of a hardcoded file.path(DATA_PATH, "reporting_rate").
- Pass the name of the file to be saved, e.g. function(data, output_dir, base_name), where base_name is something like paste0(COUNTRY_CODE, "_reporting_rate_dataelement").
However, this operation is quite straightforward, so I would even just leave it in the notebook:
write.csv(reporting_rate_dataelement, file.path(setup$DATA_PATH, "reporting_something", paste0(COUNTRY_CODE, "_reporting_rate_dataelement.csv")), row.names = FALSE)
write_parquet(reporting_rate_dataelement, file.path(setup$DATA_PATH, "reporting_something", paste0(COUNTRY_CODE, "_reporting_rate_dataelement.parquet")))
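If a function is kept after all, the generic version described above could look roughly like this. It is only a sketch: the name `save_dataset_outputs()` is hypothetical, the signature follows the `function(data, output_dir, base_name)` suggestion, and the Parquet file is written only when the `arrow` package is installed.

```r
# Hypothetical helper: the name and behaviour are illustrative,
# not an existing function in snt_utils.r.
save_dataset_outputs <- function(data, output_dir, base_name) {
  # Create the target directory if it does not exist yet.
  dir.create(output_dir, recursive = TRUE, showWarnings = FALSE)

  # Always write the CSV with base R.
  csv_path <- file.path(output_dir, paste0(base_name, ".csv"))
  write.csv(data, csv_path, row.names = FALSE)
  written <- c(csv = csv_path)

  # Parquet output depends on the arrow package; skip it if unavailable.
  if (requireNamespace("arrow", quietly = TRUE)) {
    parquet_path <- file.path(output_dir, paste0(base_name, ".parquet"))
    arrow::write_parquet(data, parquet_path)
    written <- c(written, parquet = parquet_path)
  }

  # Return the written paths invisibly so the caller can log them.
  invisible(written)
}

# Usage with stand-in values (the real ones come from the notebook setup).
COUNTRY_CODE <- "XXX"
example_tbl <- data.frame(
  period = c("202401", "202402"),
  reporting_rate = c(0.91, 0.87)
)
paths <- save_dataset_outputs(
  example_tbl,
  output_dir = file.path(tempdir(), "reporting_rate"),
  base_name = paste0(COUNTRY_CODE, "_reporting_rate_dataelement")
)
```

The caller decides both the directory and the file name, so the same helper would work for the dataset and dataelement pipelines without any hardcoded path.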
| }, | ||
| "outputs": [], | ||
| "source": [ | ||
| "# Load SNT metadata\n", |
Perhaps we could have something similar to what we do when we load the SNT config, something like a load_snt_metadata()?
Or would it be a good idea to generalize this into a function that just loads JSON files?
config_json <- load_snt_config(file.path(CONFIG_PATH, "SNT_config.json"))
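A generic loader along those lines might look like the sketch below. The name `load_json_file()` is hypothetical, and it assumes the `jsonlite` package is available; `load_snt_config()` and a possible `load_snt_metadata()` could then become thin wrappers around it.

```r
# Hypothetical generic loader; not an existing snt_utils.r function.
load_json_file <- function(json_path) {
  # Fail early with a clear message if the file is missing.
  if (!file.exists(json_path)) {
    stop("JSON file not found: ", json_path)
  }
  # Re-raise parse errors with the offending path in the message.
  tryCatch(
    jsonlite::fromJSON(json_path),
    error = function(e) {
      stop("Could not parse ", json_path, ": ", conditionMessage(e))
    }
  )
}

# Usage with a temporary file standing in for a real JSON file.
tmp_json <- tempfile(fileext = ".json")
writeLines('{"EXAMPLE_KEY": "example value"}', tmp_json)
metadata_json <- load_json_file(tmp_json)
```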
| "CODE_PATH <- file.path(SNT_ROOT_PATH, 'code') # this is where we store snt_utils.r\n", | ||
| "CONFIG_PATH <- file.path(SNT_ROOT_PATH, 'configuration') # .json config file\n", | ||
| "DATA_PATH <- file.path(SNT_ROOT_PATH, 'data', 'dhis2') \n", | ||
| "SNT_ROOT_PATH <- \"~/workspace\"\n", |
I think these paths are set in the get_setup_variables() function.
You can try to replicate what you did in snt_dhis2_reporting_rate_dataelement.ipynb.
Not urgent, though; this can be done in the future.
| }, | ||
| "outputs": [], | ||
| "source": [ | ||
| "# Important: this will break if reporting rate was calculated as DataSet method because it will not find the file\n", |
Can we reuse the code?
| }, | ||
| "outputs": [], | ||
| "source": [ | ||
| "shapes <- tryCatch({ get_latest_dataset_file_in_memory(DHIS2_FORMATTED_DATASET_NAME, paste0(COUNTRY_CODE, \"_shapes.geojson\")) }, \n", |
We can reuse the functions here.
| "Sys.setenv(PROJ_LIB = \"/opt/conda/share/proj\")\n", | ||
| "Sys.setenv(GDAL_DATA = \"/opt/conda/share/gdal\")\n", | ||
| "Sys.setenv(RETICULATE_PYTHON = \"/opt/conda/bin/python\")\n", | ||
| "CODE_PATH <- file.path(SNT_ROOT_PATH, \"code\")\n", |
(Related to the previous comment.) These paths are available in the snt_environment variable; better to use that.
| #' Write CSV + Parquet under `<DATA_PATH>/dhis2/reporting_rate/`. | ||
| write_reporting_rate_dataelement_outputs <- function(reporting_rate_tbl, snt_environment, country_code) { |
Related to the previous comments: I think functions of this type, which only save some files, add more complexity than they are worth. If needed, let's try to find a generic, one-size-fits-all solution. If that's not possible, let's just keep these things in the notebook (at least for now).
| }, | ||
| "source": [ | ||
| "cx <- parse_reporting_rate_dataset_snt_settings(config_json)\n", | ||
| "list2env(cx, envir = .GlobalEnv)\n" |
Where is this used?
I think this is the same in the dataelement RR pipeline. parse_reporting_rate_dataset_snt_settings() seems unnecessary, as we already have config_json to collect the variables from. Let's not hide that.
| }, | ||
| "source": [ | ||
| "dhis2_reporting <- load_dataset_file(\n", | ||
| " config_json$SNT_DATASET_IDENTIFIERS$DHIS2_DATASET_FORMATTED,\n", |
Better to save these parameters in variables:
formatting_dataset_id <- config_json$SNT_DATASET_IDENTIFIERS$DHIS2_DATASET_FORMATTED
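As a rough illustration, with a stand-in list in place of the parsed config (the identifier value below is made up), the lookup plus a fail-fast check could look like this:

```r
# Stand-in for the parsed SNT_config.json; the value is a placeholder.
config_json <- list(
  SNT_DATASET_IDENTIFIERS = list(
    DHIS2_DATASET_FORMATTED = "example-dataset-id"
  )
)

# Read the identifier once, close to where it is used.
formatting_dataset_id <- config_json$SNT_DATASET_IDENTIFIERS$DHIS2_DATASET_FORMATTED

# Fail early with a clear message if the key is missing from the config.
if (is.null(formatting_dataset_id)) {
  stop("DHIS2_DATASET_FORMATTED is missing from SNT_DATASET_IDENTIFIERS")
}
```

The named variable can then be passed to load_dataset_file() in place of the long config_json expression.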
| } | ||
| }, | ||
| "source": [ | ||
| "# NER-specific normalization quality check\n", |
Is nothing done in this step? Perhaps the code was moved to the country-specific code? If so, we should get rid of this step in the generic notebook.
| #' Write CSV + Parquet under `<DATA_PATH>/dhis2/reporting_rate/`. | ||
| write_reporting_rate_dataset_outputs <- function(reporting_rate_tbl, snt_environment, country_code) { |
If it is not used in the notebook, you can remove it.
Just a note to reflect on here, so I don't forget: my main concern with how functions are being used (not only in this PR, but across SNT) is that the right pieces of logic aren't always being properly grouped or encapsulated. It's not about just copying notebook code into functions, since that can actually make things harder to read. But at the same time, functions shouldn't be created just for the sake of having functions.
The reporting rate dataset/dataelement rework