Handle concurrently deleted local storage dataset#10480
Conversation
Surface if a dataset is gone, and instead of returning a 500, match on the new DatasetNotFound variant and return Ok() from `delete_local_storage_dataset`. Fixes oxidecomputer#10477
| }, | ||
| } | ||
|
|
||
| impl From<SetValueErrorRaw> for EnsureDatasetErrorRaw { |
There was a problem hiding this comment.
Part of me kind of wonders if we might want to implement DatasetNotFound as a generic wrapper around any other error that's either the inner error or DatasetNotFound, and if that might reduce some boilerplate here? Something like:
#[derive(thiserror::Error)]
enum DatasetError<T> {
#[error("Dataset not found")]
NotFound,
#[error(transparent)]
Other(#[from] T)
}although, it's equally possible that this is actually more annoying than just doing it the current way
| #[error("Dataset not found")] | ||
| DatasetNotFound, | ||
|
|
||
| #[error("ZFS execution error")] | ||
| Execution(#[from] crate::ExecutionError), | ||
|
|
||
| #[error("Unexpected output from ZFS commands: {0}")] | ||
| Output(String), | ||
|
|
||
| #[error("Dataset does not exist")] | ||
| DoesNotExist, |
There was a problem hiding this comment.
what's the difference between DatasetNotFound and the DoesNotExist error which was already here? should that one be removed and should we be erturning DatasetNotFound instead?
if they have different semantics, it would be nice to have a comment explaining that (and maybe naming that makes the distinction clearer?), because this sure looks confusing.
| crate::ExecutionError::CommandFailure(info) | ||
| if info.stderr.contains("does not exist") => | ||
| { |
There was a problem hiding this comment.
i kinda wonder if we ought to have an execute_async wrapper for zfs commands, or something, that does this detection and returns either an ExecutionError or a DatasetNotFound, rather than forcing every caller to detect it individually? the caller-side detection seems sketchier to me, because now you have to remember to check for this contents in info.stderr every time you call a zfs command. meanwhile, having the function that runs the command do it gives the caller an enum which they are forced to handle.
take it or leave it, just a thought.
| values: name_values | ||
| .iter() | ||
| .map(|(k, v)| format!("{k}={v}")) | ||
| .join(","), |
There was a problem hiding this comment.
we are now doing this twice, btw --- you have a
let values =
name_values.iter().map(|(k, v)| format!("{k}={v}")).join(",");at the top of the map_err closure...
| values: name_values | |
| .iter() | |
| .map(|(k, v)| format!("{k}={v}")) | |
| .join(","), | |
| values, |
| RemoveReservationErrorInner::DatasetNotFound => { | ||
| // If the parent dataset is no longer found, then a | ||
| // concurrent deletion occurred. Return Ok | ||
| return Ok(()); | ||
| } |
There was a problem hiding this comment.
hm, is this really the only place we might encounter a dataset that was already removed by a concurrent deletion? should we also handle this error at the point where we actually delete the child volume dataset, too?
Surface if a dataset is gone, and instead of returning a 500, match on the new DatasetNotFound variant and return Ok() from
delete_local_storage_dataset.Fixes #10477