feat: per-rule queues, per-job tags, EC2/Fargate-aware validation#40
feat: per-rule queues, per-job tags, EC2/Fargate-aware validation#40nh13 wants to merge 8 commits into
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds two passthrough AWS Batch client methods; extends BatchJobBuilder to derive job queue, detect EC2 vs FARGATE via AWS queries, perform platform-specific resource validation, set platformCapabilities on job definitions, and merge job tags from settings plus an env var; tests added for tag and platform behaviors. Changes
Sequence Diagram(s)sequenceDiagram
participant Env as Env/Settings
participant Builder as BatchJobBuilder
participant Client as BatchClient
participant AWS as AWS Batch
Env->>Builder: provide settings, job, env-var tags
Builder->>Client: describe_job_queues(...)
Client->>AWS: DescribeJobQueues API
AWS-->>Client: queue description
Client-->>Builder: queue info
alt computeEnvironment referenced
Builder->>Client: describe_compute_environments(...)
Client->>AWS: DescribeComputeEnvironments API
AWS-->>Client: compute env description
Client-->>Builder: compute env info
end
Builder->>Builder: determine platform (EC2 / FARGATE) and validate resources
Builder->>AWS: register_job_definition(platformCapabilities=[Builder.platform], ...)
AWS-->>Builder: job definition ARN
Builder->>Builder: _build_job_tags() merge settings + env tags
Builder->>Client: submit_job(..., jobQueue=Builder.job_queue, tags=merged?)
Client->>AWS: SubmitJob API (with tags if present)
AWS-->>Client: submit response
Client-->>Builder: submit response
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Review rate limit: 0/1 reviews remaining, refill in 60 minutes.Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
snakemake_executor_plugin_aws_batch/batch_job_builder.py (1)
179-205:⚠️ Potential issue | 🔴 CriticalFargate job definitions cannot use EC2-only container settings.
Line 215 sets
platformCapabilities=[self.platform], which can be FARGATE when detected from the job queue. However, the container_properties dict (lines 179–205) violates AWS Fargate requirements:
executionRoleArnis required but never suppliedprivileged=Trueis not supported; Fargate requires it omitted orfalse- GPU resourceRequirements (lines 198–204) are unsupported on Fargate
The AWS Batch API will reject Fargate job registrations with these violations. Platform-specific container-property assembly is needed before Fargate can work.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@snakemake_executor_plugin_aws_batch/batch_job_builder.py` around lines 179 - 205, The container_properties assembly (container_properties dict in batch_job_builder.py used when creating job definitions) is not Fargate-safe: when platformCapabilities is set to self.platform (FARGATE) you must include executionRoleArn, remove or set privileged to False, and omit GPU resourceRequirements; update the logic that builds container_properties and the call site that sets platformCapabilities to make platform-specific adjustments (detect self.platform == "FARGATE" and then add executionRoleArn from settings, delete or set privileged to False, and skip adding BATCH_JOB_RESOURCE_REQUIREMENT_TYPE.GPU entries) so Fargate job registrations meet AWS requirements.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@snakemake_executor_plugin_aws_batch/batch_job_builder.py`:
- Around line 123-133: The current logic picks min(valid_mems) which can
under-provision memory; instead, in the block that computes valid_mems from
VALID_RESOURCES_MAPPING for the given vcpu, filter for candidates >= mem and
pick the smallest of those (e.g., candidate_mem = min(m for m in valid_mems if m
>= mem)); if no candidate exists raise a WorkflowError explaining the requested
mem is too large for that vCPU; update the logger.warning to report the chosen
candidate_mem (and original mem/vcpu) rather than min_mem. Ensure this change
touches the same symbols: VALID_RESOURCES_MAPPING, valid_mems, vcpu, mem,
WorkflowError, and the logger.warning call.
- Around line 105-109: The broad except block that logs and returns
BATCH_JOB_PLATFORM_CAPABILITIES.EC2.value must not silently downgrade on all
errors; instead, remove the unconditional return in the except (or change it to
re-raise) so errors propagate (fail fast) while retaining existing explicit
empty-response fallbacks elsewhere; update the except handling around the
queue/compute-environment lookup in the method (the block that uses
self.logger.warning and BATCH_JOB_PLATFORM_CAPABILITIES.EC2.value) to either
re-raise the caught exception after logging or catch and rethrow only
non-empty-response-safe errors (e.g., inspect/handle specific AWS client
errors), ensuring only the explicit empty-response branches keep the EC2
fallback.
---
Outside diff comments:
In `@snakemake_executor_plugin_aws_batch/batch_job_builder.py`:
- Around line 179-205: The container_properties assembly (container_properties
dict in batch_job_builder.py used when creating job definitions) is not
Fargate-safe: when platformCapabilities is set to self.platform (FARGATE) you
must include executionRoleArn, remove or set privileged to False, and omit GPU
resourceRequirements; update the logic that builds container_properties and the
call site that sets platformCapabilities to make platform-specific adjustments
(detect self.platform == "FARGATE" and then add executionRoleArn from settings,
delete or set privileged to False, and skip adding
BATCH_JOB_RESOURCE_REQUIREMENT_TYPE.GPU entries) so Fargate job registrations
meet AWS requirements.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: adee3872-6ca3-4367-b682-4abbb205004a
📒 Files selected for processing (3)
snakemake_executor_plugin_aws_batch/batch_client.pysnakemake_executor_plugin_aws_batch/batch_job_builder.pytests/test_batch_job_builder.py
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
snakemake_executor_plugin_aws_batch/batch_job_builder.py (1)
185-221:⚠️ Potential issue | 🔴 CriticalAdd platform-aware container properties for Fargate compatibility.
The job definition construction at line 185 unconditionally sets
privileged: True, omitsexecutionRoleArn, and allows GPU resources, but line 221 can sendplatformCapabilities=["FARGATE"]. This causes Fargate job definition registration to fail: AWS Batch Fargate requires an execution role, prohibits privileged mode, and does not support GPU. The code already validates Fargate vs. EC2 resource requirements but does not handle these container property differences.Conditionally set
executionRoleArnfor Fargate, omit or setprivilegedto false for Fargate (true only for EC2), and reject GPU requests on Fargate.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@snakemake_executor_plugin_aws_batch/batch_job_builder.py` around lines 185 - 221, The container_properties currently set privileged=True, omit executionRoleArn, and may include GPU requirements even when self.platform == "FARGATE"; update the job-definition assembly in batch_job_builder.py so that inside the block building container_properties you (1) only set privileged=True for non-FARGATE platforms and set privileged=False (or omit) when self.platform == self.platform value for FARGATE, (2) add executionRoleArn=self.settings.execution_role (or settings.executionRoleArn) to container_properties when self.platform == FARGATE, and (3) guard appending a GPU resource (BATCH_JOB_RESOURCE_REQUIREMENT_TYPE.GPU) — raise/return an error if gpu > 0 and self.platform == FARGATE to reject unsupported GPU on Fargate; ensure these changes are honored before calling self.batch_client.register_job_definition so register_job_definition receives the correct platform-aware containerProperties and tags.
♻️ Duplicate comments (2)
snakemake_executor_plugin_aws_batch/batch_job_builder.py (2)
129-140:⚠️ Potential issue | 🟠 MajorDon’t reduce Fargate memory below the requested amount.
Line 135 can turn
vcpu=1, mem=5000into2048, under-provisioning the job and causing avoidable OOM failures. Pick the smallest valid memory>= mem, or fail when the request exceeds the maximum for that vCPU.Proposed fix
valid_mems = [m for m, v in VALID_RESOURCES_MAPPING.items() if vcpu in v] if not valid_mems: raise WorkflowError( f"Invalid vCPU value {vcpu} for Fargate. " f"Check valid Fargate resource configurations." ) - min_mem = min(valid_mems) + candidate_mems = [m for m in valid_mems if m >= mem] + if not candidate_mems: + raise WorkflowError( + f"Memory value {mem} MB exceeds the maximum allowed for " + f"vCPU {vcpu} on Fargate." + ) + min_mem = min(candidate_mems) self.logger.warning( f"Memory value {mem} MB is invalid for vCPU {vcpu} on Fargate. " - f"Setting memory to minimum allowed value {min_mem} MB." + f"Setting memory to next allowed value {min_mem} MB." )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@snakemake_executor_plugin_aws_batch/batch_job_builder.py` around lines 129 - 140, The current logic in batch_job_builder.py (inside the function that maps Fargate vcpu/mem) picks min(valid_mems) which can under-provision requested mem; change it to select the smallest entry in valid_mems that is >= mem (use vcpu, mem, VALID_RESOURCES_MAPPING, valid_mems) and return that; if no valid_mems >= mem exist, raise WorkflowError indicating the requested memory exceeds the maximum allowed for that vCPU rather than silently lowering it; keep the existing logging (self.logger.warning) but update the message to reflect that memory was bumped up to the selected valid value, and ensure returned values remain str(vcpu), str(selected_mem).
111-115:⚠️ Potential issue | 🟠 MajorDon’t silently downgrade platform lookup failures to EC2.
Line 111 still catches every exception from the AWS queue/environment lookup and turns it into an EC2 fallback. That can hide IAM/configuration/transient AWS failures and register the wrong job definition; keep the explicit empty-response fallbacks, but re-raise real lookup errors.
Proposed fix
- except Exception as e: - self.logger.warning( - f"Failed to determine platform from queue: {e}. Defaulting to EC2." - ) - return BATCH_JOB_PLATFORM_CAPABILITIES.EC2.value + except Exception as e: + raise WorkflowError( + f"Failed to determine platform from queue {self.job_queue}: {e}" + ) from e🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@snakemake_executor_plugin_aws_batch/batch_job_builder.py` around lines 111 - 115, The current broad except in the platform lookup swallows all errors and returns BATCH_JOB_PLATFORM_CAPABILITIES.EC2.value; instead, change the error handling in the platform-lookup routine so you only fall back to EC2 when the lookup explicitly returns an empty/no-result value, and re-raise real AWS/client errors. Concretely: remove the generic "except Exception" around the AWS queue/environment lookup, or narrow it to only catch the expected "no result" condition (e.g., check for None/empty response) and call self.logger.warning(...) then return EC2; for any other exceptions from the AWS client (botocore.exceptions.BotoCoreError/ClientError or unexpected errors), re-raise them so they surface to callers rather than being silenced.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@snakemake_executor_plugin_aws_batch/batch_job_builder.py`:
- Around line 237-247: The merged tags builder currently accepts empty keys
(e.g., "=value") and doesn't enforce AWS Batch limits; update the tag assembly
in the method that builds/returns tags (the block using
SNAKEMAKE_AWS_BATCH_JOB_TAGS_ENV_VAR and variable tags) to validate after
merging: trim key and value, skip or reject any entries where the key is empty
(key.strip() == ""), and enforce a maximum of 50 tags (len(tags) <= 50); if
validation fails, raise a clear ValueError describing the problem so the caller
(before submit_job) fails fast with a helpful message.
---
Outside diff comments:
In `@snakemake_executor_plugin_aws_batch/batch_job_builder.py`:
- Around line 185-221: The container_properties currently set privileged=True,
omit executionRoleArn, and may include GPU requirements even when self.platform
== "FARGATE"; update the job-definition assembly in batch_job_builder.py so that
inside the block building container_properties you (1) only set privileged=True
for non-FARGATE platforms and set privileged=False (or omit) when self.platform
== self.platform value for FARGATE, (2) add
executionRoleArn=self.settings.execution_role (or settings.executionRoleArn) to
container_properties when self.platform == FARGATE, and (3) guard appending a
GPU resource (BATCH_JOB_RESOURCE_REQUIREMENT_TYPE.GPU) — raise/return an error
if gpu > 0 and self.platform == FARGATE to reject unsupported GPU on Fargate;
ensure these changes are honored before calling
self.batch_client.register_job_definition so register_job_definition receives
the correct platform-aware containerProperties and tags.
---
Duplicate comments:
In `@snakemake_executor_plugin_aws_batch/batch_job_builder.py`:
- Around line 129-140: The current logic in batch_job_builder.py (inside the
function that maps Fargate vcpu/mem) picks min(valid_mems) which can
under-provision requested mem; change it to select the smallest entry in
valid_mems that is >= mem (use vcpu, mem, VALID_RESOURCES_MAPPING, valid_mems)
and return that; if no valid_mems >= mem exist, raise WorkflowError indicating
the requested memory exceeds the maximum allowed for that vCPU rather than
silently lowering it; keep the existing logging (self.logger.warning) but update
the message to reflect that memory was bumped up to the selected valid value,
and ensure returned values remain str(vcpu), str(selected_mem).
- Around line 111-115: The current broad except in the platform lookup swallows
all errors and returns BATCH_JOB_PLATFORM_CAPABILITIES.EC2.value; instead,
change the error handling in the platform-lookup routine so you only fall back
to EC2 when the lookup explicitly returns an empty/no-result value, and re-raise
real AWS/client errors. Concretely: remove the generic "except Exception" around
the AWS queue/environment lookup, or narrow it to only catch the expected "no
result" condition (e.g., check for None/empty response) and call
self.logger.warning(...) then return EC2; for any other exceptions from the AWS
client (botocore.exceptions.BotoCoreError/ClientError or unexpected errors),
re-raise them so they surface to callers rather than being silenced.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 2631aa04-9e9f-490a-8474-163c63b38e71
📒 Files selected for processing (1)
snakemake_executor_plugin_aws_batch/batch_job_builder.py
There was a problem hiding this comment.
♻️ Duplicate comments (3)
snakemake_executor_plugin_aws_batch/batch_job_builder.py (3)
248-258:⚠️ Potential issue | 🟡 MinorValidate tag keys and count before returning.
The current implementation accepts empty keys from malformed input like
=valueor,=value,. Additionally, AWS Batch enforces a 50-tag limit per job. Consider validating that keys are non-empty and the total count doesn't exceed 50 to fail fast with a clear error.Proposed fix
env_tags_str = os.environ.get(SNAKEMAKE_AWS_BATCH_JOB_TAGS_ENV_VAR, "") if env_tags_str: for pair in env_tags_str.split(","): pair = pair.strip() if "=" in pair: key, _, value = pair.partition("=") - tags[key.strip()] = value.strip() + key = key.strip() + if not key: + raise WorkflowError( + f"Invalid {SNAKEMAKE_AWS_BATCH_JOB_TAGS_ENV_VAR}: " + "tag key cannot be empty." + ) + tags[key] = value.strip() + + if len(tags) > 50: + raise WorkflowError( + f"AWS Batch jobs support at most 50 tags, got {len(tags)}." + ) return tags🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@snakemake_executor_plugin_aws_batch/batch_job_builder.py` around lines 248 - 258, Validate and reject empty tag keys and enforce AWS Batch's 50-tag limit before returning tags: after copying self.settings.tags into tags and while parsing env_tags_str (using SNAKEMAKE_AWS_BATCH_JOB_TAGS_ENV_VAR), strip the key and if key is empty raise a clear ValueError (or similar) indicating malformed tag like "=value"; also validate existing keys in self.settings.tags (strip and error on any empty key). After merging all tags, if len(tags) > 50 raise a clear ValueError indicating the tag count exceeds AWS Batch's 50-tag limit; return tags only if all validations pass.
111-115:⚠️ Potential issue | 🟠 MajorBroad exception handling masks real configuration errors.
The catch-all
Exceptionhandler defaults to EC2 for any failure, including permission errors (AccessDeniedException), validation errors, and transient AWS failures. This can turn real configuration issues into silent fallbacks, pushing the actual error to a later register/submit call where it's harder to diagnose.Consider catching specific boto3/botocore exceptions (e.g.,
ClientError) and re-raising unexpected errors to fail fast, while keeping the EC2 fallback only for the explicit empty-response branches.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@snakemake_executor_plugin_aws_batch/batch_job_builder.py` around lines 111 - 115, Replace the broad except Exception handler around the logic that returns BATCH_JOB_PLATFORM_CAPABILITIES.EC2.value with targeted exception handling: import and catch botocore.exceptions.ClientError (and any specific AWS exceptions you expect, e.g., AccessDeniedException) and handle only the cases where the AWS response is empty or indicates "no platform" by returning the EC2 fallback; for other ClientError codes (and for unexpected exceptions) re-raise the exception so the caller fails fast instead of silently defaulting, and keep the EC2 fallback limited to the explicit empty-response branch in the method that determines platform from the queue.
129-140:⚠️ Potential issue | 🟠 MajorMemory fallback may under-provision the container.
When the requested memory isn't in
VALID_RESOURCES_MAPPING, line 135 picksmin(valid_mems), which could be significantly lower than requested. For example,vcpu=1, mem=5000gets rewritten to2048, potentially causing OOM failures.Consider picking the smallest valid memory that is
>= mem, or raising an error when no valid memory can satisfy the request.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@snakemake_executor_plugin_aws_batch/batch_job_builder.py` around lines 129 - 140, The current fallback chooses min(valid_mems) which may under-provision (e.g., vcpu=1, mem=5000 -> 2048); update the logic in the block that computes valid_mems (using VALID_RESOURCES_MAPPING) to select the smallest valid memory >= requested mem (e.g., candidate = min(m for m in valid_mems if m >= mem)) and return that, and if no such candidate exists raise a WorkflowError explaining that no valid Fargate memory satisfies the requested mem for the given vcpu; keep the return format str(vcpu), str(selected_mem).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@snakemake_executor_plugin_aws_batch/batch_job_builder.py`:
- Around line 248-258: Validate and reject empty tag keys and enforce AWS
Batch's 50-tag limit before returning tags: after copying self.settings.tags
into tags and while parsing env_tags_str (using
SNAKEMAKE_AWS_BATCH_JOB_TAGS_ENV_VAR), strip the key and if key is empty raise a
clear ValueError (or similar) indicating malformed tag like "=value"; also
validate existing keys in self.settings.tags (strip and error on any empty key).
After merging all tags, if len(tags) > 50 raise a clear ValueError indicating
the tag count exceeds AWS Batch's 50-tag limit; return tags only if all
validations pass.
- Around line 111-115: Replace the broad except Exception handler around the
logic that returns BATCH_JOB_PLATFORM_CAPABILITIES.EC2.value with targeted
exception handling: import and catch botocore.exceptions.ClientError (and any
specific AWS exceptions you expect, e.g., AccessDeniedException) and handle only
the cases where the AWS response is empty or indicates "no platform" by
returning the EC2 fallback; for other ClientError codes (and for unexpected
exceptions) re-raise the exception so the caller fails fast instead of silently
defaulting, and keep the EC2 fallback limited to the explicit empty-response
branch in the method that determines platform from the queue.
- Around line 129-140: The current fallback chooses min(valid_mems) which may
under-provision (e.g., vcpu=1, mem=5000 -> 2048); update the logic in the block
that computes valid_mems (using VALID_RESOURCES_MAPPING) to select the smallest
valid memory >= requested mem (e.g., candidate = min(m for m in valid_mems if m
>= mem)) and return that, and if no such candidate exists raise a WorkflowError
explaining that no valid Fargate memory satisfies the requested mem for the
given vcpu; keep the return format str(vcpu), str(selected_mem).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 528b17b0-82e7-4cf6-816a-3a455ee9968b
⛔ Files ignored due to path filters (1)
pyproject.tomlis excluded by!pyproject.toml
📒 Files selected for processing (1)
snakemake_executor_plugin_aws_batch/batch_job_builder.py
As described in issue snakemake#34, validation assumed a Fargate compute environment. We can detect the compute environment by looking up the queue, and only run the Fargate specific validation when necessary.
Use self.job.threads for vCPU instead of the undocumented _cores resource. Falls back to _cores for backward compatibility if threads is not set.
Avoid ValueError from min() on empty list when vCPU is not valid for any Fargate memory configuration.
Add tags to the submit_job AWS API call (previously only applied to register_job_definition). Also support reading dynamic tags from the SNAKEMAKE_AWS_BATCH_JOB_TAGS environment variable as comma-separated KEY=VALUE pairs, merged with settings.tags (env var overrides on conflict). This enables per-run cost tracking when a coordinator job sets the env var for child jobs to inherit.
Multi-arch workflows need to route each job to a queue wired to the matching compute environment (e.g. ARM vs x86). Prior to this change all jobs went to the single queue specified by --aws-batch-job-queue regardless of the rule's resources. Read resources.batch_queue on the job first, falling back to the profile-wide default. Platform detection and submission both use the resolved per-job queue.
snakemake core's auto-deploy injects 'pip install snakemake-storage-plugin-s3' into every worker's startup command with no version constraint. As of snakemake-storage-plugin-s3==0.3.3 the PyPI metadata requires snakemake-interface-storage-plugins >= 4.2.3, which needs snakemake >= 9. That breaks workers running snakemake 8.x. Set auto_deploy_default_storage_provider=False so workers rely on the storage plugin being pre-installed in the container image at a pinned, compatible version. Image maintainers are now responsible for pinning a compatible snakemake-storage-plugin-s3.
s3-plugin 0.3.x declares snakemake-interface-storage-plugins>=4.2.3, but the plugin keeps the pre-4.2.3 super() signature and so breaks with snakemake 8.x (which cannot pass the new logger argument). The executor plugin never calls into s3-plugin APIs directly — it only needs some storage provider to be installed. Widening the pin to ">=0.2,<0.4" lets consumers fall back to 0.2.x (which keeps the <4.0 interface and works with snakemake 8.x) until upstream ships a 0.3.x s3 build that matches the current base __init__ signature.
…ySize POSIX shm_open writes to /dev/shm, which on EC2/ECS containers defaults to 64 MB. Tools that stage large in-memory indexes (bwa-mem2 shm, etc.) need a larger /dev/shm. Rules opt in via resources.shared_memory_size_mb. Fargate is excluded — it does not honor linuxParameters.sharedMemorySize.
c736516 to
31b0253
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@snakemake_executor_plugin_aws_batch/batch_job_builder.py`:
- Around line 118-120: The error messages in batch_job_builder.py reference
self.settings.job_queue but the actual queue used for a job can be overridden
and stored in self.job_queue; update the two places (around the
platform-determination and the second error block) to use self.job_queue instead
of self.settings.job_queue so diagnostics show the resolved per-job queue and
not the default settings value.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 822c07d0-b3f7-4862-864a-9c0d2861a654
⛔ Files ignored due to path filters (1)
pyproject.tomlis excluded by!pyproject.toml
📒 Files selected for processing (4)
snakemake_executor_plugin_aws_batch/__init__.pysnakemake_executor_plugin_aws_batch/batch_client.pysnakemake_executor_plugin_aws_batch/batch_job_builder.pytests/test_batch_job_builder.py
🚧 Files skipped from review as they are similar to previous changes (2)
- snakemake_executor_plugin_aws_batch/batch_client.py
- tests/test_batch_job_builder.py
| f"Failed to determine platform from queue " | ||
| f"{self.settings.job_queue}: {e}" | ||
| ) from e |
There was a problem hiding this comment.
Use the resolved per-job queue in error messages.
Line 119 and Line 190 use self.settings.job_queue, which can misreport the queue when resources.batch_queue overrides it. Use self.job_queue for accurate diagnostics.
💡 Suggested patch
- f"{self.settings.job_queue}: {e}"
+ f"{self.job_queue}: {e}"
@@
- f"(queue {self.settings.job_queue} resolves to FARGATE). "
+ f"(queue {self.job_queue} resolves to FARGATE). "Also applies to: 188-191
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@snakemake_executor_plugin_aws_batch/batch_job_builder.py` around lines 118 -
120, The error messages in batch_job_builder.py reference
self.settings.job_queue but the actual queue used for a job can be overridden
and stored in self.job_queue; update the two places (around the
platform-determination and the second error block) to use self.job_queue instead
of self.settings.job_queue so diagnostics show the resolved per-job queue and
not the default settings value.
Summary
Hardens the AWS Batch executor for multi-arch, multi-tenant production use. Three user-facing capabilities plus underlying validation/compat fixes.
What's new
resources.batch_queue— route each rule to a Batch queue wired to the matching compute environment. Enables a single workflow to run ARM rules on Graviton queues and x86 rules on Intel/AMD queues. Falls back to the profile default queue.settings.tagsare now sent tosubmit_job(not onlyregister_job_definition), so the AWS Batch console and Cost Explorer can group jobs by tag. ASNAKEMAKE_AWS_BATCH_JOB_TAGSenv var carries dynamicKEY=VALUEpairs for coordinator → worker tag inheritance (e.g. per-run commit hashes for cost attribution). Env tags override settings on key conflict.resources.shared_memory_size_mb— rules opt into a larger/dev/shm(EC2/ECS default 64 MB is too small for tools that stage large in-memory indexes). EC2 only; Fargate does not honorlinuxParameters.sharedMemorySize.Validation and correctness
VALID_RESOURCES_MAPPINGis restrictive, and only applicable to Fargate compute environments. #34): resolves the queue's compute environment and runs platform-specific vCPU/memory checks. Previously all jobs were validated against Fargate's strict resource matrix, which blocked legitimate EC2 requests. Real AWS lookup errors (botocore.ClientError) now propagate asWorkflowErrorinstead of silently falling back to EC2.threads— derived fromself.job.threads(the documented Snakemake directive) instead of the undocumented_coresresource, with_coresretained as fallback.>= meminstead of silently shrinking tomin(valid_mems)and OOMing the job. Raise if no valid value exists.submit_job, matching AWS Batch limits.Known limitation
Fargate job-definition registration is not yet supported. Fargate requires
executionRoleArn, prohibitsprivileged: True, and does not allow GPU resources — none of which this plugin currently emits conditionally.build_job_definitionraises a clearWorkflowErrorwhen a Fargate queue is detected, replacing an opaque AWS-side rejection with a local message. EC2 queues are unaffected.Compatibility
auto_deploy_default_storage_provider=Falseso workers don'tpip installan unpinnedsnakemake-storage-plugin-s3at startup — s3-plugin 0.3.x metadata requires snakemake 9.x and breaks 8.x workers. Image maintainers pre-install a compatible plugin version.snakemake-storage-plugin-s3pin widened to>=0.2,<0.4so 8.x consumers can fall back to 0.2.x.Dependencies
Stacked on #38 (
fix-vcpu-threads-and-validation) — the first three commits here are the same as #38's, so #38 should land first (or be closed in favor of this PR).Test plan
tests/test_batch_job_builder.py:submit_job(settings, env var, merge, override, empty cases)ClientErrorre-raises, empty-response fallback, Fargate detection)build_job_definitionSummary by CodeRabbit
New Features
Behavior Changes