Allow pushing user-allocation membership to Keycloak#249
Allow pushing user-allocation membership to Keycloak#249QuanMPhm wants to merge 1 commit intonerc-project:mainfrom
Conversation
src/coldfront_plugin_cloud/tests/functional/openshift/test_allocation.py
Outdated
Show resolved
Hide resolved
9a53156 to
d7da5c4
Compare
|
@knikolla Two more questions:
|
src/coldfront_plugin_cloud/tests/functional/openshift/test_allocation.py
Outdated
Show resolved
Hide resolved
d7da5c4 to
3b80589
Compare
|
@knikolla I've addressed your comments except one. Also, do you have responses to these questions? |
3b80589 to
cb1d628
Compare
|
@QuanMPhm please resolve conflicts. Are there any questions that I missed answering? |
a217f31 to
0358cb7
Compare
0358cb7 to
8fc4ea6
Compare
knikolla
left a comment
There was a problem hiding this comment.
Did a quick first pass and provided some comments.
Also this needs to be possible configurable via a setting.
|
@QuanMPhm Actually, another thought, do you think it would make sense to implement this in the Keycloak plugin? https://github.com/nerc-project/coldfront-plugin-keycloak It could listen to signals in the same way that the cloud plugin listens to signals. It already has a keycloak client implemented. And there is nothing in pushing users to a Keycloak group that is specific to either OpenShift or OpenStack. |
|
@knikolla I see that it does make sense to seperate the Keycloak functionality from the rest of the plugin. It makes sense to me. I forgot that repo existed. There would need to be some overhaul to add integration and unit tests to |
For now let's keep it here (as not to frontload the work) and we can easily split it out later if needed. Perhaps try implementing it here via signals so as to keep it loosely coupled so that if we need to split it later it doesn't require a lot of uncoupling. |
|
@knikolla @larsks I have a question about the Coldfront Keycloak PR. I know we previously decided to represent user-membership to an allocation by adding them to a Keycloak group with the same name as the |
See my comment here #249 (comment) |
96e371c to
97d7220
Compare
knikolla
left a comment
There was a problem hiding this comment.
Quick first pass for direction feedback.
97d7220 to
151aea2
Compare
knikolla
left a comment
There was a problem hiding this comment.
Looking much better. Almost there.
A follow-up to this (after we merge the validate_allocations refactor) should investigate how to cleanly implement this in validate_allocations too.
| CloudResourceAttribute(name=RESOURCE_PROJECT_DOMAIN), | ||
| CloudResourceAttribute(name=RESOURCE_ROLE), | ||
| CloudResourceAttribute(name=RESOURCE_QUOTA_RESOURCES), | ||
| CloudResourceAttribute(name=RESOURCE_KEYSTONE_GROUP_TEMPLATE), |
Actually... I would make it a whole separate CLI command. |
151aea2 to
d3087ea
Compare
|
@knikolla I've allowed the group name template string to accept any allocation attribute. Further documentation is in the docstring for the function |
|
@naved001 would appreciate a pass from you. |
knikolla
left a comment
There was a problem hiding this comment.
Had some more time to think about how to expose enabling/disable the feature.
src/coldfront_plugin_cloud/tasks.py
Outdated
| attributes.RESOURCE_KEYCLOAK_GROUP_TEMPLATE | ||
| ) | ||
| if group_name_template is None: | ||
| logger.warning( |
There was a problem hiding this comment.
I had some more time to think about this. Instead of logging a warning, log an info message here, this way the presence of the Keycloak Group Template Resource Attribute can act as a flag for enabling the feature for each specific Resource individually.
Therefore also switch the order so that the group template is parsed before you look up the user.
Keycloak enabled but no group name template specified for resource <>. Skipping addition to Keycloak group.
| attributes.RESOURCE_KEYCLOAK_GROUP_TEMPLATE | ||
| ) | ||
| if group_name_template is None: | ||
| logger.warning( |
There was a problem hiding this comment.
Same comment as from addition.
| self.base_url = os.getenv("KEYCLOAK_BASE_URL") | ||
| self.realm = os.getenv("KEYCLOAK_REALM") | ||
| self.client_id = os.getenv("KEYCLOAK_CLIENT_ID", "coldfront") | ||
| self.client_secret = os.getenv("KEYCLOAK_CLIENT_SECRET", "nomoresecret") |
There was a problem hiding this comment.
I'd rather we don't set the default id and secret.
| "client_id": self.client_id, | ||
| "client_secret": self.client_secret, | ||
| } | ||
| r = requests.post(self.token_url, data=params).json() |
There was a problem hiding this comment.
maybe some error handling would be useful before you try to access r["access_token"].
There was a problem hiding this comment.
used raise_for_status()
| url = f"{self.base_url}/admin/realms/{self.realm}/users/{user_id}/groups" | ||
| r = self.api_client.get(url) | ||
| r.raise_for_status() | ||
| return [group["name"] for group in r.json()] |
There was a problem hiding this comment.
this may be my limited knowledge of keycloak, but are we guaranteed to have a user be part of some group?
There was a problem hiding this comment.
I can't say what the production environment will look like, but local testing doesn't suggest users are automatically added to some group. Regardless, I don't think that should be a problem for consumers of the Keycloak API? @knikolla ?
There was a problem hiding this comment.
It is likely that the user may be part of other groups besides the ones corresponding to respective allocations. This is at least guaranteed for PIs who are part of a group called pi.
| Acceptable variables for the group name template string is: | ||
| - $resource_name: the name of the resource (e.g. "OpenShift") | ||
| - Any allocation attribute defined for the allocation, with spaces replaced by underscores and | ||
| all lowercase (e.g. for `Project Name`, the variable would be `$project_name`) |
There was a problem hiding this comment.
why is there a $ before project_name?
There was a problem hiding this comment.
That is the syntax for Python string templates, $ marks the start of placeholders that will be substituted
| def create_group(self, group_name): | ||
| url = f"{self.base_url}/admin/realms/{self.realm}/groups" | ||
| payload = {"name": group_name} | ||
| response = self.api_client.post(url, json=payload) |
There was a problem hiding this comment.
does this return a group id when a group is created? If yes, we should return that to the caller and we won't need an additional call to get_group_id
There was a problem hiding this comment.
From local testing and the online documentation, it just returns the status code
| } | ||
| session = requests.session() | ||
| session.headers.update(headers) | ||
| return session |
There was a problem hiding this comment.
We are creating and caching this session with a token (and cache the instantiation of the class again later). When does the keycloak token expire? Just wondering how long we can continue to use this. Or is that not a concern since we'll run this thing as needed and then when its needed we start from scratch . I admit I don't know how coldfront tasks call these.
There was a problem hiding this comment.
@naved001 The keycloak API will be called whenever Coldfront's remove/activate user signals are triggered, which happens when a user is add/removed from an allocation. That being said, I would assume the api_client will be called somewhat often, especially during batch user uploads.
@naved001 @knikolla Do we still want to cache the api client? Or make a follow-up PR to set a TTL mechanism?
There was a problem hiding this comment.
Just remove the caching for now.
d3087ea to
4489473
Compare
There was a problem hiding this comment.
Pull request overview
Adds Keycloak integration to push ColdFront allocation membership into Keycloak group membership, driven by a resource-level group-name template and exercised via new functional tests/CI workflow.
Changes:
- Add Keycloak client + tasks to add/remove allocation users to/from Keycloak groups based on a configurable template.
- Wire Keycloak add/remove behavior into allocation-user signals when Keycloak is enabled via environment.
- Introduce Keycloak functional tests and CI scripts/workflow to run them.
Reviewed changes
Copilot reviewed 9 out of 11 changed files in this pull request and generated 10 comments.
Show a summary per file
| File | Description |
|---|---|
| src/coldfront_plugin_cloud/tests/functional/keycloak/test_keycloak.py | Adds functional tests validating Keycloak user/group membership behavior for allocation changes. |
| src/coldfront_plugin_cloud/tests/functional/keycloak/init.py | Introduces package for Keycloak functional tests. |
| src/coldfront_plugin_cloud/tasks.py | Adds Keycloak client caching and add/remove group membership tasks; adds group-name templating helper. |
| src/coldfront_plugin_cloud/signals.py | Hooks Keycloak add/remove tasks into allocation user add/remove signals behind an env flag. |
| src/coldfront_plugin_cloud/management/commands/validate_allocations.py | Tweaks logging/comment wording related to user sync validation. |
| src/coldfront_plugin_cloud/kc_client.py | Adds a Keycloak admin API client for groups/users operations. |
| src/coldfront_plugin_cloud/attributes.py | Adds resource attribute for Keycloak group name template. |
| requirements.txt | Adds requests dependency for the Keycloak client. |
| ci/setup-keycloak.sh | Adds Keycloak container bootstrap script for CI/local functional tests. |
| ci/run_functional_tests_keycloak.sh | Adds a runner script to execute Keycloak functional tests with required env vars. |
| .github/workflows/test-functional-keycloak.yaml | Adds a GitHub Actions workflow to run Keycloak functional tests. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| def _get_keycloak_group_name(allocation: Allocation, template_string: str) -> str: | ||
| """ | ||
| Acceptable variables for the group name template string is: | ||
| - $resource_name: the name of the resource (e.g. "OpenShift") | ||
| - Any allocation attribute defined for the allocation, with spaces replaced by underscores and | ||
| all lowercase (e.g. for `Project Name`, the variable would be `$project_name`) | ||
| """ | ||
| resource_name = allocation.resources.first().name | ||
| allocation_attrs_list: list[AllocationAttribute] = ( | ||
| allocation.allocationattribute_set.all() | ||
| ) | ||
|
|
||
| template_sub_dict = {"resource_name": resource_name} | ||
| for attr in allocation_attrs_list: | ||
| template_sub_dict[ | ||
| _clean_template_string(attr.allocation_attribute_type.name) | ||
| ] = attr.value | ||
|
|
||
| return Template(template_string).substitute(**template_sub_dict) |
There was a problem hiding this comment.
_get_keycloak_group_name() uses string.Template.substitute(); if the configured template references a variable that isn't present in template_sub_dict (e.g., a missing allocation attribute), this will raise a KeyError and break the signal/task. Consider using safe_substitute or catching KeyError to log and skip rather than raising.
There was a problem hiding this comment.
The task should break rather than do the wrong thing.
|
|
||
| group_name = _get_keycloak_group_name(allocation, group_name_template) | ||
| kc_admin_client.create_group(group_name) | ||
| group_id = kc_admin_client.get_group_id(group_name) |
There was a problem hiding this comment.
After create_group(), group_id can still be None (e.g., group creation failure or search not returning the new group). add_user_to_group() is then called with group_id=None, producing an invalid URL and a hard failure. Please check group_id for None and handle/log/return before attempting to add the user.
| group_id = kc_admin_client.get_group_id(group_name) | |
| group_id = kc_admin_client.get_group_id(group_name) | |
| if group_id is None: | |
| logger.warning( | |
| f"Keycloak group '{group_name}' could not be resolved after creation; cannot add user {username} to group." | |
| ) | |
| return |
src/coldfront_plugin_cloud/tasks.py
Outdated
| logger.warning(f"User {username} not found in Keycloak, cannot add to group.") | ||
| return | ||
|
|
||
| group_name = _get_keycloak_group_name(allocation, group_name_template) | ||
| group_id = kc_admin_client.get_group_id(group_name) |
There was a problem hiding this comment.
The warning message in the user-not-found branch says "cannot add to group" even though this is the removal path, which is misleading for operators. Also, group_id can be None (group missing) and remove_user_from_group() will be called with None, yielding an invalid URL; please guard against a missing group_id and log/return cleanly.
| logger.warning(f"User {username} not found in Keycloak, cannot add to group.") | |
| return | |
| group_name = _get_keycloak_group_name(allocation, group_name_template) | |
| group_id = kc_admin_client.get_group_id(group_name) | |
| logger.warning( | |
| f"User {username} not found in Keycloak, cannot remove from group." | |
| ) | |
| return | |
| group_name = _get_keycloak_group_name(allocation, group_name_template) | |
| group_id = kc_admin_client.get_group_id(group_name) | |
| if group_id is None: | |
| logger.warning( | |
| f"Group {group_name} not found in Keycloak, skipping removal for user {username}." | |
| ) | |
| return |
There was a problem hiding this comment.
Since we already have a lot of allocations and users of those, the validation command is still high priority otherwise only the new group memberships will be recorded and none of the previous ones.
|
|
||
|
|
||
| def is_keycloak_enabled(): | ||
| return os.getenv("KEYCLOAK_BASE_URL") |
There was a problem hiding this comment.
is_keycloak_enabled() only checks KEYCLOAK_BASE_URL, but KeyCloakAPIClient also requires KEYCLOAK_REALM/CLIENT_ID/CLIENT_SECRET. If BASE_URL is set but the other vars are missing/misconfigured, the signal will still enqueue/call Keycloak tasks which then fail at runtime. Consider validating the full Keycloak config (or catching client init/token errors) before enabling these tasks.
| return os.getenv("KEYCLOAK_BASE_URL") | |
| required_env_vars = ( | |
| "KEYCLOAK_BASE_URL", | |
| "KEYCLOAK_REALM", | |
| "KEYCLOAK_CLIENT_ID", | |
| "KEYCLOAK_CLIENT_SECRET", | |
| ) | |
| return all(os.getenv(env_var) for env_var in required_env_vars) |
There was a problem hiding this comment.
I think one env var is enough to indicate Keycloak is enabled. It's the operator's problem if the config is misconfigured
| def get_group_id(self, group_name) -> str | None: | ||
| """Return None if group not found""" | ||
| query = f"search={group_name}&exact=true" | ||
| url = f"{self.base_url}/admin/realms/{self.realm}/groups?{query}" | ||
| r = self.api_client.get(url).json() | ||
| return r[0]["id"] if r else None | ||
|
|
||
| def get_user_id(self, cf_username) -> str | None: | ||
| """Return None if user not found""" | ||
| # (Quan) Coldfront usernames map to Keycloak usernames | ||
| # https://github.com/nerc-project/coldfront-plugin-cloud/pull/249#discussion_r2953393852 | ||
| query = f"username={cf_username}&exact=true" | ||
| url = f"{self.base_url}/admin/realms/{self.realm}/users?{query}" | ||
| r = self.api_client.get(url).json() | ||
| return r[0]["id"] if r else None |
There was a problem hiding this comment.
get_group_id() / get_user_id() call self.api_client.get(url).json() without checking the HTTP status. If Keycloak returns a non-2xx (e.g., 401/403/500), this can lead to confusing downstream behavior (or JSON parsing errors) rather than a clear failure. Call raise_for_status() (or otherwise handle non-2xx) before consuming the response.
| @property | ||
| def api_client(self): | ||
| params = { | ||
| "grant_type": "client_credentials", | ||
| "client_id": self.client_id, | ||
| "client_secret": self.client_secret, | ||
| } | ||
| r = requests.post(self.token_url, data=params) | ||
| r.raise_for_status() | ||
| headers = { | ||
| "Authorization": ("Bearer %s" % r.json()["access_token"]), | ||
| "Content-Type": "application/json", | ||
| } | ||
| session = requests.session() | ||
| session.headers.update(headers) | ||
| return session | ||
|
|
There was a problem hiding this comment.
api_client is a property that fetches a new access token and creates a new requests.Session on every access. Since each client method accesses api_client, a single add/remove operation can trigger multiple token requests, which is unnecessarily slow and can cause rate-limiting/flakiness. Consider caching the session/token within the KeyCloakAPIClient instance (refreshing only when expired) or otherwise reusing a single session per task run.
|
|
||
| set -xe | ||
|
|
||
| sudo docker rm -f keycloak |
There was a problem hiding this comment.
With set -e, docker rm -f keycloak will cause the script to exit if the container doesn't exist (common on first run). Add || true (or similar) so setup remains idempotent.
| sudo docker rm -f keycloak | |
| sudo docker rm -f keycloak || true |
| quay.io/keycloak/keycloak:25.0 start-dev | ||
|
|
||
| # wait for keycloak to be ready | ||
| until curl -s http://localhost:8080/auth/realms/master; do |
There was a problem hiding this comment.
The readiness loop uses /auth/realms/master and curl -s, which will exit successfully even on HTTP errors (e.g., 404) and may not actually wait for Keycloak to be ready. Use the correct base path for the chosen Keycloak image (often /realms/master for recent Keycloak) and add -f (or check the status code) so the loop retries until a successful response.
| until curl -s http://localhost:8080/auth/realms/master; do | |
| until curl -fsS http://localhost:8080/realms/master >/dev/null; do |
| self.assertIsNotNone(user_id) | ||
|
|
||
| # Check that the user is in the project group | ||
| # Group name determined by the RESOURCE_KEYCLOAK_GROUP_TEMPLATE attribute, set to "$resource_name/$project_name" in tests |
There was a problem hiding this comment.
The comment says the template is set to "$resource_name/$project_name", but setUpTestData() actually configures "$resource_name/$allocated_project_id". Please update the comment to match the test setup to avoid confusion when debugging failures.
| # Group name determined by the RESOURCE_KEYCLOAK_GROUP_TEMPLATE attribute, set to "$resource_name/$project_name" in tests | |
| # Group name determined by the RESOURCE_KEYCLOAK_GROUP_TEMPLATE attribute, set to "$resource_name/$allocated_project_id" in tests |
| def test_user_added_without_keycloak_group_template(self): | ||
| """Test that when the Keycloak group template attribute is not present on the resource, the user is not added to Keycloak and a log message is captured.""" | ||
| # Create a resource without the Keycloak group template attribute |
There was a problem hiding this comment.
This test docstring says the user is "not added to Keycloak", but in the test setup the user is already created in Keycloak via new_user(). What’s being skipped here is adding the user to a Keycloak group due to the missing template, so the docstring (and the nearby "warning" wording) should be updated to reflect the actual behavior being asserted.
| def add_user_to_group(self, user_id, group_id): | ||
| url = f"{self.base_url}/admin/realms/{self.realm}/users/{user_id}/groups/{group_id}" | ||
| r = self.api_client.put(url) | ||
| r.raise_for_status() |
There was a problem hiding this comment.
If the user is already part of the group don't error out here, just log an info.
There was a problem hiding this comment.
From local testing, if the user is already part of the group, 204 is returned, so not error is raised.
| def remove_user_from_group(self, user_id, group_id): | ||
| url = f"{self.base_url}/admin/realms/{self.realm}/users/{user_id}/groups/{group_id}" | ||
| r = self.api_client.delete(url) | ||
| r.raise_for_status() |
There was a problem hiding this comment.
If the user was not part of the group, don't error out here, just log an info.
There was a problem hiding this comment.
From local testing, it seems if the user is already not part of the group, 204 is returned, so not error is raised.
4489473 to
51df802
Compare
A Keycloak admin client has been added When `activate_allocation` is called, the user is added to a Keycloak group named using a format string defined in the allocation's resource attribute "Format String for Keystone Group Names" If the user does not already exist in Keycloak, the case is ignored for now Keycloak integration is optional, toggled by setting the env var "KEYCLOAK_BASE_URL" Authentication to Keycloak is done via client credentials grant When `deactivate_allocation` is called, the user is removed from the Keycloak group New functional test added for Keycloak integration A comment in `validate_allocations` has been updated to reflect the more restrictive validation behavior, where users on cluster projects will be removed if they are not part of the Coldfront allocation (rather than if they are not registered on Coldfront at all).
51df802 to
6cb51d7
Compare
Closes nerc-project/operations#948. More details in the commit message
There are still some questions I have below, so this is still a draft for now.