Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
35 changes: 35 additions & 0 deletions .github/workflows/test-functional-keycloak.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,35 @@
name: test-functional-keycloak

on:
push:
branches: [ main ]
pull_request:
branches: [ main ]

jobs:
build:
runs-on: ubuntu-22.04

steps:
- uses: actions/checkout@v6

- name: Set up Python
uses: actions/setup-python@v6
with:
python-version: 3.12

- name: Upgrade and install packages
run: |
bash ./ci/setup-ubuntu.sh

- name: Install Keycloak
run: |
bash ./ci/setup-keycloak.sh

- name: Install ColdFront and plugin
run: |
./ci/setup.sh

- name: Run functional tests
run: |
./ci/run_functional_tests_keycloak.sh
17 changes: 17 additions & 0 deletions ci/run_functional_tests_keycloak.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,17 @@
#!/bin/bash
set -xe

if [[ ! "${CI}" == "true" ]]; then
source /tmp/coldfront_venv/bin/activate
fi

export DJANGO_SETTINGS_MODULE="local_settings"
export PYTHONWARNINGS="ignore:Unverified HTTPS request"

export KEYCLOAK_BASE_URL="http://localhost:8080"
export KEYCLOAK_REALM="master"
export KEYCLOAK_CLIENT_ID="coldfront"
export KEYCLOAK_CLIENT_SECRET="nomoresecret"

coverage run --source="." -m django test coldfront_plugin_cloud.tests.functional.keycloak
coverage report
61 changes: 61 additions & 0 deletions ci/setup-keycloak.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,61 @@
#!/bin/bash

set -xe

sudo docker rm -f keycloak
Copy link

Copilot AI Apr 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With set -e, docker rm -f keycloak will cause the script to exit if the container doesn't exist (common on first run). Add || true (or similar) so setup remains idempotent.

Suggested change
sudo docker rm -f keycloak
sudo docker rm -f keycloak || true

Copilot uses AI. Check for mistakes.

sudo docker run -d --name keycloak \
-e KEYCLOAK_ADMIN=admin \
-e KEYCLOAK_ADMIN_PASSWORD=nomoresecret \
-p 8080:8080 \
-p 8443:8443 \
quay.io/keycloak/keycloak:25.0 start-dev

# wait for keycloak to be ready
until curl -s http://localhost:8080/auth/realms/master; do
Copy link

Copilot AI Apr 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The readiness loop uses /auth/realms/master and curl -s, which will exit successfully even on HTTP errors (e.g., 404) and may not actually wait for Keycloak to be ready. Use the correct base path for the chosen Keycloak image (often /realms/master for recent Keycloak) and add -f (or check the status code) so the loop retries until a successful response.

Suggested change
until curl -s http://localhost:8080/auth/realms/master; do
until curl -fsS http://localhost:8080/realms/master >/dev/null; do

Copilot uses AI. Check for mistakes.
echo "Waiting for Keycloak to be ready..."
sleep 5
done

# Create client and add admin role to client's service account
ACCESS_TOKEN=$(curl -X POST "http://localhost:8080/realms/master/protocol/openid-connect/token" \
-d "username=admin" \
-d "password=nomoresecret" \
-d "grant_type=password" \
-d "client_id=admin-cli" \
-d "scope=openid" \
| jq -r '.access_token')


curl -X POST "http://localhost:8080/admin/realms/master/clients" \
-H "Authorization: Bearer $ACCESS_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"clientId": "coldfront",
"secret": "nomoresecret",
"redirectUris": ["http://localhost:8080/*"],
"serviceAccountsEnabled": true
}'

COLDFRONT_CLIENT_ID=$(curl -X GET "http://localhost:8080/admin/realms/master/clients?clientId=coldfront" \
-H "Authorization: Bearer $ACCESS_TOKEN" | jq -r '.[0].id')


COLDFRONT_SERVICE_ACCOUNT_ID=$(curl -X GET "http://localhost:8080/admin/realms/master/clients/$COLDFRONT_CLIENT_ID/service-account-user" \
-H "Authorization: Bearer $ACCESS_TOKEN" \
-H "Content-Type: application/json" \
| jq -r '.id')

ADMIN_ROLE_ID=$(curl -X GET "http://localhost:8080/admin/realms/master/roles/admin" \
-H "Authorization: Bearer $ACCESS_TOKEN" | jq -r '.id')

# Add admin role to the service account user
curl -X POST "http://localhost:8080/admin/realms/master/users/$COLDFRONT_SERVICE_ACCOUNT_ID/role-mappings/realm" \
-H "Authorization: Bearer $ACCESS_TOKEN" \
-H "Content-Type: application/json" \
-d '[
{
"id": "'$ADMIN_ROLE_ID'",
"name": "admin"
}
]'
1 change: 1 addition & 0 deletions requirements.txt
Original file line number Diff line number Diff line change
Expand Up @@ -12,3 +12,4 @@ python-novaclient
python-neutronclient
python-swiftclient
pytz
requests
4 changes: 3 additions & 1 deletion src/coldfront_plugin_cloud/attributes.py
Original file line number Diff line number Diff line change
Expand Up @@ -24,7 +24,6 @@ class CloudAllocationAttribute:
RESOURCE_API_URL = "OpenShift API Endpoint URL"
RESOURCE_IDENTITY_NAME = "OpenShift Identity Provider Name"
RESOURCE_ROLE = "Role for User in Project"
RESOURCE_QUOTA_RESOURCES = "Available Quota Resources"

RESOURCE_FEDERATION_PROTOCOL = "OpenStack Federation Protocol"
RESOURCE_IDP = "OpenStack Identity Provider"
Expand All @@ -35,6 +34,8 @@ class CloudAllocationAttribute:

RESOURCE_EULA_URL = "EULA URL"
RESOURCE_CLUSTER_NAME = "Internal Cluster Name"
RESOURCE_QUOTA_RESOURCES = "Available Quota Resources"
RESOURCE_KEYCLOAK_GROUP_TEMPLATE = "Template String for Keycloak Group Names"

RESOURCE_ATTRIBUTES = [
CloudResourceAttribute(name=RESOURCE_AUTH_URL),
Expand All @@ -45,6 +46,7 @@ class CloudAllocationAttribute:
CloudResourceAttribute(name=RESOURCE_PROJECT_DOMAIN),
CloudResourceAttribute(name=RESOURCE_ROLE),
CloudResourceAttribute(name=RESOURCE_QUOTA_RESOURCES),
CloudResourceAttribute(name=RESOURCE_KEYCLOAK_GROUP_TEMPLATE),
CloudResourceAttribute(name=RESOURCE_USER_DOMAIN),
CloudResourceAttribute(name=RESOURCE_EULA_URL),
CloudResourceAttribute(name=RESOURCE_DEFAULT_PUBLIC_NETWORK),
Expand Down
77 changes: 77 additions & 0 deletions src/coldfront_plugin_cloud/kc_client.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,77 @@
import os

import requests


class KeyCloakAPIClient:
def __init__(self):
self.base_url = os.getenv("KEYCLOAK_BASE_URL")
self.realm = os.getenv("KEYCLOAK_REALM")
self.client_id = os.getenv("KEYCLOAK_CLIENT_ID")
self.client_secret = os.getenv("KEYCLOAK_CLIENT_SECRET")

self.token_url = (
f"{self.base_url}/realms/{self.realm}/protocol/openid-connect/token"
)

@property
def api_client(self):
params = {
"grant_type": "client_credentials",
"client_id": self.client_id,
"client_secret": self.client_secret,
}
r = requests.post(self.token_url, data=params)
r.raise_for_status()
headers = {
"Authorization": ("Bearer %s" % r.json()["access_token"]),
"Content-Type": "application/json",
}
session = requests.session()
session.headers.update(headers)
return session
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are creating and caching this session with a token (and cache the instantiation of the class again later). When does the keycloak token expire? Just wondering how long we can continue to use this. Or is that not a concern since we'll run this thing as needed and then when its needed we start from scratch . I admit I don't know how coldfront tasks call these.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@naved001 The keycloak API will be called whenever Coldfront's remove/activate user signals are triggered, which happens when a user is add/removed from an allocation. That being said, I would assume the api_client will be called somewhat often, especially during batch user uploads.

@naved001 @knikolla Do we still want to cache the api client? Or make a follow-up PR to set a TTL mechanism?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just remove the caching for now.


Comment on lines +17 to +33
Copy link

Copilot AI Apr 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

api_client is a property that fetches a new access token and creates a new requests.Session on every access. Since each client method accesses api_client, a single add/remove operation can trigger multiple token requests, which is unnecessarily slow and can cause rate-limiting/flakiness. Consider caching the session/token within the KeyCloakAPIClient instance (refreshing only when expired) or otherwise reusing a single session per task run.

Copilot uses AI. Check for mistakes.
def create_group(self, group_name):
url = f"{self.base_url}/admin/realms/{self.realm}/groups"
payload = {"name": group_name}
response = self.api_client.post(url, json=payload)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this return a group id when a group is created? If yes, we should return that to the caller and we won't need an additional call to get_group_id

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From local testing and the online documentation, it just returns the status code


# If group already exists, ignore and move on
if response.status_code not in (201, 409):
response.raise_for_status()

def get_group_id(self, group_name) -> str | None:
"""Return None if group not found"""
query = f"search={group_name}&exact=true"
url = f"{self.base_url}/admin/realms/{self.realm}/groups?{query}"
r = self.api_client.get(url)
r.raise_for_status()
r_json = r.json()
return r_json[0]["id"] if r_json else None

def get_user_id(self, cf_username) -> str | None:
"""Return None if user not found"""
# (Quan) Coldfront usernames map to Keycloak usernames
# https://github.com/nerc-project/coldfront-plugin-cloud/pull/249#discussion_r2953393852
query = f"username={cf_username}&exact=true"
url = f"{self.base_url}/admin/realms/{self.realm}/users?{query}"
r = self.api_client.get(url)
r.raise_for_status()
r_json = r.json()
return r_json[0]["id"] if r_json else None

def add_user_to_group(self, user_id, group_id):
url = f"{self.base_url}/admin/realms/{self.realm}/users/{user_id}/groups/{group_id}"
r = self.api_client.put(url)
r.raise_for_status()
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the user is already part of the group don't error out here, just log an info.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From local testing, if the user is already part of the group, 204 is returned, so not error is raised.


def remove_user_from_group(self, user_id, group_id):
url = f"{self.base_url}/admin/realms/{self.realm}/users/{user_id}/groups/{group_id}"
r = self.api_client.delete(url)
r.raise_for_status()
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the user was not part of the group, don't error out here, just log an info.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From local testing, it seems if the user is already not part of the group, 204 is returned, so not error is raised.


def get_user_groups(self, user_id) -> list[str]:
url = f"{self.base_url}/admin/realms/{self.realm}/users/{user_id}/groups"
r = self.api_client.get(url)
r.raise_for_status()
return [group["name"] for group in r.json()]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this may be my limited knowledge of keycloak, but are we guaranteed to have a user be part of some group?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't say what the production environment will look like, but local testing doesn't suggest users are automatically added to some group. Regardless, I don't think that should be a problem for consumers of the Keycloak API? @knikolla ?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is likely that the user may be part of other groups besides the ones corresponding to respective allocations. This is at least guaranteed for PIs who are part of a group called pi.

13 changes: 13 additions & 0 deletions src/coldfront_plugin_cloud/signals.py
Original file line number Diff line number Diff line change
Expand Up @@ -6,8 +6,10 @@
from coldfront_plugin_cloud.tasks import (
activate_allocation,
add_user_to_allocation,
add_user_to_keycloak,
disable_allocation,
remove_user_from_allocation,
remove_user_from_keycloak,
)
from coldfront.core.allocation.signals import (
allocation_activate,
Expand All @@ -25,6 +27,10 @@ def is_async():
return os.getenv("REDIS_HOST")


def is_keycloak_enabled():
return os.getenv("KEYCLOAK_BASE_URL")
Copy link

Copilot AI Apr 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is_keycloak_enabled() only checks KEYCLOAK_BASE_URL, but KeyCloakAPIClient also requires KEYCLOAK_REALM/CLIENT_ID/CLIENT_SECRET. If BASE_URL is set but the other vars are missing/misconfigured, the signal will still enqueue/call Keycloak tasks which then fail at runtime. Consider validating the full Keycloak config (or catching client init/token errors) before enabling these tasks.

Suggested change
return os.getenv("KEYCLOAK_BASE_URL")
required_env_vars = (
"KEYCLOAK_BASE_URL",
"KEYCLOAK_REALM",
"KEYCLOAK_CLIENT_ID",
"KEYCLOAK_CLIENT_SECRET",
)
return all(os.getenv(env_var) for env_var in required_env_vars)

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Contributor Author

@QuanMPhm QuanMPhm Apr 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think one env var is enough to indicate Keycloak is enabled. It's the operator's problem if the config is misconfigured



@receiver(allocation_activate)
@receiver(allocation_change_approved)
def activate_allocation_receiver(sender, **kwargs):
Expand All @@ -48,11 +54,18 @@ def activate_allocation_user_receiver(sender, **kwargs):
allocation_user_pk = kwargs.get("allocation_user_pk")
if is_async():
async_task(add_user_to_allocation, allocation_user_pk)
if is_keycloak_enabled():
async_task(add_user_to_keycloak, allocation_user_pk)
else:
add_user_to_allocation(allocation_user_pk)
if is_keycloak_enabled():
add_user_to_keycloak(allocation_user_pk)


@receiver(allocation_remove_user)
def allocation_remove_user_receiver(sender, **kwargs):
allocation_user_pk = kwargs.get("allocation_user_pk")
remove_user_from_allocation(allocation_user_pk)

if is_keycloak_enabled():
remove_user_from_keycloak(allocation_user_pk)
96 changes: 95 additions & 1 deletion src/coldfront_plugin_cloud/tasks.py
Original file line number Diff line number Diff line change
@@ -1,8 +1,14 @@
import datetime
import logging
import time
import functools
from string import Template

from coldfront.core.allocation.models import Allocation, AllocationUser
from coldfront.core.allocation.models import (
Allocation,
AllocationUser,
AllocationAttribute,
)

from coldfront_plugin_cloud import (
attributes,
Expand All @@ -12,11 +18,17 @@
esi,
openshift_vm,
utils,
kc_client,
)

logger = logging.getLogger(__name__)


@functools.lru_cache()
def get_kc_client():
return kc_client.KeyCloakAPIClient()


def find_allocator(allocation) -> base.ResourceAllocator:
allocators = {
"openstack": openstack.OpenStackResourceAllocator,
Expand Down Expand Up @@ -128,3 +140,85 @@ def remove_user_from_allocation(allocation_user_pk):
allocator.remove_role_from_user(username, project_id)
else:
logger.warning("No project has been created. Nothing to disable.")


def _clean_template_string(template_string: str) -> str:
return template_string.replace(" ", "_").lower()


def _get_keycloak_group_name(allocation: Allocation, template_string: str) -> str:
"""
Acceptable variables for the group name template string is:
- $resource_name: the name of the resource (e.g. "OpenShift")
- Any allocation attribute defined for the allocation, with spaces replaced by underscores and
all lowercase (e.g. for `Project Name`, the variable would be `$project_name`)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is there a $ before project_name?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is the syntax for Python string templates, $ marks the start of placeholders that will be substituted

"""
resource_name = allocation.resources.first().name
allocation_attrs_list: list[AllocationAttribute] = (
allocation.allocationattribute_set.all()
)

template_sub_dict = {"resource_name": resource_name}
for attr in allocation_attrs_list:
template_sub_dict[
_clean_template_string(attr.allocation_attribute_type.name)
] = attr.value

return Template(template_string).substitute(**template_sub_dict)
Comment on lines +149 to +167
Copy link

Copilot AI Apr 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_get_keycloak_group_name() uses string.Template.substitute(); if the configured template references a variable that isn't present in template_sub_dict (e.g., a missing allocation attribute), this will raise a KeyError and break the signal/task. Consider using safe_substitute or catching KeyError to log and skip rather than raising.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@knikolla @naved001 Not raising the error could lead to users being silently added to incorrectly-named Keycloak groups. Is that acceptable?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The task should break rather than do the wrong thing.



def add_user_to_keycloak(allocation_user_pk):
allocation_user = AllocationUser.objects.get(pk=allocation_user_pk)
allocation = allocation_user.allocation

kc_admin_client = get_kc_client()
username = allocation_user.user.username

group_name_template = allocation.resources.first().get_attribute(
attributes.RESOURCE_KEYCLOAK_GROUP_TEMPLATE
)
if group_name_template is None:
logger.info(
f"Keycloak enabled but no group name template specified for resource {allocation.resources.first().name}. Skipping addition to Keycloak group"
)
return

if (user_id := kc_admin_client.get_user_id(username)) is None:
logger.warning(f"User {username} not found in Keycloak, cannot add to group.")
return

group_name = _get_keycloak_group_name(allocation, group_name_template)
kc_admin_client.create_group(group_name)
group_id = kc_admin_client.get_group_id(group_name)
Copy link

Copilot AI Apr 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After create_group(), group_id can still be None (e.g., group creation failure or search not returning the new group). add_user_to_group() is then called with group_id=None, producing an invalid URL and a hard failure. Please check group_id for None and handle/log/return before attempting to add the user.

Suggested change
group_id = kc_admin_client.get_group_id(group_name)
group_id = kc_admin_client.get_group_id(group_name)
if group_id is None:
logger.warning(
f"Keycloak group '{group_name}' could not be resolved after creation; cannot add user {username} to group."
)
return

Copilot uses AI. Check for mistakes.
kc_admin_client.add_user_to_group(user_id, group_id)


def remove_user_from_keycloak(allocation_user_pk):
allocation_user = AllocationUser.objects.get(pk=allocation_user_pk)
allocation = allocation_user.allocation

kc_admin_client = get_kc_client()
username = allocation_user.user.username

group_name_template = allocation.resources.first().get_attribute(
attributes.RESOURCE_KEYCLOAK_GROUP_TEMPLATE
)
if group_name_template is None:
logger.info(
f"Keycloak enabled but no group name template specified for resource {allocation.resources.first().name}. Skipping removal from Keycloak group"
)
return

if (user_id := kc_admin_client.get_user_id(username)) is None:
logger.warning(
f"User {username} not found in Keycloak, cannot remove from group."
)
return

group_name = _get_keycloak_group_name(allocation, group_name_template)
if (group_id := kc_admin_client.get_group_id(group_name)) is None:
logger.warning(
f"Group {group_name} not found in Keycloak, skipping removal for user {username}."
)
return
kc_admin_client.remove_user_from_group(user_id, group_id)
Empty file.
Loading
Loading
You are viewing a condensed version of this merge commit. You can view the full changes here.