feat: include authenticated user identity in HTTP access log #9991

Open
ardentperf wants to merge 2 commits into pgadmin-org:master from ardentperf:pr-loguserid

Conversation

@ardentperf

@ardentperf ardentperf commented May 29, 2026


Set an X-Remote-User response header containing the authenticated username on every request. This allows the access log to be configured to include user identity via standard log format directives (%({x-remote-user}o)s in gunicorn, %{X-Remote-User}o in Apache) without requiring any changes to pgAdmin's session or auth behaviour.

Closes #9990

Default gunicorn access log format is at https://gunicorn.org/reference/settings/?h=access_log#access_log_format and https://github.com/benoitc/gunicorn/blob/9bc5891b4b06f25a8ce0e707053dcb2fb9bf638c/gunicorn/config.py#L1413 ; I confirmed that all other fields are default; this PR only changes the user name field.

Performed an end-to-end test on kubernetes with KIND

With this PR: [screenshot: pgadmin-loguserid-branch]

Master branch: [screenshot: pgadmin-master-branch]

Summary by CodeRabbit

  • New Features
    • Option to include the authenticated username in responses so HTTP access logs can record user identity, improving audit trails and monitoring.
    • New configuration toggle (disabled by default) to enable or disable this behavior.
    • When enabled and a user is authenticated, a response header carries a latin-1-safe username; when disabled or unauthenticated, no username is added.

Set an X-Remote-User response header containing the authenticated
username on every request. This allows the access log to be configured
to include user identity via standard log format directives
(%({x-remote-user}o)s in gunicorn, %{X-Remote-User}o in Apache) without
requiring any changes to pgAdmin's session or auth behaviour.

Signed-off-by: Jeremy Schneider <schneider@ardentperf.com>
@coderabbitai

coderabbitai Bot commented May 29, 2026


No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: bd02bed5-c71b-4ac4-96a9-e492a27d8600

📥 Commits

Reviewing files that changed from the base of the PR and between b232ead and 0f6f4d5.

📒 Files selected for processing (3)
  • pkg/docker/gunicorn_config.py
  • web/config.py
  • web/pgadmin/__init__.py
🚧 Files skipped from review as they are similar to previous changes (3)
  • web/pgadmin/__init__.py
  • web/config.py
  • pkg/docker/gunicorn_config.py

Walkthrough

Flask now sets an X-Remote-User response header for authenticated requests when enabled by config.LOG_AUTHENTICATED_USER; Gunicorn's access_log_format is configured to include that header so logs show the authenticated username (or '-' when absent).

Changes

User Identity in HTTP Access Logs

Layer / File(s) Summary
Config flag for header
web/config.py
Adds LOG_AUTHENTICATED_USER = False to control emitting X-Remote-User in responses.
Flask response header for authenticated user
web/pgadmin/__init__.py
after_request sets X-Remote-User to a latin-1-safe current_user.username when LOG_AUTHENTICATED_USER is enabled; removes the header if unauthenticated or empty.
Gunicorn access log format configuration
pkg/docker/gunicorn_config.py
Sets access_log_format to include %({x-remote-user}o)s, logging '-' for missing values.

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'feat: include authenticated user identity in HTTP access log' directly and concisely describes the main objective of adding user identity to access logs.
Linked Issues check ✅ Passed The PR implements the core requirement from #9990: exposing authenticated username in HTTP access logs via X-Remote-User header for gunicorn/Apache compatibility.
Out of Scope Changes check ✅ Passed All changes are directly scoped to the linked issue: config flag, response header setting, and gunicorn log format configuration.

@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (2)
pkg/docker/gunicorn_config.py (2)

8-11: Consider privacy implications of logging user identities.

The access_log_format now includes authenticated usernames in HTTP access logs, which is the intended behavior per the PR objectives. However, deployments should be aware that:

  • Usernames (e.g., email addresses like admin@pgadmin.org) constitute personally identifiable information (PII)
  • Access logs may be subject to data retention policies under GDPR, CCPA, or other privacy regulations
  • Log aggregation systems, backup procedures, and access controls should account for PII in logs

Consider documenting this change in deployment/administration guides so that operators can implement appropriate log handling policies for their regulatory environment.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/docker/gunicorn_config.py` around lines 8 - 11, The access_log_format in
pkg/docker/gunicorn_config.py now injects authenticated usernames via the
X-Remote-User header (access_log_format), which exposes PII; update
documentation and make the behavior configurable: add guidance in
deployment/administration docs describing the PII risk,
retention/aggregation/backup/access-control recommendations, and instructions to
disable or anonymize usernames (e.g., provide a deploy-time option or env var to
remove %({x-remote-user}o)s from access_log_format or enable masking) so
operators can comply with GDPR/CCPA and other policies.

8-11: ⚡ Quick win

Confirm Gunicorn access-log header lookup is case-insensitive (lowercase config is correct).

Gunicorn recommends using lowercase identifiers in access_log_format, and internally normalizes header lookups for %({header-name}o)s (response headers). So %({x-remote-user}o)s will correctly pick up X-Remote-User even though the actual header is capitalized.

  • Operational: logging the authenticated username can be sensitive (PII/auditing concerns); ensure retention/access controls align with your compliance requirements.
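
The lookup behaviour described above can be sketched as follows. `LowercasingAtoms` and `build_atoms` are hypothetical stand-ins, loosely modelled on the description in this comment and not taken from gunicorn's source. The sketch assumes response-header atoms are stored under lowercased names and that a missing atom renders as `-`.

```python
class LowercasingAtoms(dict):
    """Hypothetical stand-in for a log-atom mapping (not gunicorn's class)."""

    def __getitem__(self, key):
        # Header-style atoms such as "{x-remote-user}o" are looked up in
        # lowercase; anything that is missing renders as "-".
        if key.startswith("{"):
            key = key.lower()
        return self.get(key, "-")


def build_atoms(response_headers):
    # Assumption: header names are lowercased when the atoms are built.
    return LowercasingAtoms(
        {"{%s}o" % name.lower(): value for name, value in response_headers}
    )


fmt = "%({x-remote-user}o)s"
with_user = fmt % build_atoms([("X-Remote-User", "alice@example.com")])
without_user = fmt % build_atoms([("Content-Type", "text/html")])
print(with_user)     # alice@example.com
print(without_user)  # -
```

Under these assumptions the capitalisation of the header set by the application does not matter, and requests without the header still produce a well-formed log line.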
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/docker/gunicorn_config.py` around lines 8 - 11, The access_log_format
line currently uses the lowercase header token %({x-remote-user}o)s; confirm
that this is correct and leave it lowercase (Gunicorn normalizes header lookups
so %({x-remote-user}o)s will match X-Remote-User), and add a short inline
comment or README note next to access_log_format to document that header lookup
is case-insensitive and that logging usernames may contain PII so
retention/access controls must be applied; reference the access_log_format
setting and the %({x-remote-user}o)s token when making these clarifications.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: fddec9fa-afa7-41a9-b850-a7da26aafb89

📥 Commits

Reviewing files that changed from the base of the PR and between 0d11dbc and beea339.

📒 Files selected for processing (2)
  • pkg/docker/gunicorn_config.py
  • web/pgadmin/__init__.py

@dpage dpage left a comment

Contributor

Thanks for this — it's a clean, well-scoped change and a genuinely useful feature. One blocking concern and a couple of smaller points.

🔴 Must fix: non-Latin-1 usernames will 500 every authenticated request. See the inline comment — HTTP header values must encode to Latin-1, but current_user.username can legitimately contain characters outside that range (OAuth2 preferred_username/sub/OAUTH2_USERNAME_CLAIM, Kerberos principals, LDAP attributes). Since the header is set unconditionally for every authenticated user, such a user would hit UnicodeEncodeError during response serialization and be locked out entirely. Sanitizing the value before setting the header avoids this.

🟡 Tests: it would be good to add a small regression test asserting the header is present for an authenticated request and absent for an anonymous one — ideally exercising a non-ASCII username to guard the case above.

ℹ️ Notes (non-blocking):

  • The access_log_format change only lands for Docker deployments; standard package/pip server installs running their own gunicorn won't pick up the username field without adding the directive themselves. Consistent with the PR's stated Docker/k8s focus — just flagging the scope.
  • With JSON_LOGGER enabled the username ends up embedded inside the access-log message string rather than as a discrete JSON field, since the JSON formatter wraps gunicorn's rendered access line. Works, but operators expecting a structured field may be surprised.

Comment thread web/pgadmin/__init__.py Outdated
@app.after_request
def after_request(response):
    if current_user.is_authenticated:
        response.headers['X-Remote-User'] = current_user.username


HTTP header values must encode to Latin-1, but current_user.username isn't guaranteed to be ASCII/Latin-1. OAuth2 (_resolve_username can return preferred_username, sub, or a configured OAUTH2_USERNAME_CLAIM), Kerberos principals, and LDAP-mapped usernames can all contain Cyrillic/CJK/accented characters.

Verified against the Werkzeug currently pinned in the project:

Gorkov (Cyrillic)            -> UnicodeEncodeError: 'latin-1' codec can't encode...
alice\r\nX-Injected: 1       -> ValueError: Header values must not contain newline characters.

Because the header is set unconditionally for every authenticated user, a user with a non-Latin-1 username will raise during response serialization and get a 500 on every request — effectively locked out of pgAdmin. (The CR/LF case is already blocked by Werkzeug, so there's no header-injection vuln, but it would also 500.)

Suggest sanitizing before setting:

if current_user.is_authenticated and current_user.username:
    # HTTP headers are latin-1 only; avoid 500s for unicode usernames
    safe = current_user.username.encode('latin-1', 'replace').decode('latin-1')
    response.headers['X-Remote-User'] = safe
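
To make the effect of the suggested sanitizing step concrete, here is a small standalone sketch. The helper name `latin1_safe` is hypothetical; only the `encode('latin-1', 'replace').decode('latin-1')` round trip comes from the suggestion above.

```python
def latin1_safe(value: str) -> str:
    """Replace characters that cannot be encoded as Latin-1 with '?'."""
    return value.encode("latin-1", "replace").decode("latin-1")


print(latin1_safe("José"))     # José (é is within Latin-1)
print(latin1_safe("Горьков"))  # ???????

# The sanitized value can always be encoded as Latin-1 afterwards.
latin1_safe("Горьков").encode("latin-1")
```

Note that this round trip only addresses the encoding problem. CR and LF are valid Latin-1 characters and pass through unchanged, so the newline check described above would still apply to such a value.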

@ardentperf ardentperf Jun 10, 2026

Author

I did a little poking around but I'm not seeing an easy way to add tests for this (or for the change as a whole), since they depend on having a server like gunicorn in the loop :-/

I'm a little curious how you reproduced the error and what your setup is... pgAdmin wouldn't let me create accounts with special characters in the email, and I didn't have OAuth2 set up for testing.

I reproduced the error by directly calling gunicorn's to_bytestring() function, just to verify it.

@dpage

dpage commented Jun 1, 2026

Contributor

Please also address the python style issue causing CI to fail.

Thanks!

Comment thread web/pgadmin/__init__.py Outdated

@app.after_request
def after_request(response):
    if current_user.is_authenticated:

Contributor

Since not everyone wants to send user information in plain text in every response, we must make it configurable.

LOG_AUTHENTICATED_USER=True/False

Comment thread web/pgadmin/__init__.py Outdated

@app.after_request
def after_request(response):
    if current_user.is_authenticated:

Contributor

Suggested change
-    if current_user.is_authenticated:
+    if current_user.is_authenticated:
+        response.headers['X-Remote-User'] = current_user.username
+    else:
+        # prevents any accidental reuse if middleware or future code sets the header earlier
+        response.headers.pop('X-Remote-User', None)

Author

Making sure I understand this. I guess the main concern here is inconsistency - it would be a bit weird to overwrite the header sometimes and not others - and users wouldn't know who set the value they see in the field. If we're going to set a value in this field sometimes, then it's best to always control the contents of the field. That makes sense.

Contributor

My understanding is that the concern is less about the current implementation and more about ensuring the header is always under pgAdmin's control. If we only overwrite it for authenticated users, then in other cases it's unclear whether the value came from pgAdmin itself or was set earlier by middleware or some future code path. Explicitly removing it when there is no authenticated user keeps the behavior deterministic and avoids any ambiguity about the source of the header value.
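
The behaviour discussed in this thread (set the header for an authenticated user, otherwise remove it, and do nothing when the feature is off) can be sketched without Flask. This is a hedged illustration, not the PR's code: `manage_remote_user` is a hypothetical helper, a plain `dict` stands in for the response headers, and `username=None` stands in for an unauthenticated request.

```python
HEADER = "X-Remote-User"


def manage_remote_user(headers, username, enabled):
    """Keep X-Remote-User under the application's control.

    headers:  dict standing in for the response headers (assumption).
    username: authenticated username, or None/"" when unauthenticated.
    enabled:  stands in for the LOG_AUTHENTICATED_USER setting.
    """
    if not enabled:
        # Feature disabled: leave the response untouched.
        return headers
    if username:
        # Latin-1 round trip so the value is always a valid header value.
        headers[HEADER] = username.encode("latin-1", "replace").decode(
            "latin-1"
        )
    else:
        # No authenticated user: drop any value set earlier in the chain.
        headers.pop(HEADER, None)
    return headers


print(manage_remote_user({}, "alice@example.com", True))
print(manage_remote_user({HEADER: "stale"}, None, True))
print(manage_remote_user({HEADER: "stale"}, None, False))
```

With the feature enabled, the header value is deterministic: it is either the current user's sanitized name or absent. With the feature disabled, the sketch does not touch the header at all.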

@ardentperf

Author

Apologies, this past week has been a bit crazy - haven’t forgotten and should get to it this week 🤞

@dpage

dpage commented Jun 9, 2026

Contributor

A few additional notes to fold into your next pass, on top of the inline comments above:

1. The pep8/CI failure is just the new access_log_format line (96 chars > 79). # noqa won't help since we run pycodestyle directly, so please wrap it:

access_log_format = (
    '%(h)s %(l)s %({x-remote-user}o)s %(t)s '
    '"%(r)s" %(s)s %(b)s "%(f)s" "%(a)s"'
)

2. Default the config flag to off. Building on @mzabuawala's LOG_AUTHENTICATED_USER suggestion: X-Remote-User is sent on every response to the client, not just consumed by the log — so it's visible to proxies, CDNs, TLS middleboxes, browser extensions, etc. For a server-side logging feature that's broader exposure than needed, so the flag should default to False (opt-in), with a docs note that enabling it surfaces the username in response headers.

3. Optional design alternative. If cross-deployment portability (Apache %{X-Remote-User}o) isn't essential, you could stash the name in the WSGI environ instead of a response header — request.environ['x_remote_user'] = current_user.username — and log it via gunicorn's environ atom %({x_remote_user}e)s. That keeps the identity entirely server-side (never sent to the client) and sidesteps the Latin-1 header-encoding 500 altogether. The trade-off is it's gunicorn-specific, whereas the response-header approach also works under Apache — so it's a judgement call, not a request. (Worth confirming the {}e atom behaviour against the pinned gunicorn if you go that route.)
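
As a quick check of item 1, the sketch below confirms that the wrapped form from the comment is the same format string and renders a log line from it. The `sample_atoms` values are invented for illustration, and the one-line assignment is rebuilt with `repr()` only to measure its length.

```python
wrapped = (
    '%(h)s %(l)s %({x-remote-user}o)s %(t)s '
    '"%(r)s" %(s)s %(b)s "%(f)s" "%(a)s"'
)

# The same assignment written on one line, rebuilt here only to measure it.
single_line = "access_log_format = " + repr(wrapped)
print(len(single_line))  # 96

# Hypothetical atom values, keyed the way the format string expects.
sample_atoms = {
    "h": "10.0.0.1",
    "l": "-",
    "{x-remote-user}o": "alice@example.com",
    "t": "[10/Jun/2026:12:00:00 +0000]",
    "r": "GET /browser/ HTTP/1.1",
    "s": "200",
    "b": "1234",
    "f": "-",
    "a": "curl/8.0",
}
rendered = wrapped % sample_atoms
print(rendered)
```

Adjacent string literals inside parentheses are concatenated at compile time, so wrapping changes only the source layout and not the resulting format.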

Thanks again for the contribution — no rush, just capturing these so they're in one place for your update.

@ardentperf

Author

3. Optional design alternative. If cross-deployment portability (Apache %{X-Remote-User}o) isn't essential, you could stash the name in the WSGI environ instead of a response header — request.environ['x_remote_user'] = current_user.username — and log it via gunicorn's environ atom %({x_remote_user}e)s. That keeps the identity entirely server-side (never sent to the client) and sidesteps the Latin-1 header-encoding 500 altogether. The trade-off is it's gunicorn-specific, whereas the response-header approach also works under Apache — so it's a judgement call, not a request. (Worth confirming the {}e atom behaviour against the pinned gunicorn if you go that route.)

The main use case I'm interested in right now is containerized deployments on kubernetes, so gunicorn covers the immediate case. But personally I think the overall idea has value for many pgAdmin users. My vote would be for the opt-in Apache-compatible approach, which makes it accessible to others who are also interested in using this outside kubernetes for a more complete audit picture (e.g. seeing who has downloaded data from which database and how many bytes were in the download request). I also don't think there should be any concerns about sensitivity (even in regulated environments) around simply knowing which authenticated user made which requests; it's fairly standard for an access log. That said, I'm open to either option if someone feels there is a strong argument for keeping everything server-side.

ardentperf-agent Bot added a commit to ardentperf/pgadmin4 that referenced this pull request Jun 10, 2026
…r X-Remote-User header

Addresses reviewer feedback on PR pgadmin-org#9991:
- Gate the X-Remote-User header behind LOG_AUTHENTICATED_USER (default False)
- Encode username as latin-1 with replacement to prevent gunicorn 500s for non-ASCII usernames
- Clear the header on unauthenticated requests when the feature is enabled
ardentperf added a commit to ardentperf/pgadmin4 that referenced this pull request Jun 10, 2026
…mote-User header

Addresses reviewer feedback on PR pgadmin-org#9991:
- Gate the X-Remote-User header behind LOG_AUTHENTICATED_USER (default
False)
- Encode username as latin-1 with replacement to prevent gunicorn 500s
for non-ASCII usernames
- Clear the header on unauthenticated requests when the feature is
enabled

Signed-off-by: Jeremy Schneider <schneider@ardentperf.com>

style: fix E501 line too long in after_request

style: fix E501 line too long in gunicorn_config.py
@ardentperf

ardentperf commented Jun 10, 2026

Author

Pushed updates to the PR addressing the comments. The CI failures on Python style should be addressed.

Default pgAdmin behavior is unchanged; X-Remote-User header is only managed if the user enables the LOG_AUTHENTICATED_USER configuration setting.

I tested with the helm chart on k8s and confirmed that there's no logging by default, and logging is enabled when I add this to my values.yaml file:

extraEnvVars:
  - name: PGADMIN_CONFIG_LOG_AUTHENTICATED_USER
    value: "True"

Is this a good name for the config setting? Technically it controls the header, and I've updated the default gunicorn logging to automatically pick it up. For users who consume this through docker and helm, the practical effect is that the config does enable logging.

@ardentperf

Author

After a bit more thought, I'm also testing the WSGI environ approach to see if it works.

@ardentperf

Author

I just tried testing request.environ['x_remote_user'] = current_user.username and %({x_remote_user}e)s, but gunicorn did not log usernames. Now I remember trying this before too, and I never did figure out why it doesn't work...

@ardentperf

Author

I think the culprit is here:

https://github.com/miguelgrinberg/flask-socketio/blob/v5.6.1/src/flask_socketio/__init__.py#L40

flask_socketio wraps app.wsgi_app with its own middleware (SocketioMiddleware). Its __call__ method copies the environ before passing it to Flask. The copy at line 40 means Flask receives a new dict: a shallow copy of gunicorn's original environ. Any mutations Flask makes (including request.environ['x_remote_user'] = ... in after_request) go into the copy, and gunicorn's original environ dict is untouched. When gunicorn calls self.log.access(resp, req, environ, ...), it passes its original dict, which never has x_remote_user set. I confirmed this by logging id() of the environ in both after_request and gunicorn's handle_request; they were different on every request.
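
The effect described above can be reproduced with plain WSGI callables and no server. This is a minimal sketch under stated assumptions: `CopyingMiddleware` is a hypothetical stand-in for any middleware that copies the environ, `inner_app` stands in for the Flask app, and `serve_once` plays the role of the server that later reads its own environ for logging.

```python
def inner_app(environ, start_response):
    # Stands in for the application writing the username into the environ.
    environ["x_remote_user"] = "alice"
    start_response("200 OK", [])
    return [b"ok"]


class CopyingMiddleware:
    """Hypothetical middleware that hands the app a shallow copy."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        environ = environ.copy()  # the app now mutates a different dict
        return self.app(environ, start_response)


def serve_once(app):
    # Stands in for the server: it keeps its own environ dict and would
    # consult that same dict when writing the access log.
    server_environ = {"REQUEST_METHOD": "GET", "PATH_INFO": "/"}
    app(server_environ, lambda status, headers: None)
    return server_environ


print("x_remote_user" in serve_once(inner_app))                     # True
print("x_remote_user" in serve_once(CopyingMiddleware(inner_app)))  # False
```

Because the copy is shallow, only new or rebound keys are lost. A mutable object that already exists in the server's environ would still be shared between the copy and the original.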


3 participants