feat: include authenticated user identity in HTTP access log#9991
feat: include authenticated user identity in HTTP access log#9991ardentperf wants to merge 2 commits into
Conversation
Set an X-Remote-User response header containing the authenticated
username on every request. This allows the access log to be configured
to include user identity via standard log format directives
(%({x-remote-user}o)s in gunicorn, %{X-Remote-User}o in Apache) without
requiring any changes to pgAdmin's session or auth behaviour.
Signed-off-by: Jeremy Schneider <schneider@ardentperf.com>
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (3)
🚧 Files skipped from review as they are similar to previous changes (3)
WalkthroughFlask now sets an X-Remote-User response header for authenticated requests when enabled by config.LOG_AUTHENTICATED_USER; Gunicorn's access_log_format is configured to include that header so logs show the authenticated username (or '-' when absent). ChangesUser Identity in HTTP Access Logs
🎯 3 (Moderate) | ⏱️ ~20 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (2)
pkg/docker/gunicorn_config.py (2)
8-11: Consider privacy implications of logging user identities.The
access_log_formatnow includes authenticated usernames in HTTP access logs, which is the intended behavior per the PR objectives. However, deployments should be aware that:
- Usernames (e.g., email addresses like
admin@pgadmin.org) constitute personally identifiable information (PII)- Access logs may be subject to data retention policies under GDPR, CCPA, or other privacy regulations
- Log aggregation systems, backup procedures, and access controls should account for PII in logs
Consider documenting this change in deployment/administration guides so that operators can implement appropriate log handling policies for their regulatory environment.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/docker/gunicorn_config.py` around lines 8 - 11, The access_log_format in pkg/docker/gunicorn_config.py now injects authenticated usernames via the X-Remote-User header (access_log_format), which exposes PII; update documentation and make the behavior configurable: add guidance in deployment/administration docs describing the PII risk, retention/aggregation/backup/access-control recommendations, and instructions to disable or anonymize usernames (e.g., provide a deploy-time option or env var to remove %({x-remote-user}o)s from access_log_format or enable masking) so operators can comply with GDPR/CCPA and other policies.
8-11: ⚡ Quick winConfirm Gunicorn access-log header lookup is case-insensitive (lowercase config is correct).
Gunicorn recommends using lowercase identifiers in
access_log_format, and internally normalizes header lookups for%({header-name}o)s(response headers). So%({x-remote-user}o)swill correctly pick upX-Remote-Usereven though the actual header is capitalized.
- Operational: logging the authenticated username can be sensitive (PII/auditing concerns); ensure retention/access controls align with your compliance requirements.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/docker/gunicorn_config.py` around lines 8 - 11, The access_log_format line currently uses the lowercase header token %({x-remote-user}o)s; confirm that this is correct and leave it lowercase (Gunicorn normalizes header lookups so %({x-remote-user}o)s will match X-Remote-User), and add a short inline comment or README note next to access_log_format to document that header lookup is case-insensitive and that logging usernames may contain PII so retention/access controls must be applied; reference the access_log_format setting and the %({x-remote-user}o)s token when making these clarifications.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@pkg/docker/gunicorn_config.py`:
- Around line 8-11: The access_log_format in pkg/docker/gunicorn_config.py now
injects authenticated usernames via the X-Remote-User header
(access_log_format), which exposes PII; update documentation and make the
behavior configurable: add guidance in deployment/administration docs describing
the PII risk, retention/aggregation/backup/access-control recommendations, and
instructions to disable or anonymize usernames (e.g., provide a deploy-time
option or env var to remove %({x-remote-user}o)s from access_log_format or
enable masking) so operators can comply with GDPR/CCPA and other policies.
- Around line 8-11: The access_log_format line currently uses the lowercase
header token %({x-remote-user}o)s; confirm that this is correct and leave it
lowercase (Gunicorn normalizes header lookups so %({x-remote-user}o)s will match
X-Remote-User), and add a short inline comment or README note next to
access_log_format to document that header lookup is case-insensitive and that
logging usernames may contain PII so retention/access controls must be applied;
reference the access_log_format setting and the %({x-remote-user}o)s token when
making these clarifications.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: fddec9fa-afa7-41a9-b850-a7da26aafb89
📒 Files selected for processing (2)
pkg/docker/gunicorn_config.pyweb/pgadmin/__init__.py
dpage
left a comment
There was a problem hiding this comment.
Thanks for this — it's a clean, well-scoped change and a genuinely useful feature. One blocking concern and a couple of smaller points.
🔴 Must fix: non-Latin-1 usernames will 500 every authenticated request. See the inline comment — HTTP header values must encode to Latin-1, but current_user.username can legitimately contain characters outside that range (OAuth2 preferred_username/sub/OAUTH2_USERNAME_CLAIM, Kerberos principals, LDAP attributes). Since the header is set unconditionally for every authenticated user, such a user would hit UnicodeEncodeError during response serialization and be locked out entirely. Sanitizing the value before setting the header avoids this.
🟡 Tests: it would be good to add a small regression test asserting the header is present for an authenticated request and absent for an anonymous one — ideally exercising a non-ASCII username to guard the case above.
ℹ️ Notes (non-blocking):
- The
access_log_formatchange only lands for Docker deployments; standard package/pip server installs running their own gunicorn won't pick up the username field without adding the directive themselves. Consistent with the PR's stated Docker/k8s focus — just flagging the scope. - With
JSON_LOGGERenabled the username ends up embedded inside the access-log message string rather than as a discrete JSON field, since the JSON formatter wraps gunicorn's rendered access line. Works, but operators expecting a structured field may be surprised.
| @app.after_request | ||
| def after_request(response): | ||
| if current_user.is_authenticated: | ||
| response.headers['X-Remote-User'] = current_user.username |
There was a problem hiding this comment.
HTTP header values must encode to Latin-1, but current_user.username isn't guaranteed to be ASCII/Latin-1. OAuth2 (_resolve_username can return preferred_username, sub, or a configured OAUTH2_USERNAME_CLAIM), Kerberos principals, and LDAP-mapped usernames can all contain Cyrillic/CJK/accented characters.
Verified against the Werkzeug currently pinned in the project:
Gorkov (Cyrillic) -> UnicodeEncodeError: 'latin-1' codec can't encode...
alice\r\nX-Injected: 1 -> ValueError: Header values must not contain newline characters.
Because the header is set unconditionally for every authenticated user, a user with a non-Latin-1 username will raise during response serialization and get a 500 on every request — effectively locked out of pgAdmin. (The CR/LF case is already blocked by Werkzeug, so there's no header-injection vuln, but it would also 500.)
Suggest sanitizing before setting:
if current_user.is_authenticated and current_user.username:
# HTTP headers are latin-1 only; avoid 500s for unicode usernames
safe = current_user.username.encode('latin-1', 'replace').decode('latin-1')
response.headers['X-Remote-User'] = safeThere was a problem hiding this comment.
I did a little bit of poking around but not seeing an easy way to add tests for this (or for the change as a whole) since they depend on having a server like gunicorn in the loop :-/
I'm a little curious how you reproduced the error and what your setup is... pgadmin wouldnt let me create accounts with special chars in email, and i didn't have OAuth2 setup for testing
I repro'd the error by directly calling gunicorn's to_bytestring() function, just to verify it
dpage
left a comment
There was a problem hiding this comment.
Thanks for this — it's a clean, well-scoped change and a genuinely useful feature. One blocking concern and a couple of smaller points.
🔴 Must fix: non-Latin-1 usernames will 500 every authenticated request. See the inline comment — HTTP header values must encode to Latin-1, but current_user.username can legitimately contain characters outside that range (OAuth2 preferred_username/sub/OAUTH2_USERNAME_CLAIM, Kerberos principals, LDAP attributes). Since the header is set unconditionally for every authenticated user, such a user would hit UnicodeEncodeError during response serialization and be locked out entirely. Sanitizing the value before setting the header avoids this.
🟡 Tests: it would be good to add a small regression test asserting the header is present for an authenticated request and absent for an anonymous one — ideally exercising a non-ASCII username to guard the case above.
ℹ️ Notes (non-blocking):
- The
access_log_formatchange only lands for Docker deployments; standard package/pip server installs running their own gunicorn won't pick up the username field without adding the directive themselves. Consistent with the PR's stated Docker/k8s focus — just flagging the scope. - With
JSON_LOGGERenabled the username ends up embedded inside the access-log message string rather than as a discrete JSON field, since the JSON formatter wraps gunicorn's rendered access line. Works, but operators expecting a structured field may be surprised.
| @app.after_request | ||
| def after_request(response): | ||
| if current_user.is_authenticated: | ||
| response.headers['X-Remote-User'] = current_user.username |
There was a problem hiding this comment.
HTTP header values must encode to Latin-1, but current_user.username isn't guaranteed to be ASCII/Latin-1. OAuth2 (_resolve_username can return preferred_username, sub, or a configured OAUTH2_USERNAME_CLAIM), Kerberos principals, and LDAP-mapped usernames can all contain Cyrillic/CJK/accented characters.
Verified against the Werkzeug currently pinned in the project:
Gorkov (Cyrillic) -> UnicodeEncodeError: 'latin-1' codec can't encode...
alice\r\nX-Injected: 1 -> ValueError: Header values must not contain newline characters.
Because the header is set unconditionally for every authenticated user, a user with a non-Latin-1 username will raise during response serialization and get a 500 on every request — effectively locked out of pgAdmin. (The CR/LF case is already blocked by Werkzeug, so there's no header-injection vuln, but it would also 500.)
Suggest sanitizing before setting:
if current_user.is_authenticated and current_user.username:
# HTTP headers are latin-1 only; avoid 500s for unicode usernames
safe = current_user.username.encode('latin-1', 'replace').decode('latin-1')
response.headers['X-Remote-User'] = safe|
Please also address the python style issue causing CI to fail. Thanks! |
|
|
||
| @app.after_request | ||
| def after_request(response): | ||
| if current_user.is_authenticated: |
There was a problem hiding this comment.
Since not everyone wants to send user information in plain text in every response, we must make it configurable.
LOG_AUTHENTICATED_USER=True/False
|
|
||
| @app.after_request | ||
| def after_request(response): | ||
| if current_user.is_authenticated: |
There was a problem hiding this comment.
| if current_user.is_authenticated: | |
| if current_user.is_authenticated: | |
| response.headers['X-Remote-User'] = current_user.username | |
| else: | |
| # prevents any accidental reuse if middleware or future code sets the header earlier | |
| response.headers.pop('X-Remote-User', None) |
There was a problem hiding this comment.
Making sure I understand this. I guess the main concern here is inconsistency - it would be a bit weird to overwrite the header sometimes and not others - and users wouldn't know who set the value they see in the field. If we're going to set a value in this field sometimes, then it's best to always control the contents of the field. That makes sense.
There was a problem hiding this comment.
My understanding is that the concern is less about the current implementation and more about ensuring the header is always under pgAdmin's control. If we only overwrite it for authenticated users, then in other cases it's unclear whether the value came from pgAdmin itself or was set earlier by middleware or some future code path. Explicitly removing it when there is no authenticated user keeps the behavior deterministic and avoids any ambiguity about the source of the header value.
|
Apologies, this past week has been a bit crazy - haven’t forgotten and should get to it this week 🤞 |
|
A few additional notes to fold into your next pass, on top of the inline comments above: 1. The pep8/CI failure is just the new access_log_format = (
'%(h)s %(l)s %({x-remote-user}o)s %(t)s '
'"%(r)s" %(s)s %(b)s "%(f)s" "%(a)s"'
)2. Default the config flag to off. Building on @mzabuawala's 3. Optional design alternative. If cross-deployment portability (Apache Thanks again for the contribution — no rush, just capturing these so they're in one place for your update. |
The main use case I'm interested in right now is containerized deployments on kubernetes, so gunicorn covers the immediate case. But personally I think there's value in the overall idea to many pgAdmin users. My vote would be for the opt-in apache-compatible approach, which makes it accessible others who are also interested in using this outside kubernetes for a more complete audit picture (eg. seeing who has downloaded data from which database and how many bytes were in the download request). I also don't think there should be any concerns of sensitivity (even in regulated environments) around simply knowing which authenticated user made which requests. It's fairly standard for an access log. This being said, I'm open to either option if someone felt there was a strong argument for keeping everything server-side. |
…r X-Remote-User header Addresses reviewer feedback on PR pgadmin-org#9991: - Gate the X-Remote-User header behind LOG_AUTHENTICATED_USER (default False) - Encode username as latin-1 with replacement to prevent gunicorn 500s for non-ASCII usernames - Clear the header on unauthenticated requests when the feature is enabled
…mote-User header Addresses reviewer feedback on PR pgadmin-org#9991: - Gate the X-Remote-User header behind LOG_AUTHENTICATED_USER (default False) - Encode username as latin-1 with replacement to prevent gunicorn 500s for non-ASCII usernames - Clear the header on unauthenticated requests when the feature is enabled Signed-off-by: Jeremy Schneider <schneider@ardentperf.com>
…mote-User header Addresses reviewer feedback on PR pgadmin-org#9991: - Gate the X-Remote-User header behind LOG_AUTHENTICATED_USER (default False) - Encode username as latin-1 with replacement to prevent gunicorn 500s for non-ASCII usernames - Clear the header on unauthenticated requests when the feature is enabled Signed-off-by: Jeremy Schneider <schneider@ardentperf.com> style: fix E501 line too long in after_request style: fix E501 line too long in gunicorn_config.py
|
pushed updates to the PR addressing comments. CI failures on python style should be addressed. Default pgAdmin behavior is unchanged; I tested with the helm chart on k8s and confirmed that there's no logging by default, and logging is enabled when I add this to my Is this a good name for the config setting? Technically it controls the header, and I've updated the default gunicorn logging to automatically pick it up. For users who consume this through docker and helm, the practical effect is that the config does enable logging. |
|
after a bit more thought, i'm also testing the WSGI environ approach to see if this works. |
|
i just tried testing |
|
I think the culprit is here: https://github.com/miguelgrinberg/flask-socketio/blob/v5.6.1/src/flask_socketio/__init__.py#L40
|
Set an X-Remote-User response header containing the authenticated username on every request. This allows the access log to be configured to include user identity via standard log format directives (%({x-remote-user}o)s in gunicorn, %{X-Remote-User}o in Apache) without requiring any changes to pgAdmin's session or auth behaviour.
Closes #9990
Default gunicorn access log format is at https://gunicorn.org/reference/settings/?h=access_log#access_log_format and https://github.com/benoitc/gunicorn/blob/9bc5891b4b06f25a8ce0e707053dcb2fb9bf638c/gunicorn/config.py#L1413 ; I confirmed that all other fields are default; this PR only changes the user name field.
Performed an end-to-end test on kubernetes with KIND
With this PR:
Master Branch:
Summary by CodeRabbit