Skip to content

Execution API: Normalize error response formats#63801

Closed
henry3260 wants to merge 5 commits intoapache:mainfrom
henry3260:fix-api-dict-consistency
Closed

Execution API: Normalize error response formats#63801
henry3260 wants to merge 5 commits intoapache:mainfrom
henry3260:fix-api-dict-consistency

Conversation

@henry3260
Copy link
Copy Markdown
Contributor

@henry3260 henry3260 commented Mar 17, 2026

What

Standardized Execution API error responses to use structured detail payloads (with reason and message) instead of mixed string/dict formats.
Updated affected routes in variables, xcoms, and task_instances to return consistent error shapes.

Was generative AI tooling used to co-author this PR?
  • Yes (please specify the tool below)

  • Read the Pull Request Guidelines for more information. Note: commit author/co-author name and email in commits become permanently public when merged.
  • For fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
  • When adding dependency, check compliance with the ASF 3rd Party License Policy.
  • For significant user-facing changes create newsfragment: {pr_number}.significant.rst, in airflow-core/newsfragments. You can add this file in a follow-up commit after the PR is created so you know the PR number.

@henry3260 henry3260 force-pushed the fix-api-dict-consistency branch from e079ab4 to 2ae9a0d Compare March 17, 2026 10:12
@henry3260 henry3260 marked this pull request as draft March 17, 2026 12:08
@henry3260 henry3260 force-pushed the fix-api-dict-consistency branch from 2ae9a0d to 7441553 Compare March 17, 2026 13:13
@henry3260 henry3260 marked this pull request as ready for review March 17, 2026 13:13
@henry3260 henry3260 marked this pull request as draft March 27, 2026 14:02
@henry3260 henry3260 marked this pull request as ready for review March 27, 2026 14:02
Copy link
Copy Markdown
Member

@jason810496 jason810496 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be better to add a subclass like ExecutionAPIHTTPException (or ExecutionHTTPException for the shorthand) to enforce the error response format in the future?

@henry3260
Copy link
Copy Markdown
Contributor Author

Would it be better to add a subclass like ExecutionAPIHTTPException (or ExecutionHTTPException for the shorthand) to enforce the error response format in the future?

That's a good idea!

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Standardizes Execution API error responses to a consistent structured detail payload (e.g., {reason, message}) by introducing an ExecutionHTTPException wrapper and updating multiple routes/tests to use it.

Changes:

  • Added ExecutionHTTPException to normalize detail into a consistent {reason, message} shape.
  • Replaced multiple fastapi.HTTPException usages across Execution API routes with ExecutionHTTPException.
  • Updated task instance unit tests to assert the new structured error detail format.

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
airflow-core/tests/unit/api_fastapi/execution_api/versions/head/test_task_instances.py Updates assertions to match new structured error response shapes.
airflow-core/src/airflow/api_fastapi/execution_api/routes/xcoms.py Normalizes error responses by switching to ExecutionHTTPException.
airflow-core/src/airflow/api_fastapi/execution_api/routes/variables.py Switches to ExecutionHTTPException and adds additional validation branches.
airflow-core/src/airflow/api_fastapi/execution_api/routes/task_instances.py Replaces mixed string/dict error details with consistent structured payloads.
airflow-core/src/airflow/api_fastapi/execution_api/routes/hitl.py Switches to ExecutionHTTPException for not-found/conflict cases.
airflow-core/src/airflow/api_fastapi/execution_api/routes/dags.py Converts DAG not-found response to ExecutionHTTPException structured detail.
airflow-core/src/airflow/api_fastapi/execution_api/routes/dag_runs.py Converts DAG/DAG run errors to ExecutionHTTPException structured detail.
airflow-core/src/airflow/api_fastapi/execution_api/routes/connections.py Converts connection not-found response to ExecutionHTTPException.
airflow-core/src/airflow/api_fastapi/execution_api/routes/assets.py Converts asset not-found response to ExecutionHTTPException.
airflow-core/src/airflow/api_fastapi/execution_api/routes/asset_events.py Converts parameter-validation error to ExecutionHTTPException.
airflow-core/src/airflow/api_fastapi/common/exceptions.py Introduces ExecutionHTTPException to enforce consistent {reason, message} error detail format.
Comments suppressed due to low confidence (1)

airflow-core/src/airflow/api_fastapi/execution_api/routes/asset_events.py:1

  • The reason value ("Missing parameter") is not normalized (it contains spaces and capitalization) and is inconsistent with the other routes’ machine-readable tokens (e.g., not_found, invalid_request). Use a stable token (e.g., missing_parameter) and keep the human-readable text in message.
# Licensed to the Apache Software Foundation (ASF) under one

Comment on lines +99 to +106
if not variable_key:
raise ExecutionHTTPException(
status.HTTP_404_NOT_FOUND,
detail={
"reason": "not_found",
"message": "Not Found",
},
)
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

variable_key is a path parameter, so an “empty string” value won’t match the route (the segment can’t be empty). These branches are effectively unreachable and also return 404 Not Found, which is not an appropriate status for an invalid/missing path parameter. Consider removing these checks; if you want explicit validation, enforce it at the parameter level (e.g., Path(min_length=1)) so FastAPI returns a 422 validation error with the normalized detail shape.

Copilot uses AI. Check for mistakes.
Comment on lines +125 to +132
if not variable_key:
raise ExecutionHTTPException(
status.HTTP_404_NOT_FOUND,
detail={
"reason": "not_found",
"message": "Not Found",
},
)
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

variable_key is a path parameter, so an “empty string” value won’t match the route (the segment can’t be empty). These branches are effectively unreachable and also return 404 Not Found, which is not an appropriate status for an invalid/missing path parameter. Consider removing these checks; if you want explicit validation, enforce it at the parameter level (e.g., Path(min_length=1)) so FastAPI returns a 422 validation error with the normalized detail shape.

Copilot uses AI. Check for mistakes.
raise HTTPException(
raise ExecutionHTTPException(
status.HTTP_409_CONFLICT,
f"Human-in-the-loop detail for Task Instance with id {task_instance_id} already exists.",
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Passing a string detail here will be normalized by ExecutionHTTPException, but it will default reason to "error". For consistency with the PR goal (“structured detail payloads with reason and message”), provide an explicit dict with a specific reason (e.g., already_exists) and the string as message.

Suggested change
f"Human-in-the-loop detail for Task Instance with id {task_instance_id} already exists.",
detail={
"reason": "already_exists",
"message": (
f"Human-in-the-loop detail for Task Instance with id {task_instance_id} already exists."
),
},

Copilot uses AI. Check for mistakes.
Comment on lines +125 to +153
class ExecutionHTTPException(HTTPException):
"""
HTTPException subclass used by Execution API.

Enforces consistent error response format containing `reason` and `message` keys.
"""

def __init__(
self,
status_code: int,
detail: dict[str, Any] | str | None = None,
headers: dict[str, str] | None = None,
) -> None:
"""Initialize with enforced dict format."""
if isinstance(detail, str) or detail is None:
detail = {
"reason": "error",
"message": detail or "An error occurred",
}
elif isinstance(detail, dict):
detail = dict(detail) # copy
if "reason" not in detail:
detail["reason"] = "error"
if "message" not in detail:
detail["message"] = "An error occurred"
else:
detail = {"reason": "error", "message": str(detail)}

super().__init__(status_code=status_code, detail=detail, headers=headers)
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Introducing ExecutionHTTPException only normalizes errors for call sites that explicitly use it. In this same module, some existing handlers (e.g., DagErrorHandler) still raise HTTPException with string detail, which will continue producing mixed formats. To fully enforce consistency, either (1) update Execution API error handlers to raise ExecutionHTTPException, or (2) register a FastAPI exception handler for HTTPException under the Execution API app/router that wraps/normalizes exc.detail into the {reason, message} shape.

Copilot uses AI. Check for mistakes.
@potiuk potiuk added the ready for maintainer review Set after triaging when all criteria pass. label Apr 2, 2026
Copy link
Copy Markdown
Member

@pierrejeambrun pierrejeambrun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned by copilot, some other utility function can still raise mixed content exception.

If we want to catch everything a global exception handler is probably better. And we could have this registered at the top app level, because we have the same issue with the public API.

Always returning a 'dict' for 'details' with at least 'reason/message' seems cool.

Copy link
Copy Markdown
Member

@jason810496 jason810496 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank for the update.

Comment thread airflow-core/src/airflow/api_fastapi/common/exceptions.py
@henry3260 henry3260 force-pushed the fix-api-dict-consistency branch 2 times, most recently from f0bcc06 to 6c8d045 Compare April 5, 2026 10:02
@henry3260 henry3260 force-pushed the fix-api-dict-consistency branch from 6c8d045 to d3f8967 Compare April 5, 2026 16:51
Copy link
Copy Markdown
Member

@jason810496 jason810496 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update, I will defer to @pierrejeambrun regarding the exception response breaking change.

Comment on lines +197 to +202
# Normalize any HTTPException to have dict detail with reason/message
@app.exception_handler(HTTPException)
async def _http_exception_handler(request: Request, exc: HTTPException):
detail = _ensure_detail_dict(getattr(exc, "detail", None))
headers = getattr(exc, "headers", None)
return JSONResponse(status_code=exc.status_code, content={"detail": detail}, headers=headers)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it acceptable to introduce RestAPI breaking change for 3.3.0?
cc @pierrejeambrun

Since the normalization for the HTTPException is breaking change for users who rely on the response body.

Comment thread airflow-core/src/airflow/api_fastapi/execution_api/routes/variables.py Outdated
Comment thread airflow-core/src/airflow/api_fastapi/execution_api/routes/variables.py Outdated
@henry3260 henry3260 force-pushed the fix-api-dict-consistency branch from c035392 to d1fa76d Compare April 6, 2026 14:38
@henry3260 henry3260 force-pushed the fix-api-dict-consistency branch from d1fa76d to 5efc0b1 Compare April 6, 2026 16:16
@henry3260 henry3260 requested a review from pierrejeambrun April 7, 2026 19:12
Copy link
Copy Markdown
Member

@pierrejeambrun pierrejeambrun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm having second thought on this.

First the global exception handler is obfuscating the code. Someone reading the code won't understand at first why the response he's seeing is different from what the code raises. (a global exception handler is re modeling the response shape afterwards) So actually fixing the response at all different places is probably a more maintainable 'long term goal'.

Also as mentioned this is a breaking change (might be small but still breaking). I would say it's worth considering for a critical reason (security fix, global design shift, or huge maintainability gain), but just for cleaning the code / changing response format for consistency, i don't think it's worth the trouble. I would just accept that error response format isn't normalized...

And I would advocate for closing this.

Curious to see what other think.

edit: After double checking it appears that public/ui/auth managers apis are all consistent returning a 'string'. The problem is only located in the execution_api which uses both a mixture of 'str' and struct 'dict' for the details.

So the problem is only for that execution_api, and I'd say the repartition is half half, so it's not clear what was the original intent. But moving to fully struct seems good. I'd say we need to:

  • Check if the client sdk (consuming that execution API) is using those error messages.
  • If not then it means that we won't break generated client sdk by updating the error messages. (people most likely do not use the execution api directly without using the sdk client) And we can probably go ahead and do the change directly at the different places not returning the structure dict. (If we accept the risk)
  • If the client sdk do use some of the error message, I'd say we close this PR, we can't break it

Copy link
Copy Markdown
Member

@ashb ashb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The exception handling and munging code is too complex given we control all the raise sites.

If we are unifying this, maybe we should do what the TODO in routes/t_i.py says:

        # TODO: Pass a RFC 9457 compliant error message in "detail" field
        # https://datatracker.ietf.org/doc/html/rfc9457
        # to provide more information about the error

(Note: that is not a "we must do this", but we should either do it now, or remove that comment now)

)


class ExecutionHTTPException(HTTPException):
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this in common if it's specific to the Execution API?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this in common if it's specific to the Execution API?

yes, if now we focus on execution API we can move this to others.

Comment thread airflow-core/src/airflow/api_fastapi/common/exceptions.py
# Catch-all for unhandled Exceptions
@app.exception_handler(Exception)
async def _unhandled_exception_handler(request: Request, exc: Exception):
exception_id = get_random_string()
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the point of this? Is it logged anywhere to be able to correlate it against the exception when it's not sent to the client.

log.exception("Unhandled error id %s while handling request %s", exception_id, request.url.path)

if conf.get("api", "expose_stacktrace") == "True":
message = f"Unhandled error id {exception_id}: {exc}"
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Give this is JSON, we shouldn't put multiple things into a single string should we -- we've got structure, lets use it?

Comment on lines +197 to +199
# Normalize any HTTPException to have dict detail with reason/message
@app.exception_handler(HTTPException)
async def _http_exception_handler(request: Request, exc: HTTPException):
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This probably shouldn't be done in line here, but put in ERROR_HANDLERS instead -- makes register fn single purpose and much smaller.

return {"reason": "error", "message": str(nested_detail)}

# Idempotent path: keep already structured details intact.
normalized = dict(detail)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need a new dict?

normalized.setdefault("reason", "error")
normalized.setdefault("message", "An error occurred")
return normalized
return {"reason": "error", "message": str(nested_detail)}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why force it to a string.

This entire function feels like backcompat that we shouldn't need given we are in control of all of the callers.

Comment on lines +148 to +152
if extra:
# Do not allow overriding reason/message through extra accidentally.
for k, v in extra.items():
if k not in ("reason", "message"):
detail[k] = v
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could do this by having extra be **extra, then this block can go.

@ashb
Copy link
Copy Markdown
Member

ashb commented Apr 10, 2026

people most likely do not use the execution api directly without using the sdk client

It's a private API, so the only thing we have to about is compat with python or beta Go Task SDKs.

@ashb
Copy link
Copy Markdown
Member

ashb commented Apr 10, 2026

If the client sdk do use some of the error message, I'd say we close this PR, we can't break it

We can likely handle this via a Cadwyn migration. It can mutate payloads ig needed.

@henry3260
Copy link
Copy Markdown
Contributor Author

I'd prefer to close this PR since it contains too many unrelated changes. I’ll open a new, cleaner PR to address the specific comments we discussed. Thanks for the guidance!

@henry3260 henry3260 closed this Apr 10, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:task-sdk ready for maintainer review Set after triaging when all criteria pass.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants