Skip to content

[FC-0118] docs: add ADR-0029 standardized error responses decision#38246

Open
taimoor-ahmed-1 wants to merge 2 commits intoopenedx:docs/ADRs-axim_api_improvementsfrom
edly-io:docs/ADR-standardize_error_responses
Open

[FC-0118] docs: add ADR-0029 standardized error responses decision#38246
taimoor-ahmed-1 wants to merge 2 commits intoopenedx:docs/ADRs-axim_api_improvementsfrom
edly-io:docs/ADR-standardize_error_responses

Conversation

@taimoor-ahmed-1
Copy link
Copy Markdown
Contributor

@taimoor-ahmed-1 taimoor-ahmed-1 commented Mar 30, 2026

Description

This PR adds an ADR decision to the edx-platform documentation to standardize how Open edX REST APIs return error payloads to clients. The ADR defines a single, predictable structured JSON error object contract (with consistent status codes and validation error formatting) and provides a reference DRF-style exception handler example.
related issue: #38167

Add edx-platform/docs/decisions/0029-standardize-error-responses.rst describing objectives and the target contract for REST API error responses.
Include implementation requirements, a standardized error payload example, and a DRF exception-handler snippet.
@openedx-webhooks openedx-webhooks added the open-source-contribution PR author is not from Axim or 2U label Mar 30, 2026
@openedx-webhooks
Copy link
Copy Markdown

Thanks for the pull request, @taimoor-ahmed-1!

This repository is currently maintained by @openedx/wg-maintenance-openedx-platform.

Once you've gone through the following steps feel free to tag them in a comment and let them know that your changes are ready for engineering review.

🔘 Get product approval

If you haven't already, check this list to see if your contribution needs to go through the product review process.

  • If it does, you'll need to submit a product proposal for your contribution, and have it reviewed by the Product Working Group.
    • This process (including the steps you'll need to take) is documented here.
  • If it doesn't, simply proceed with the next step.
🔘 Provide context

To help your reviewers and other members of the community understand the purpose and larger context of your changes, feel free to add as much of the following information to the PR description as you can:

  • Dependencies

    This PR must be merged before / after / at the same time as ...

  • Blockers

    This PR is waiting for OEP-1234 to be accepted.

  • Timeline information

    This PR must be merged by XX date because ...

  • Partner information

    This is for a course on edx.org.

  • Supporting documentation
  • Relevant Open edX discussion forum threads
🔘 Get a green build

If one or more checks are failing, continue working on your changes until this is no longer the case and your build turns green.

Details
Where can I find more information?

If you'd like to get more details on all aspects of the review process for open source pull requests (OSPRs), check out the following resources:

When can I expect my changes to be merged?

Our goal is to get community contributions seen and reviewed as efficiently as possible.

However, the amount of time that it takes to review and merge a PR can vary significantly based on factors such as:

  • The size and impact of the changes that it introduces
  • The need for product review
  • Maintenance status of the parent repository

💡 As a result it may take up to several weeks or months to complete a review and merge your PR.

@github-project-automation github-project-automation bot moved this to Needs Triage in Contributions Mar 30, 2026
@taimoor-ahmed-1 taimoor-ahmed-1 changed the title docs: add ADR-0029 standardized error responses decision [FC-0118] docs: add ADR-0029 standardized error responses decision Mar 31, 2026
@taimoor-ahmed-1 taimoor-ahmed-1 self-assigned this Mar 31, 2026
@github-project-automation github-project-automation bot moved this to Needs Triage in Contributions Apr 1, 2026
Comment on lines +37 to +46
Implementation requirements:

* Use appropriate HTTP status codes (4xx/5xx). Avoid returning HTTP 200 for error conditions.
* Return a consistent payload with these core fields:

* ``type`` (URI identifying the problem type)
* ``title`` (short, human-readable summary)
* ``status`` (HTTP status code)
* ``detail`` (human-readable explanation specific to this occurrence)
* ``instance`` (URI identifying this specific occurrence, when available)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently there are errors in the platform where some part of the error document is taken and shown to the end user. In this case, the error message is translated, in other cases the message is not displayed to the end user. It seems like maybe it would be useful to have a field for a message that's meant to be shown to end users in MFEs and a second message that is the same consistent error text that can be used by APMs and other monitoring and logging systems.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added the additional field, can you please review again

@feanil feanil requested a review from bradenmacdonald April 6, 2026 15:55
Copy link
Copy Markdown
Contributor

@bradenmacdonald bradenmacdonald left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this is great. However, the ADR doesn't mention two of the biggest problems I see with errors in the platform today:

  1. Many (most?) endpoints that are used as APIs return a standard Django HTML error response when there is a 500 error, instead of a JSON response. DRF doesn't seem to be properly catching errors. GET /api/courses/v1/migrate_legacy_content_blocks/course-v1:foo+bar+x/ is just one example but there are many. Any API endpoints that use our API classes should convert all errors to this new format.

  2. Sometimes the error responses have different CORS headers than the API itself, so when an error occurs, the browser rejects the response because it's missing CORS headers, and users see a misleading "CORS" error. If there are no security implications, error responses should be served with more open CORS policies.

@github-project-automation github-project-automation bot moved this to Needs Triage in Contributions Apr 6, 2026
@mphilbrick211 mphilbrick211 added the FC Relates to an Axim Funded Contribution project label Apr 6, 2026
@mphilbrick211 mphilbrick211 moved this from Needs Triage to In Eng Review in Contributions Apr 6, 2026
Comment on lines +83 to +87
App-specific types may extend this catalog; they must still be absolute URIs.
* Ensure **all** API endpoints return JSON error responses — including on unhandled 500 errors.
The platform-level DRF exception handler must catch exceptions that would otherwise produce
Django's default HTML error page and return a JSON body in the standardized format instead.
Endpoints not using DRF's ``APIView`` must be identified and wrapped accordingly.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding this!

.. code-block:: json

{
"type": "https://open-edx.org/errors/validation",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"type": "https://open-edx.org/errors/validation",
"type": "https://openedx.org/errors/validation",

Open-edx.org is not a real domain. (This is in many places in the ADR)

Also, should we make these real URLs that have some content there, explaining the error, or are these strictly IDs?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this should link to an OpenAPI spec?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm so RFC 9457 specifies this as:
"The "instance" member is a JSON string containing a URI reference that identifies the specific occurrence of the problem.
When the "instance" URI is dereferenceable, the problem details object can be fetched from it."

My observation here is that while this sounds like a great idea, it would mean we have to create a storage for information of specific errors and then a URL endpoint that returns this information.
Is this really practical or desirable? What concrete problem we have would this solve?

Copy link
Copy Markdown
Contributor

@jesperhodge jesperhodge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I love that we are doing this. Especially now I know that DRF does not catch some 5xx or unknown errors, this is really important.

I have a few general questions about the format and about ensuring that internal error information is not leaked outside: see my comments below.

And in general, I agree that we are best off with our own standardized format for errors that we define exactly the way we want. But could you explain a bit more why you see this as the preferable format?

In comparison for example with something like https://drf-standardized-errors.readthedocs.io/en/latest/quickstart.html

* ``type`` (URI identifying the problem type)
* ``title`` (short, human-readable summary)
* ``status`` (HTTP status code)
* ``detail`` (developer-facing explanation specific to this occurrence; stable English text, safe
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the security front, let's be aware that DRF does not catch all exceptions, for example, it does not catch 5xx exceptions or famously IntegrityErrors. That level of detail - stack traces, error message - should also not be included in this detail, unless you're in mode DEBUG=True, in my opinion. Because that would expose too sensitive data to the user.

"type": f"https://open-edx.org/errors/{_error_type(exc)}",
"title": _error_title(exc),
"status": response.status_code,
"detail": _flatten_detail(response.data),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a problem here because the DRF exception_handler does not catch all 5xx errors. Sending all the response data here would include error messages and stack traces.
A really important job of this function would be to catch any uncaught or unknown exceptions that are not handled by DRF, and send a standard "Internal Server Error" message.

* ``type`` (URI identifying the problem type)
* ``title`` (short, human-readable summary)
* ``status`` (HTTP status code)
* ``detail`` (developer-facing explanation specific to this occurrence; stable English text, safe
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does this handle validation errors and such? Generally there's something like

"field1": {
  "errors": [...]
},
"field2": {
  "errors": [...]
}

* ``status`` (HTTP status code)
* ``detail`` (developer-facing explanation specific to this occurrence; stable English text, safe
for log aggregators and APM tools)
* ``instance`` (URI identifying this specific occurrence, when available)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this data okay to be exposed to the user?

* ``title`` (short, human-readable summary)
* ``status`` (HTTP status code)
* ``detail`` (developer-facing explanation specific to this occurrence; stable English text, safe
for log aggregators and APM tools)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why log aggregators?
Generally we should make sure to log errors and include error information, while this response is for REST APIs. So generally there is a central try/except that logs every error that makes it to this level; or if necessary, that would even be included in this API level error handling - not sure how it's currently done in edx-platform.
Either way I'm a bit confused by the mixing of both here?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also I looked at https://www.rfc-editor.org/rfc/rfc9457.html#name-detail a bit - apparently this is something that inspired this?
In general practice as well as in this RFC, "detail" is considered roughly a human-facing client description that an MFE for example could print to the user as an error message.
It's not supposed to have debugging information.

* Carry a **short human-readable summary** plus a **specific explanation** for this request when helpful.
* Tie errors to the **request** when useful (e.g. request path or URL) for support and logging.
* Represent **validation failures** in a consistent way (e.g. field/path to messages) instead of ad-hoc nesting.
* Are **documented and enforced** in DRF (central exception handling + schema generation).
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about in OpenAPI specs?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

FC Relates to an Axim Funded Contribution project open-source-contribution PR author is not from Axim or 2U

Projects

Status: In Eng Review

Development

Successfully merging this pull request may close these issues.

7 participants