fix(cli): handle sync/async serve functions in `m serve` (#784)
markstur merged 10 commits into generative-computing:main
Conversation
The PR description has been updated. Please fill out the template for your PR to be reviewed.
Fixes the sync/async mismatch in `m serve` by detecting the function type and handling it appropriately:

- Async serve functions are awaited directly.
- Sync serve functions are wrapped in `asyncio.to_thread()` to prevent blocking FastAPI's event loop.

This ensures the server can handle concurrent requests efficiently regardless of whether user-defined serve functions are sync or async.

Changes:

- cli/serve/app.py: Add asyncio/inspect imports; update `make_chat_endpoint()` to detect coroutine functions and wrap sync functions in `to_thread()`.
- test/cli/test_serve_sync_async.py: Add a comprehensive test suite (9 tests), including an empirical timing test that proves non-blocking behavior.

Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
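The dispatch described above can be sketched roughly as follows. This is an illustrative sketch, not the project's actual code: the `serve_fn` parameter and the endpoint's request/return shapes are assumptions; only the `inspect.iscoroutinefunction()` / `asyncio.to_thread()` pattern is taken from the PR description.

```python
import asyncio
import inspect

def make_chat_endpoint(serve_fn):
    """Build an async endpoint around a user-provided serve function.

    Hypothetical shape: the real make_chat_endpoint() in cli/serve/app.py
    takes project-specific arguments.
    """
    async def chat_endpoint(request):
        if inspect.iscoroutinefunction(serve_fn):
            # Async serve functions are awaited directly.
            return await serve_fn(request)
        # Sync serve functions run in a worker thread so they
        # do not block the event loop.
        return await asyncio.to_thread(serve_fn, request)

    return chat_endpoint
```

Either kind of function then works transparently behind the same endpoint, and concurrent requests to a slow sync function overlap instead of serializing.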
- temperature → ModelOption.TEMPERATURE
- max_tokens → ModelOption.MAX_NEW_TOKENS
- seed → ModelOption.SEED

Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
With async added, need to fix the catch Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
Fix for -- Error: function raises but has no Raises section Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
* Filter out model, n, user, and extra
* Comment the filtering
* Use a map and ModelOption.replace_keys() for mapping
* Fix replace_keys to no-op when from == to

Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
* Use whole seconds for the timing test so it should be less flaky
* Return a real ModelOutputThunk instead of a misleading Mock
* Use AsyncMock with side_effect for consistency

Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
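An empirical timing test of the kind the PR describes can be sketched like this. The helper names (`slow_sync_serve`, `endpoint`) and the whole-second sleep are illustrative assumptions; the real test lives in test/cli/test_serve_sync_async.py.

```python
import asyncio
import time

def slow_sync_serve(_request):
    # Whole-second sleep, mirroring the commit's "whole seconds" choice
    # to keep the timing assertion less flaky.
    time.sleep(1)
    return "ok"

async def timing_check():
    # Wrap the sync function the same way the endpoint would.
    async def endpoint(request):
        return await asyncio.to_thread(slow_sync_serve, request)

    start = time.monotonic()
    # Three concurrent requests: if the sync call blocked the event
    # loop, this would take ~3 seconds; with to_thread() it takes ~1.
    results = await asyncio.gather(endpoint(1), endpoint(2), endpoint(3))
    elapsed = time.monotonic() - start
    return results, elapsed
```

The assertion `elapsed < 2.5` (well under the 3 seconds a serialized run would need) is what empirically proves the non-blocking behavior.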
Thanks @planetf1 for the review! I fixed each. Please see the comment/question about whether we really want an allowlist. I interpreted that as not the main objective here (but I have a feeling I read it with bias). Happy to address it if you think we should pursue that right now.
On a first read the code looks good, but Claude did find a couple of items to look at:

Issue 1:
* For n > 1, return 400 (we have not implemented n > 1 yet)
* For pydantic validation errors, add a handler to convert them to the OpenAI API error format.

Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
* Add more fields to excluded_fields. Some of these should be implemented or handled better soon, but filter/ignore them for now.
* Move _build_model_options from a nested function inside make_chat_endpoint to module level. The function has no closure dependencies and can be tested directly.
* Add unit tests for _build_model_options
It turns out that adding this bit of model options mapping in the sync/async PR was scope creep. Who knew?
Thanks!
ajbozarth left a comment
One more nit from Claude, otherwise LGTM
…p instance causing test pollution

- Created a fresh FastAPI() instance for the test
- Added RequestValidationError import from fastapi.exceptions
- Manually registered the validation_exception_handler to maintain the same behavior

Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>