
fix(cli): handle sync/async serve functions in m serve #784

Merged
markstur merged 10 commits into generative-computing:main from markstur:fix_async
Apr 9, 2026

Conversation

@markstur
Contributor

@markstur markstur commented Apr 2, 2026

Misc PR

Type of PR

  • Bug Fix
  • New Feature
  • Documentation
  • Other

Description

fix(cli): handle sync/async serve functions in m serve

Fixes sync/async mismatch in `m serve` by detecting function type and
handling appropriately:
- Async serve functions are awaited directly
- Sync serve functions are wrapped in asyncio.to_thread() to prevent
  blocking FastAPI's event loop

This ensures the server can handle concurrent requests efficiently
regardless of whether user-defined serve functions are sync or async.

Changes:
- cli/serve/app.py: Add asyncio/inspect imports, update make_chat_endpoint()
  to detect coroutine functions and wrap sync functions in to_thread()
- test/cli/test_serve_sync_async.py: Add comprehensive test suite (9 tests)
  including empirical timing test that proves non-blocking behavior
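
The dispatch described above can be sketched roughly as follows. This is an illustrative standalone version, not the actual code in cli/serve/app.py; the function name `call_serve` is hypothetical:

```python
import asyncio
import inspect


async def call_serve(serve_fn, request):
    """Invoke a user-supplied serve function without blocking the event loop.

    Async serve functions are awaited directly; sync serve functions are
    pushed to a worker thread via asyncio.to_thread() so concurrent
    requests can still make progress.
    """
    if inspect.iscoroutinefunction(serve_fn):
        return await serve_fn(request)
    return await asyncio.to_thread(serve_fn, request)
```

Either flavor of serve function can then be called uniformly from the async FastAPI endpoint.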

feat: Map OpenAI parameters to ModelOption sentinels:

   - temperature → ModelOption.TEMPERATURE
   - max_tokens → ModelOption.MAX_NEW_TOKENS
   - seed → ModelOption.SEED
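
A minimal sketch of that parameter mapping, assuming sentinel-string constants in the style seen later in this thread (the `@@@...@@@` strings and the helper name `map_options` are stand-ins, not Mellea's actual definitions):

```python
# Illustrative stand-ins for the real ModelOption sentinel constants.
TEMPERATURE = "@@@temperature@@@"
MAX_NEW_TOKENS = "@@@max_new_tokens@@@"
SEED = "@@@seed@@@"

# OpenAI request field -> ModelOption sentinel key.
OPENAI_TO_MODEL_OPTION = {
    "temperature": TEMPERATURE,
    "max_tokens": MAX_NEW_TOKENS,
    "seed": SEED,
}


def map_options(request_fields: dict) -> dict:
    """Rename known OpenAI fields to sentinel keys; pass others through."""
    return {
        OPENAI_TO_MODEL_OPTION.get(key, key): value
        for key, value in request_fields.items()
    }
```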


Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>

Testing

  • Tests added to the respective file if code was changed
  • New code has 100% coverage if code was added
  • Ensure existing tests and github automation passes (a maintainer will kick off the github automation when the rest of the PR is populated)

@markstur markstur requested a review from a team as a code owner April 2, 2026 23:18
@github-actions
Contributor

github-actions bot commented Apr 2, 2026

The PR description has been updated. Please fill out the template for your PR to be reviewed.

@markstur markstur changed the title Fix async fix(cli): handle sync/async serve functions in m serve Apr 2, 2026
@github-actions github-actions bot added the bug Something isn't working label Apr 2, 2026
@markstur markstur marked this pull request as draft April 2, 2026 23:41
@markstur markstur marked this pull request as ready for review April 3, 2026 15:32
markstur added 4 commits April 3, 2026 10:29
Fixes sync/async mismatch in `m serve` by detecting function type and
handling appropriately:
- Async serve functions are awaited directly
- Sync serve functions are wrapped in asyncio.to_thread() to prevent
  blocking FastAPI's event loop

This ensures the server can handle concurrent requests efficiently
regardless of whether user-defined serve functions are sync or async.

Changes:
- cli/serve/app.py: Add asyncio/inspect imports, update make_chat_endpoint()
  to detect coroutine functions and wrap sync functions in to_thread()
- test/cli/test_serve_sync_async.py: Add comprehensive test suite (9 tests)
  including empirical timing test that proves non-blocking behavior

Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
   - temperature → ModelOption.TEMPERATURE
   - max_tokens → ModelOption.MAX_NEW_TOKENS
   - seed → ModelOption.SEED

Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
With async added, need to fix the catch

Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
Fix for -- Error: function raises but has no Raises section

Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
Contributor

@planetf1 planetf1 left a comment

see comments

* filter out model, n, user, and extra
* comment the filtering
* use a map and ModelOption.replace_keys() for mapping
* fix replace_keys to no-op with from == to
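
The `replace_keys` no-op fix from the last bullet can be illustrated like this (a hypothetical standalone version; the real helper belongs to ModelOption and its signature may differ):

```python
def replace_keys(options: dict, mapping: dict) -> dict:
    """Return options with keys renamed per mapping.

    Identity entries (from == to) are skipped so they don't
    delete-and-reinsert a key -- the no-op case called out above.
    """
    result = dict(options)
    for old_key, new_key in mapping.items():
        if old_key == new_key:
            continue  # no-op when from == to
        if old_key in result:
            result[new_key] = result.pop(old_key)
    return result
```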

Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
@markstur markstur requested a review from a team as a code owner April 8, 2026 00:06
* Use whole seconds for timing test so it should be less flaky
* Return a real ModelOutputThunk instead of a misleading Mock
* Use AsyncMock with side_effect for consistency

Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
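
A whole-seconds timing check of the kind described could look roughly like this (illustrative only; the real test lives in test/cli/test_serve_sync_async.py and the names here are invented):

```python
import asyncio
import time


def slow_sync_serve(request):
    """A sync serve function that blocks for one whole second."""
    time.sleep(1)
    return request


async def run_concurrently(n: int) -> float:
    """Run n sync serve calls via asyncio.to_thread; return elapsed seconds."""
    start = time.monotonic()
    await asyncio.gather(
        *(asyncio.to_thread(slow_sync_serve, i) for i in range(n))
    )
    return time.monotonic() - start
```

If the wrapping works, three concurrent one-second calls finish in about one second rather than three, which is the non-blocking behavior the test asserts.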
@markstur
Contributor Author

markstur commented Apr 8, 2026

Thanks @planetf1 for the review! I fixed each. Please see the comment/question about whether we really want an allowlist. I interpreted that as not the main objective there (but I have a feeling I read it with bias). Happy to address it if you think we should pursue that right now.

@ajbozarth
Contributor

On a first read the code looks good, but Claude did find a couple of items to look at:

Issue 1: None values are mapped to sentinel keys

_build_model_options uses model_dump() which includes all fields, including those set to None. Fields like seed: None and max_tokens: None (unset by the caller) become @@@seed@@@: None and @@@max_new_tokens@@@: None in the output.

Backends that do if ModelOption.SEED in model_options: will see them. Depending on backend implementation, passing None at a sentinel key could cause unexpected behavior or override a backend default.

Suggestion: Filter out None values:

filtered_options = {
    key: value
    for key, value in request.model_dump().items()
    if key not in excluded_fields and value is not None
}

Issue 2: functions/tools/tool_choice etc. pass through unfiltered

ChatCompletionRequest has several OpenAI-specific fields (functions, tools, function_call, tool_choice, response_format, logit_bias, etc.) that aren't in excluded_fields. They'll land in model_options as None for most requests (or with real values for tool-using callers).

Pre-existing behavior, yes — but _build_model_options is a good opportunity to clean this up. At minimum these should probably be added to excluded_fields, since Mellea backends don't consume raw OpenAI tool schemas via model_options.


Minor: nested helper could be module-level

_build_model_options has no closure over anything in make_chat_endpoint. Moving it to module level would make it easier to test directly (there are currently no direct unit tests for it).

markstur added 3 commits April 8, 2026 17:56
* For n > 1 use 400 (we have not implemented n > 1 yet)
* For pydantic validation errors, add a handler to convert to OpenAI API error format.

Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
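
The conversion to OpenAI API error format can be sketched without FastAPI (the payload field names follow the OpenAI error shape; the function name and exact fields here are assumptions, not the PR's actual handler):

```python
def to_openai_error(errors: list, status_code: int = 422) -> dict:
    """Convert pydantic-style error dicts into an OpenAI API error payload."""
    message = "; ".join(
        "{}: {}".format(
            ".".join(str(part) for part in err.get("loc", ())),
            err.get("msg", ""),
        )
        for err in errors
    )
    return {
        "error": {
            "message": message,
            "type": "invalid_request_error",
            "param": None,
            "code": status_code,
        }
    }
```

In the real handler this dict would be returned as the JSON body alongside the matching HTTP status code (e.g. 400 for the unimplemented n > 1 case).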
Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
* Add more fields to excluded_fields. Some of these should be implemented or handled better soon, but filter/ignore for now.
* Move _build_model_options from nested function inside make_chat_endpoint to module level.
  The function has no closure dependencies and can be tested directly.
* Add unit tests for _build_model_options
@markstur
Contributor Author

markstur commented Apr 9, 2026

Suggestion: Filter out None values:
Added exclude_none=True parameter to model_dump

Issue 2: functions/tools/tool_choice etc. pass through unfiltered

  • I added special handling for n > 1 because I noticed @planetf1 called that out in the issue.
  • I also added a handler to turn that and pydantic validation errors into OpenAI API errors.
  • But when I looked at doing similar handling for all the other params, it seemed like a lot, so I just added them to excluded_fields like you suggested.

It turns out, my adding this bit of model options mapping in the sync/async PR was scope creep. Who knew?
So excluded is probably a good start and I will start implementing and/or handling them better in next PR(s).

Minor: nested helper could be module-level

_build_model_options has no closure over anything in make_chat_endpoint. Moving it to module level would make it easier to test directly (there are currently no direct unit tests for it).
Done and unit tests added.

Thanks!

@markstur markstur requested a review from planetf1 April 9, 2026 01:56
@markstur markstur enabled auto-merge April 9, 2026 01:56
Contributor

@ajbozarth ajbozarth left a comment

One more nit from Claude otherwise LGTM

…p instance causing test pollution

Created a fresh FastAPI() instance for the test
Added RequestValidationError import from fastapi.exceptions
Manually registered the validation_exception_handler to maintain the same behavior

Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
@markstur markstur requested a review from ajbozarth April 9, 2026 19:12
@markstur markstur added this pull request to the merge queue Apr 9, 2026
Merged via the queue into generative-computing:main with commit 55b6b73 Apr 9, 2026
6 checks passed
@markstur markstur deleted the fix_async branch April 9, 2026 21:02

Labels

bug Something isn't working
