fix(cli): handle sync/async serve functions in `m serve` (#784)
markstur merged 10 commits into generative-computing:main
Conversation
The PR description has been updated. Please fill out the template for your PR to be reviewed.
Fixes the sync/async mismatch in `m serve` by detecting the function type and handling it appropriately:

- Async serve functions are awaited directly.
- Sync serve functions are wrapped in `asyncio.to_thread()` to prevent blocking FastAPI's event loop.

This ensures the server can handle concurrent requests efficiently regardless of whether user-defined serve functions are sync or async.

Changes:

- cli/serve/app.py: Add asyncio/inspect imports; update `make_chat_endpoint()` to detect coroutine functions and wrap sync functions in `to_thread()`.
- test/cli/test_serve_sync_async.py: Add a comprehensive test suite (9 tests), including an empirical timing test that proves non-blocking behavior.

Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
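The dispatch described above can be sketched roughly as follows. This is an illustrative sketch, not the project's actual code: the `serve_fn` parameter and the endpoint's request/return shapes are assumptions; only the `inspect.iscoroutinefunction()` / `asyncio.to_thread()` pattern is taken from the PR description.

```python
import asyncio
import inspect

def make_chat_endpoint(serve_fn):
    """Build an async endpoint around a user-provided serve function.

    Hypothetical shape: the real make_chat_endpoint() in cli/serve/app.py
    takes project-specific arguments.
    """
    async def chat_endpoint(request):
        if inspect.iscoroutinefunction(serve_fn):
            # Async serve functions are awaited directly.
            return await serve_fn(request)
        # Sync serve functions run in a worker thread so they
        # do not block the event loop.
        return await asyncio.to_thread(serve_fn, request)

    return chat_endpoint
```

Either kind of function then works transparently behind the same endpoint, and concurrent requests to a slow sync function overlap instead of serializing.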
- temperature → ModelOption.TEMPERATURE
- max_tokens → ModelOption.MAX_NEW_TOKENS
- seed → ModelOption.SEED

Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
With async added, need to fix the catch Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
Fix for -- Error: function raises but has no Raises section Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
* Filter out model, n, user, and extra
* Comment the filtering
* Use a map and ModelOption.replace_keys() for mapping
* Fix replace_keys to no-op when from == to

Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
* Use whole seconds for the timing test so it should be less flaky
* Return a real ModelOutputThunk instead of a misleading Mock
* Use AsyncMock with side_effect for consistency

Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
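An empirical timing test of the kind the PR describes can be sketched like this. The helper names (`slow_sync_serve`, `endpoint`) and the whole-second sleep are illustrative assumptions; the real test lives in test/cli/test_serve_sync_async.py.

```python
import asyncio
import time

def slow_sync_serve(_request):
    # Whole-second sleep, mirroring the commit's "whole seconds" choice
    # to keep the timing assertion less flaky.
    time.sleep(1)
    return "ok"

async def timing_check():
    # Wrap the sync function the same way the endpoint would.
    async def endpoint(request):
        return await asyncio.to_thread(slow_sync_serve, request)

    start = time.monotonic()
    # Three concurrent requests: if the sync call blocked the event
    # loop, this would take ~3 seconds; with to_thread() it takes ~1.
    results = await asyncio.gather(endpoint(1), endpoint(2), endpoint(3))
    elapsed = time.monotonic() - start
    return results, elapsed
```

The assertion `elapsed < 2.5` (well under the 3 seconds a serialized run would need) is what empirically proves the non-blocking behavior.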
Thanks @planetf1 for the review! I fixed each. Please see the comment/question about whether we really want an allowlist. I interpreted that as not the main objective here (but I have a feeling I read it with bias). Happy to address it if you think we should pursue that right now.
On a first read the code looks good, but Claude did find a couple of items to look at:

Issue 1:
* For n > 1, return 400 (we have not implemented n > 1 yet)
* For pydantic validation errors, add a handler to convert them to the OpenAI API error format.

Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
* Add more fields to excluded_fields. Some of these should be implemented or handled better soon, but filter/ignore them for now.
* Move _build_model_options from a nested function inside make_chat_endpoint to module level. The function has no closure dependencies and can be tested directly.
* Add unit tests for _build_model_options
It turns out that adding this bit of model options mapping in the sync/async PR was scope creep. Who knew?
Thanks!
ajbozarth left a comment
One more nit from Claude, otherwise LGTM
…p instance causing test pollution

- Created a fresh FastAPI() instance for the test
- Added RequestValidationError import from fastapi.exceptions
- Manually registered the validation_exception_handler to maintain the same behavior

Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>