feat: retrieval eval add open-eval mode #439
Code Review
This pull request adds an LLM-as-Judge 'Open Eval' phase (Exa-style pointwise grading) to the retrieval executor. It supports both standalone evaluation on custom queries and evaluation after an MTEB run. Key additions include configuration schemas, CLI options, the LLMSearchResultRelevance grader, and corresponding unit tests. The review feedback highlights several critical and medium-severity issues: a concurrency bug in _run_open_eval, where out-of-order thread completion misaligns grades with queries; the summary score ignoring the user-configured aggregation method; potential crashes when parsing malformed LLM JSON responses; problems with OpenAI client initialization when api_key is None; and unhandled aggregation methods such as 'ndcg'.
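To illustrate the concurrency and parsing issues raised in the review, here is a minimal sketch of one way to keep grades aligned with their queries when worker threads finish out of order, and to tolerate malformed judge output. It is not the code from this pull request: grade_all, parse_score, grade_fn, and the "score" field name are hypothetical names chosen for the example.

```python
import json
from concurrent.futures import ThreadPoolExecutor, as_completed


def parse_score(raw, default=0.0):
    """Parse a judge response, falling back to `default` on malformed output.

    Assumes (hypothetically) that the judge returns a JSON object with a
    numeric "score" field.
    """
    try:
        return float(json.loads(raw)["score"])
    except (json.JSONDecodeError, KeyError, TypeError, ValueError):
        # Not JSON, not an object, missing field, or non-numeric value.
        return default


def grade_all(queries, grade_fn, max_workers=4):
    """Grade every query concurrently and return grades in query order."""
    grades = [None] * len(queries)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Remember which query index each future belongs to.
        future_to_index = {
            pool.submit(grade_fn, query): index
            for index, query in enumerate(queries)
        }
        # Futures complete in arbitrary order; write each result back to
        # the slot of the query that produced it instead of appending.
        for future in as_completed(future_to_index):
            grades[future_to_index[future]] = future.result()
    return grades
```

Keying each future by its query index means completion order no longer matters. Appending results as they complete is the pattern that can pair a grade with the wrong query.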
No description provided.