Add AI Evaluation #3923

Open
Vvkmnn wants to merge 1 commit into sindresorhus:main from Vvkmnn:add-eval

Vvkmnn commented Feb 4, 2026

https://github.com/Vvkmnn/awesome-ai-eval#readme

Measuring reliability, accuracy, and safety of LLMs, RAG pipelines, and AI agents in production environments.

PRs reviewed:

By submitting this pull request I confirm I've read and complied with the below requirements 🖖

Requirements for your pull request

  • Don't open a Draft / WIP pull request while you work on the guidelines. A pull request should be 100% ready and should adhere to all the guidelines when you open it.
  • Don't waste my time. Do a good job, adhere to all the guidelines, and be responsive.
  • You have to review at least 2 other open pull requests.
  • You have read and understood the instructions for creating a list.
  • This pull request has a title in the format Add Name of List. It should not contain the word Awesome.
  • Your entry here should include a short description of the project/theme of the list. It should not describe the list itself.
  • Your entry should be added at the bottom of the appropriate category.
  • The title of your entry should be title-cased and the URL to your list should end in #readme.
  • No blockchain-related lists.
  • The suggested Awesome list complies with the below requirements.

Requirements for your Awesome list

  • Has been around for at least 30 days. (Created November 18, 2025 — 78+ days ago)
  • Run awesome-lint on your list and fix the reported issues. ✔ Linting passed.
  • The default branch should be named main, not master.
  • Includes a succinct description of the project/theme at the top of the readme.
  • It's the result of hard work and the best I could possibly produce.
  • The repo name of your list should be in lowercase slug format: awesome-ai-eval.
  • The heading title of your list should be in title case format: # Awesome AI Eval.
  • Non-generated Markdown file in a GitHub repo.
  • The repo should have awesome-list & awesome as GitHub topics.
  • Not a duplicate. Please search for existing submissions.
  • Only has awesome items. Awesome lists are curations of the best, not everything.
  • Does not contain items that are unmaintained, has archived repo, deprecated, or missing docs.
  • Includes a project logo/illustration.
  • Entries have a description.
  • Includes the Awesome badge.
  • Has a Table of Contents section named Contents.
  • Has an appropriate license. (CC0)
  • Has contribution guidelines.
  • Has consistent formatting and proper spelling/grammar.
  • Does not use hard-wrapping.
  • Does not include a CI badge.
  • Does not include an "Inspired by awesome-foo" or "Inspired by the Awesome project" kind of link at the top of the readme.

Author

Vvkmnn commented Feb 4, 2026

unicorn 🦄

QDenka commented Feb 11, 2026

Licensing section must be removed from the readme. The guidelines are explicit: "Do not add the license name, text, or a Licence section to the readme. GitHub already shows the license name and link to the full text at the top of the repo."
Your readme currently ends with:

Licensing
Released under the CC0 1.0 Universal license.

This entire section should be deleted.

QDenka mentioned this pull request Feb 12, 2026
33 tasks

Author

Vvkmnn commented Feb 12, 2026

> Licensing section must be removed from the readme. The guidelines are explicit: "Do not add the license name, text, or a Licence section to the readme. GitHub already shows the license name and link to the full text at the top of the repo." Your readme currently ends with:
>
> Licensing
> Released under the CC0 1.0 Universal license.
>
> This entire section should be deleted.

Thanks, fixed.

be-next commented Feb 16, 2026

The topic fills a genuine gap in the awesome ecosystem — no existing list focuses specifically on AI/LLM evaluation. The content depth is impressive. A few issues to address before this can be merged:

Non-standard list item format. Every entry uses an inline shields.io badge prefix and bold link text:

- ![](https://img.shields.io/github/stars/confident-ai/deepeval?style=social) [**DeepEval**](…) - Description.

The required awesome format is simply:

- [DeepEval](…) - Description.

Inline badges in list items fall in the same category as CI badges — they add visual noise, are a maintenance burden (star counts change constantly), and break the consistent formatting expected across awesome lists. All ~130 entries would need to be reformatted.
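The reformatting described above could be scripted rather than done by hand. Below is a minimal sketch in Python, assuming every entry follows the badge-plus-bold pattern quoted above. The function name `normalize_entry`, both regular expressions, and the example URL are illustrative assumptions; they are not part of awesome-lint or of the list's own tooling.

```python
import re

# Assumption: badges are Markdown images served from img.shields.io.
BADGE = re.compile(r"!\[[^\]]*\]\(https://img\.shields\.io/[^)]*\)\s*")
# Assumption: bold link text is written as [**Name**](url).
BOLD_LINK = re.compile(r"\[\*\*([^\]*]+)\*\*\]\(")


def normalize_entry(line: str) -> str:
    """Strip shields.io badges and bold link text from one list item.

    Lines that are not list items are returned unchanged.
    """
    if not line.lstrip().startswith("- "):
        return line
    line = BADGE.sub("", line)
    return BOLD_LINK.sub(r"[\1](", line)


def normalize_readme(text: str) -> str:
    """Apply normalize_entry to every line of a readme."""
    return "\n".join(normalize_entry(line) for line in text.split("\n"))


if __name__ == "__main__":
    # Hypothetical URL, used only for illustration.
    before = (
        "- ![](https://img.shields.io/github/stars/confident-ai/deepeval?style=social) "
        "[**DeepEval**](https://example.com/deepeval) - Description."
    )
    print(normalize_entry(before))
```

The sketch works line by line and leaves headings and prose untouched, so the result should still be reviewed in a diff and checked with awesome-lint before pushing.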

Dead links. A few entries point to repositories that return 404 — at minimum MetaTool (meta-llama/MetaTool) and RAGTruth (zhengzangw/RAGTruth) no longer exist. These should be removed.

Entry placement. The diff shows the entry inserted between "AI in Finance" and "JAX" (alphabetically) rather than at the bottom of the Machine Learning sub-section, as required by the checklist.

Horizontal rules. The --- separators between sections will likely trigger awesome-lint warnings. The convention is to rely on heading-based separation only.

Minor: The tagline uses "A curated list of…", which is redundant for an awesome list. Also, the use of & vs. "and" in headings is inconsistent: "Prompt Evaluation & Safety" and "Guides & Training" use &, while "Community and Conferences" uses "and".

gabrielgames5998-max left a comment

Fix.

AndrejOrsula left a comment

This is a relevant topic, and your list provides great depth. While the project and repository activity/star badges are a nice visual addition and I personally favor them, they unfortunately go against the simple formatting guidelines required for list items. Additionally, the description at the very top of your README.md says it is a curated list, while the guidelines state that you should only describe the topic and not the list itself. Finally, please make sure your entry in the main repository pull request is placed at the very bottom of the category instead of the middle. Thank you.

Comment thread readme.md
- [H2O](https://github.com/h2oai/awesome-h2o#readme) - Open source distributed machine learning platform written in Java with APIs in R, Python, and Scala.
- [Software Engineering for Machine Learning](https://github.com/SE-ML/awesome-seml#readme) - From experiment to production-level machine learning.
- [AI in Finance](https://github.com/georgezouq/awesome-ai-in-finance#readme) - Solving problems in finance with machine learning.
- [AI Evaluation](https://github.com/Vvkmnn/awesome-ai-eval#readme) - Measuring reliability, accuracy, and safety of LLMs, RAG pipelines, and AI agents.

As per the requirement that you ticked but did not meet:

Your entry should be added at the bottom of the appropriate category.

AndrejOrsula mentioned this pull request Mar 5, 2026
33 tasks

@joseluisantiguarosa73-debug

@rosaboyle

LMArena has been renamed to Arena.ai. https://arena.ai/blog/lmarena-is-now-arena/

Otherwise, most things look good. Also, a really cool and helpful list. I will add a few more entries to it.

Author

Vvkmnn commented Mar 17, 2026

> LMArena has been renamed to Arena.ai. https://arena.ai/blog/lmarena-is-now-arena/
>
> Otherwise, most things look good. Also, a really cool and helpful list. I will add a few more entries to it.

Got it, thanks for catching that. Would love any and all PRs; looking forward to them.

yesw2000 commented Apr 1, 2026

yesw2000 mentioned this pull request Apr 1, 2026
31 tasks

@sindresorhus
Owner

  • Every entry has an inline shields.io star-count badge. All ~130 entries use the format:

    - ![](https://img.shields.io/github/stars/org/repo?style=social) [**Name**](url) - Description.
    

    The required format is simply - [Name](url) - Description. Inline badges add visual noise, star counts go stale constantly, and the bold link formatting is non-standard. All entries need to be reformatted. This has been flagged on the PR and not fixed.

  • Entry not at the bottom of the category. The diff inserts the entry between "AI in Finance" and "JAX" (alphabetically) rather than at the bottom of the Machine Learning sub-section. This was flagged by two reviewers and a formal changes-requested review. Still not fixed.

  • 8+ broken links. These URLs return 404 or are otherwise dead:

    • meta-llama/MetaTool
    • zhengzangw/RAGTruth
    • lunary-ai/lunary
    • showlab/VisualToolBench
    • eugeneyan/genai-notes
    • blog.jetbrains.com/ai/2025/10/dpai-arena/
    • llmbench.ai
    • simplebench.ai
  • Top description describes the list. The current description, "A curated list of tools, methods & platforms for evaluating AI quality in real applications.", starts with "A curated list of...". Describe the subject instead, e.g.: "Measuring reliability, accuracy, and safety of LLMs, RAG pipelines, and AI agents."

  • Logo not linked. <img src="./assets/robot-shades.svg" align="right" width="150"> is not wrapped in an <a> tag. Guidelines: "The image should link to the project website or any relevant website."

  • Stale entry: LMArena renamed to Arena.ai. See https://arena.ai/blog/lmarena-is-now-arena/. Update the name and URL.

  • Heading naming inconsistency. Some headings use & ("Prompt Evaluation & Safety", "Guides & Training") while others use "and" ("Community and Conferences"). Pick one style.

  • Horizontal rules. The --- separators between sections are non-standard for awesome lists.

  • Deep heading nesting. Sections use #### sub-headings (e.g., "#### Core Frameworks" under "### Evaluators and Test Harnesses"), creating 4 levels of hierarchy. Awesome lists should keep structure flat.
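Several of the structural points above (horizontal rules, heading depth, and mixed & / "and" in headings) can be checked mechanically before resubmitting. The following Python sketch is one way to do that. The function name `find_structure_issues`, the default `max_depth=3`, and the message wording are assumptions made for illustration. It does not replace awesome-lint, and it deliberately ignores fenced code blocks and setext headings.

```python
import re

# ATX headings only ("# Title" through "###### Title").
HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")
# Thematic breaks written as ---, *** or ___ on their own line.
RULE = re.compile(r"^\s{0,3}(?:-{3,}|\*{3,}|_{3,})\s*$")


def find_structure_issues(markdown: str, max_depth: int = 3) -> list[str]:
    """Report horizontal rules, deep headings, and mixed '&'/'and' headings.

    max_depth=3 is an assumed threshold, chosen because the review above
    objects to #### sub-headings.
    """
    issues = []
    uses_ampersand = False
    uses_and = False
    for number, line in enumerate(markdown.splitlines(), start=1):
        if RULE.match(line):
            issues.append(f"line {number}: horizontal rule")
            continue
        match = HEADING.match(line)
        if not match:
            continue
        hashes, title = match.groups()
        if len(hashes) > max_depth:
            issues.append(f"line {number}: heading nested {len(hashes)} levels deep")
        if " & " in title:
            uses_ampersand = True
        if re.search(r"\band\b", title, flags=re.IGNORECASE):
            uses_and = True
    if uses_ampersand and uses_and:
        issues.append('headings mix "&" and "and"')
    return issues


if __name__ == "__main__":
    # Hypothetical readme fragment built from the headings quoted above.
    sample = (
        "# Awesome AI Eval\n\n"
        "### Evaluators and Test Harnesses\n\n"
        "#### Core Frameworks\n\n"
        "---\n\n"
        "### Guides & Training\n"
    )
    for issue in find_structure_issues(sample):
        print(issue)
```

Dead links are left out on purpose: checking them needs network access and per-site handling, so a dedicated link checker is a better fit for that item.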

Contributor

agamm commented Apr 8, 2026

Dead links (404):

agamm mentioned this pull request Apr 8, 2026
31 tasks
carloshvp mentioned this pull request Apr 12, 2026
34 tasks

@carloshvp

Reviewed Vvkmnn/awesome-ai-eval.

What looks good:

  • CC0-1.0 license ✅
  • awesome and awesome-list topics set ✅
  • Created November 2025, 140+ days old ✅
  • Logo present ✅
  • Entries have descriptions ending in a period ✅
  • No CI badge ✅

Issues to address:

  1. Only 2 PRs reviewed, not 4: The current PR template requires reviewing at least 4 open pull requests (the template was updated). Your PR body lists only 2 reviewed PRs (#3831 and #3919). Please review 2 more and update the PR body with links to your comments.

  2. Top-level readme description describes the list, not the subject: The description reads "Measuring reliability, accuracy, and safety of LLMs, RAG pipelines, and AI agents in production environments." — while this is close, it still reads as describing what the list is for rather than what the subject is. The subject is AI evaluation as a discipline. Something like "Methods and tools for assessing the reliability, accuracy, and safety of large language models and AI agents." would be cleaner.

  3. Contributing in ToC: The readme includes a Contributing section in the Table of Contents. Per guidelines, Contributing must not appear in the ToC.

Good topic with clear demand. It is worth fixing these issues so the list can be accepted.
