Add AI Evaluation by Vvkmnn · Pull Request #3923 · sindresorhus/awesome

Vvkmnn · 2026-02-04T05:10:23Z

https://github.com/Vvkmnn/awesome-ai-eval#readme

Measuring reliability, accuracy, and safety of LLMs, RAG pipelines, and AI agents in production environments.

PRs reviewed:

By submitting this pull request I confirm I've read and complied with the below requirements 🖖

Requirements for your pull request

Requirements for your Awesome list

Vvkmnn · 2026-02-04T05:11:04Z

unicorn 🦄

QDenka · 2026-02-11T14:27:04Z

Licensing section must be removed from the readme. The guidelines are explicit: "Do not add the license name, text, or a Licence section to the readme. GitHub already shows the license name and link to the full text at the top of the repo."
Your readme currently ends with:

Licensing
Released under the CC0 1.0 Universal license.

This entire section should be deleted.

Vvkmnn · 2026-02-12T10:29:04Z

Thanks, fixed.

be-next · 2026-02-16T13:39:19Z

The topic fills a genuine gap in the awesome ecosystem — no existing list focuses specifically on AI/LLM evaluation. The content depth is impressive. A few issues to address before this can be merged:

Non-standard list item format. Every entry uses an inline shields.io badge prefix and bold link text:

- ![](https://img.shields.io/github/stars/confident-ai/deepeval?style=social) [**DeepEval**](…) - Description.

The required awesome format is simply:

- [DeepEval](…) - Description.

Inline badges in list items fall in the same category as CI badges — they add visual noise, are a maintenance burden (star counts change constantly), and break the consistent formatting expected across awesome lists. All ~130 entries would need to be reformatted.
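Because the same rewrite applies to every entry, it can be done mechanically. Below is a minimal Python sketch. It assumes every badged entry looks exactly like the example above: one shields.io badge, a bold link, then ` - Description.`. The names `BADGE_ENTRY`, `strip_badge`, and `strip_badges` are hypothetical and are not part of awesome-lint or any other tool. Lines that do not match are returned unchanged, so they still need to be checked by hand.

```python
import re

# Hypothetical pattern for entries shaped like:
# - ![](https://img.shields.io/github/stars/org/repo?style=social) [**Name**](url) - Description.
BADGE_ENTRY = re.compile(
    r"^(?P<indent>\s*)- !\[\]\(https://img\.shields\.io/[^)]*\) "
    r"\[\*\*(?P<name>[^*\]]+)\*\*\]\((?P<url>[^)]+)\)(?P<rest> - .*)$"
)


def strip_badge(line: str) -> str:
    """Rewrite one badged entry as '- [Name](url) - Description.'; leave other lines as-is."""
    m = BADGE_ENTRY.match(line)
    if m is None:
        return line
    return f"{m['indent']}- [{m['name']}]({m['url']}){m['rest']}"


def strip_badges(markdown: str) -> str:
    """Apply strip_badge to every line of a readme."""
    return "\n".join(strip_badge(line) for line in markdown.splitlines())
```

Here is an example using a placeholder URL: `strip_badge("- ![](https://img.shields.io/github/stars/confident-ai/deepeval?style=social) [**DeepEval**](https://example.com/deepeval) - Description.")` returns `"- [DeepEval](https://example.com/deepeval) - Description."`.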

Dead links. A few entries point to repositories that return 404 — at minimum MetaTool (meta-llama/MetaTool) and RAGTruth (zhengzangw/RAGTruth) no longer exist. These should be removed.

Entry placement. The diff shows the entry inserted between "AI in Finance" and "JAX" (alphabetically) rather than at the bottom of the Machine Learning sub-section, as required by the checklist.

Horizontal rules. The --- separators between sections will likely trigger awesome-lint warnings. The convention is to rely on heading-based separation only.

Minor: The tagline uses "A curated list of…" which is redundant for an awesome list. Also, the & vs. "and" in headings is inconsistent ("Prompt Evaluation & Safety" but "Guides & Training" vs. "Community and Conferences" elsewhere).

gabrielgames5998-max

Fix it.

AndrejOrsula

This is a relevant topic, and your list provides great depth. While the project and repository activity/star badges are a nice visual addition and I personally favor them, they unfortunately go against the simple formatting guidelines required for list items. Additionally, the description at the very top of your README.md says it is a curated list, while the guidelines state that you should only describe the topic and not the list itself. Finally, please make sure your entry in the main repository pull request is placed at the very bottom of the category instead of the middle. Thank you.

AndrejOrsula · 2026-03-05T12:46:53Z

 	- [H2O](https://github.com/h2oai/awesome-h2o#readme) - Open source distributed machine learning platform written in Java with APIs in R, Python, and Scala.
 	- [Software Engineering for Machine Learning](https://github.com/SE-ML/awesome-seml#readme) - From experiment to production-level machine learning.
 	- [AI in Finance](https://github.com/georgezouq/awesome-ai-in-finance#readme) - Solving problems in finance with machine learning.
+	- [AI Evaluation](https://github.com/Vvkmnn/awesome-ai-eval#readme) - Measuring reliability, accuracy, and safety of LLMs, RAG pipelines, and AI agents.


As per the requirement that you ticked but did not meet:

Your entry should be added at the bottom of the appropriate category.

joseluisantiguarosa73-debug · 2026-03-05T15:53:38Z

Joseluisantigusrosa73

rosaboyle · 2026-03-10T13:17:09Z

LMArena has been renamed to Arena.ai. https://arena.ai/blog/lmarena-is-now-arena/

Otherwise, most things look good. Also, really cool and helpful list. I'll add a few more entries to it.

Vvkmnn · 2026-03-17T05:08:19Z

Got it, thanks for catching that. I'd love any and all PRs; looking forward to them.

yesw2000 · 2026-04-01T15:53:58Z

There are a few broken links:

sindresorhus · 2026-04-01T21:06:54Z

Every entry has an inline shields.io star-count badge. All ~130 entries use the format:
```
- ![](https://img.shields.io/github/stars/org/repo?style=social) [**Name**](url) - Description.
```
The required format is simply `- [Name](url) - Description.` Inline badges add visual noise, star counts go stale constantly, and the bold link formatting is non-standard. All entries need to be reformatted. This has been flagged on the PR and not fixed.
Entry not at the bottom of the category. The diff inserts the entry between "AI in Finance" and "JAX" (alphabetically) rather than at the bottom of the Machine Learning sub-section. This was flagged by two reviewers and a formal changes-requested review. Still not fixed.
8+ broken links. These URLs return 404 or are otherwise dead:
- meta-llama/MetaTool
- zhengzangw/RAGTruth
- lunary-ai/lunary
- showlab/VisualToolBench
- eugeneyan/genai-notes
- blog.jetbrains.com/ai/2025/10/dpai-arena/
- llmbench.ai
- simplebench.ai
Top description describes the list. "A curated list of tools, methods & platforms for evaluating AI quality in real applications." starts with "A curated list of...". Describe the subject instead, e.g.: "Measuring reliability, accuracy, and safety of LLMs, RAG pipelines, and AI agents."
Logo not linked. `<img src="./assets/robot-shades.svg" align="right" width="150">` is not wrapped in an `<a>` tag. Guidelines: "The image should link to the project website or any relevant website."
Stale entry: LMArena renamed to Arena.ai. See https://arena.ai/blog/lmarena-is-now-arena/. Update the name and URL.
Heading naming inconsistency. Some headings use & ("Prompt Evaluation & Safety", "Guides & Training") while others use "and" ("Community and Conferences"). Pick one style.
Horizontal rules. The --- separators between sections are non-standard for awesome lists.
Deep heading nesting. Sections use #### sub-headings (e.g., "#### Core Frameworks" under "### Evaluators and Test Harnesses"), creating 4 levels of hierarchy. Awesome lists should keep structure flat.
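Several of these structural issues can be caught before the next round of review with a quick local scan. Below is a minimal Python sketch. `find_issues` is a hypothetical helper and is not awesome-lint. It only flags four of the patterns named above: `---` horizontal rules, `####` or deeper headings, inline shields.io badges in list items, and a description containing "A curated list". It skips lines inside fenced code blocks.

```python
import re


def find_issues(readme: str) -> list[tuple[int, str]]:
    """Return (line_number, message) pairs for a few review issues.

    Hypothetical checker; it does not reproduce awesome-lint's rules.
    """
    issues = []
    in_fence = False
    for lineno, line in enumerate(readme.splitlines(), start=1):
        stripped = line.strip()
        # Toggle on ``` fences so code samples are not flagged.
        if stripped.startswith("```"):
            in_fence = not in_fence
            continue
        if in_fence:
            continue
        if stripped == "---":
            issues.append((lineno, "horizontal rule"))
        elif re.match(r"#{4,}\s", stripped):
            issues.append((lineno, "heading nested deeper than ###"))
        elif stripped.startswith("- ![](https://img.shields.io/"):
            issues.append((lineno, "inline badge in list item"))
        elif "A curated list" in stripped:
            issues.append((lineno, "description describes the list"))
    return issues
```

The checks are deliberately literal string and regex matches. They will miss variants such as `***` rules or badges placed after the link, so a clean result does not guarantee the list passes review.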

agamm · 2026-04-08T02:58:48Z

Dead links (404):

RAGTruth — https://github.com/zhengzangw/RAGTruth (in RAG Datasets and Surveys)

MetaTool Tasks — https://github.com/meta-llama/MetaTool (in Benchmarks > Agent)

GenAI Notes — https://github.com/eugeneyan/genai-notes (in Resources > Related Collections)

carloshvp · 2026-04-12T21:41:19Z

Reviewed Vvkmnn/awesome-ai-eval.

What looks good:

CC0-1.0 license ✅
awesome and awesome-list topics set ✅
Created November 2025, 140+ days old ✅
Logo present ✅
Entries have descriptions ending in a period ✅
No CI badge ✅

Issues to address:

Only 2 PRs reviewed, not 4: The current PR template requires reviewing at least 4 open pull requests (the template was updated). Your PR body lists only 2 reviewed PRs (#3831 and #3919). Please review 2 more and update the PR body with links to your comments.
Top-level readme description describes the list, not the subject: The description reads "Measuring reliability, accuracy, and safety of LLMs, RAG pipelines, and AI agents in production environments." — while this is close, it still reads as describing what the list is for rather than what the subject is. The subject is AI evaluation as a discipline. Something like "Methods and tools for assessing the reliability, accuracy, and safety of large language models and AI agents." would be cleaner.
Contributing in ToC: The readme includes a Contributing section in the Table of Contents. Per guidelines, Contributing must not appear in the ToC.

Good topic with clear demand — worth fixing these before it'll be accepted.

Add AI Evaluation

d138ea6

gabrielgames5998-max approved these changes Feb 4, 2026

QDenka mentioned this pull request Feb 12, 2026

Add Software Design #3940

Open

gabrielgames5998-max approved these changes Feb 26, 2026

gabrielgames5998-max reviewed Feb 26, 2026

gabrielgames5998-max approved these changes Feb 26, 2026

AndrejOrsula suggested changes Mar 5, 2026

AndrejOrsula mentioned this pull request Mar 5, 2026

Add Space Robotics #3981

Open

yesw2000 mentioned this pull request Apr 1, 2026

Add Terminals AI #3672

Open

agamm mentioned this pull request Apr 8, 2026

Add AI SRE #4083

Closed

carloshvp mentioned this pull request Apr 12, 2026

Add EU AI Act #4095

Open

11 participants