
Marius Argatu

Senior SDET  ·  AI & LLM Quality Engineering


About me

Senior SDET, 10 years building test infrastructure where reliability is non-negotiable: airline payments, multi-tenant healthcare SaaS, fintech, financial market data.

For the past few years my focus has been testing AI-powered features: LLM evaluation harnesses, RAG retrieval testing, golden-dataset regression, and agentic system testing. I build the frameworks and runners, and wire evaluation into CI/CD.

Full background, experience and competencies on mariusargatu.com/about

What I work on

  • AI / LLM testing: does the model answer faithfully, and can you prove it? Faithfulness, answer relevancy, and hallucination scoring · RAG retrieval metrics (hit rate, MRR, precision/recall@k) · agentic multi-turn and tool-call testing
  • Test architecture: the frameworks and runners under the tests. Model-based testing (XState) · property-based testing (Schemathesis, Hypothesis, fast-check) · Pydantic schemas
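
For readers unfamiliar with the retrieval metrics named above, here is a minimal sketch of how they are commonly computed for a ranked list of document IDs. It uses only the standard library; the function names and the sample IDs are illustrative assumptions, not code from any of the repositories or tools listed on this page. Precision@k here divides by k, which is one common convention.

```python
from typing import Iterable, Sequence, Set, Tuple


def hit_rate_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """1.0 if at least one relevant document is in the top k, else 0.0."""
    return 1.0 if any(doc in relevant for doc in ranked[:k]) else 0.0


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant document, or 0.0 if none is retrieved."""
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(queries: Iterable[Tuple[Sequence[str], Set[str]]]) -> float:
    """Average reciprocal rank over (ranked, relevant) pairs, one per query."""
    scores = [reciprocal_rank(ranked, relevant) for ranked, relevant in queries]
    return sum(scores) / len(scores) if scores else 0.0


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top k slots filled by relevant documents."""
    if k <= 0:
        return 0.0
    hits = sum(1 for doc in ranked[:k] if doc in relevant)
    return hits / k


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of all relevant documents that appear in the top k."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


# Hypothetical query: four retrieved IDs, two of which are relevant.
ranked = ["d3", "d1", "d7", "d2"]
relevant = {"d1", "d2"}
print(hit_rate_at_k(ranked, relevant, k=3))
print(reciprocal_rank(ranked, relevant))
print(precision_at_k(ranked, relevant, k=3))
print(recall_at_k(ranked, relevant, k=3))
```

In a golden-dataset setup, functions like these would typically run per query against labeled relevant documents, with the averages asserted against a threshold in CI.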

From the blog

- [The Golden Dataset: Building the Oracle You Test Against](https://www.mariusargatu.com/blog/the-golden-dataset/)

Read everything on mariusargatu.com/blog

Stack

  • AI / LLM: DeepEval, RAGAS, Langfuse, Pydantic
  • Languages: Python, TypeScript
  • Test: Playwright, Pytest, Vitest
  • API and contract: GraphQL, OpenAPI
  • CI/CD and infra: GitHub Actions, Docker

Website · Blog · LinkedIn · Email

“A test suite is a liability as much as an asset. Every test earns its place.”

Popular repositories

  1. spec-to-playwright · Public archive

    Python

  2. QARoom · Public

    A multi-tenant social platform built to demonstrate testing-driven architecture: testability as an architectural property, not a phase, with each boundary defended by the testing technique that fit…

    TypeScript

  3. blog · Public

    Personal site and blog — mariusargatu.com (Astro, served at /blog)

    Astro

  4. mariusargatu · Public

    Profile README — Senior SDET / AI & LLM Quality Engineering

  5. maritime-test-lab · Public

    Go

  6. Atlas · Public

    Atlas: a testable broadband support agent (LangGraph + MCP). Runnable reference system for the 'Evals Are Checks, Not Tests' series.

    Python