Senior SDET · AI & LLM Quality Engineering
Senior SDET with 10 years building test infrastructure where reliability is non-negotiable: airline payments, multi-tenant healthcare SaaS, fintech, and financial market data.
For the past few years my focus has been testing AI-powered features: LLM evaluation harnesses, RAG retrieval testing, golden-dataset regression, and agentic system testing. I build the frameworks and runners, and wire evaluation into CI/CD.
→ Full background, experience, and competencies at mariusargatu.com/about
- AI / LLM testing: does the model answer faithfully, and can you prove it? Faithfulness, answer relevancy, and hallucination scoring · RAG retrieval metrics (hit-rate, MRR, precision/recall@k) · agentic multi-turn and tool-call testing
- Test architecture: the frameworks and runners under the tests. Model-based testing (XState) · property-based testing (Schemathesis, Hypothesis, fast-check) · Pydantic schemas
- The Harness Is the Product: Models Are a Commodity
- The System Under Test: A Broadband Support Agent
- Your Evals Are Checks, Not Tests
→ Read everything on mariusargatu.com/blog
- AI / LLM: DeepEval, RAGAS, Langfuse, Pydantic
- Languages: Python, TypeScript
- Test: Playwright, pytest, Vitest
- API and contract: GraphQL, OpenAPI
- CI/CD and infra: GitHub Actions, Docker
“A test suite is a liability as much as an asset. Every test earns its place.”