RAG Evaluation

Retrieval-Augmented Generation (RAG) systems are a powerful way to integrate LLMs into your stack, but they can be unreliable and are difficult to test. You can build confidence in your RAG system by using state-of-the-art Patronus evaluators designed specifically to catch critical issues in retrieval systems.

Patronus RAG Evaluators

Here are the main points of failure in a RAG system and the Patronus evaluators you can use to catch them; a usage sketch follows the list:

[Figure: Patronus AI product model view]

1. Retrieval Context Relevance: Evaluates whether the retrieved context is relevant to the model input.
2. Answer Relevance: Evaluates whether the model output is relevant to the model input.
3. Retrieval Hallucination: Evaluates whether the model output is grounded in the retrieved context.
4. Retrieval Context Sufficiency: Evaluates whether the retrieved context has sufficient information to produce the correct model output.
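
As a rough illustration, the sketch below runs a single (input, output, retrieved context) triple through each of the four checks via the Patronus evaluation API. The endpoint path, auth header, evaluator identifiers, and request field names are assumptions made for this example; consult the Patronus API reference for the exact values.

```python
# Minimal sketch: running the four RAG checks against one retrieval result.
# The endpoint, auth header, evaluator names, and field names below are
# assumptions for illustration -- check the Patronus API reference for the
# exact identifiers.
import os
import requests

API_URL = "https://api.patronus.ai/v1/evaluate"  # assumed endpoint
HEADERS = {
    "X-API-KEY": os.environ["PATRONUS_API_KEY"],  # assumed auth header
    "Content-Type": "application/json",
}

# Assumed evaluator identifiers mirroring the four failure points above.
RAG_EVALUATORS = [
    "retrieval-context-relevance",
    "answer-relevance",
    "retrieval-hallucination",
    "retrieval-context-sufficiency",
]


def evaluate_rag_sample(model_input: str, model_output: str, retrieved_context: list[str]) -> dict:
    """Send one (input, output, context) triple to each evaluator and collect results."""
    results = {}
    for evaluator in RAG_EVALUATORS:
        payload = {
            "evaluators": [{"evaluator": evaluator}],
            "evaluated_model_input": model_input,
            "evaluated_model_output": model_output,
            "evaluated_model_retrieved_context": retrieved_context,
        }
        response = requests.post(API_URL, json=payload, headers=HEADERS, timeout=30)
        response.raise_for_status()
        results[evaluator] = response.json()
    return results


if __name__ == "__main__":
    scores = evaluate_rag_sample(
        model_input="What is our refund window?",
        model_output="Refunds are accepted within 30 days of purchase.",
        retrieved_context=["Our policy allows refunds within 30 days of purchase with a receipt."],
    )
    for evaluator, result in scores.items():
        print(evaluator, result)
```

In practice you would run these checks over a batch of logged RAG interactions, then inspect the cases where any of the four evaluators flags a failure.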

In-House RAG Eval Systems vs. Patronus API

Some customers try to develop evaluation systems in-house using GPT-4 as the evaluator, before realizing that the resulting evaluation quality is subpar.

Patronus has the best evaluators on the market. We guarantee at least 20% better evaluation performance over whatever you're using today.

[Figure: Patronus AI RAG Evaluation graphic]
