RAG Evaluation
Retrieval-Augmented Generation (RAG) systems are a powerful way to integrate LLMs into your stack, but they are unreliable and complicated to test. You can build confidence in your RAG system with state-of-the-art Patronus evaluators designed specifically to catch critical issues in retrieval systems.
Patronus RAG Evaluators
Here are various points of failure in a RAG system and the Patronus evaluators you can use to catch them:
![Patronus AI product model view](https://cdn.prod.website-files.com/64e655d42d3be60f582d0472/65fe10436695024f48bff097_model.png)
1. Retrieval Context Relevance: Evaluates whether the retrieved context is relevant to the model input.
2. Answer Relevance: Evaluates whether the model output is relevant to the model input.
3. Retrieval Hallucination: Evaluates whether the model output is grounded in the retrieved context.
4. Retrieval Context Sufficiency: Evaluates whether the retrieved context has sufficient information to produce the correct model output.
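Each of these checks compares a different pair of RAG artifacts (model input, retrieved context, model output). That mapping can be sketched in plain Python; the evaluator names and payload fields below are illustrative placeholders, not the actual identifiers of the Patronus API:

```python
# Illustrative mapping of each RAG failure mode to the pair of
# artifacts it compares. All names here are hypothetical, not
# the Patronus API's real evaluator IDs or field names.

RAG_EVALUATORS = {
    "retrieval-context-relevance": ("model_input", "retrieved_context"),
    "answer-relevance": ("model_input", "model_output"),
    "retrieval-hallucination": ("retrieved_context", "model_output"),
    "retrieval-context-sufficiency": ("retrieved_context", "model_output"),
}

def build_eval_request(evaluator: str, model_input: str,
                       retrieved_context: str, model_output: str) -> dict:
    """Assemble a request payload containing only the two artifacts
    the chosen evaluator actually compares."""
    artifacts = {
        "model_input": model_input,
        "retrieved_context": retrieved_context,
        "model_output": model_output,
    }
    fields = RAG_EVALUATORS[evaluator]
    return {
        "evaluator": evaluator,
        "data": {name: artifacts[name] for name in fields},
    }

request = build_eval_request(
    "retrieval-hallucination",
    model_input="What year was the company founded?",
    retrieved_context="The company was founded in 2019 in Berlin.",
    model_output="It was founded in 2019.",
)
```

Keeping the pairing explicit like this makes it obvious which artifact must be logged for which check; for example, hallucination detection is impossible if the retrieved context is not stored alongside the output.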
In-House RAG Eval Systems vs. Patronus API
Some customers try to develop evaluation systems in-house using GPT-4 as the evaluator, before realizing evaluation quality is subpar.
Patronus has the best evaluators on the market. We guarantee at least 20% better evaluation performance than your current setup.
![Patronus AI RAG Evaluation graphic](https://cdn.prod.website-files.com/64e655d42d3be60f582d0472/65fe1013877771ed69743a7b_rag.png)
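The typical in-house approach is a single "LLM as judge" prompt. A minimal sketch of that pattern follows; the prompt wording and the 0/1 scoring convention are illustrative, and `call_llm` stands in for whatever GPT-4 client wrapper the team already uses:

```python
# Minimal sketch of an in-house "LLM as judge" groundedness check.
# JUDGE_TEMPLATE and the 0/1 scoring scheme are hypothetical examples,
# not a recommended or production-grade evaluation prompt.

JUDGE_TEMPLATE = """You are grading a RAG answer for groundedness.

Context:
{context}

Answer:
{answer}

Reply with 1 if every claim in the answer is supported by the context,
otherwise reply with 0."""

def build_judge_prompt(context: str, answer: str) -> str:
    """Fill the judge template with one (context, answer) pair."""
    return JUDGE_TEMPLATE.format(context=context, answer=answer)

def judge_groundedness(context: str, answer: str, call_llm) -> bool:
    """call_llm: a function str -> str wrapping the team's model client."""
    reply = call_llm(build_judge_prompt(context, answer))
    return reply.strip().startswith("1")
```

Systems like this are easy to stand up but hard to trust: a single generic prompt conflates the four failure modes above, and the judge's own accuracy is rarely measured, which is where evaluation quality ends up subpar.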