Product Features

From novel test suite generation to real-time LLM evaluation, the Patronus suite of features provides end-to-end solutions so you can confidently deploy LLM applications at scale.

1

Evaluation Runs

Leverage our managed service to score model performance based on our proprietary taxonomy of criteria.

2

LLM Failure Monitoring & Observability

“Sentry for LLM Failures”: Continuously evaluate and track LLM performance for your AI product in production using the Patronus Evaluate API.

3

Patronus Datasets

Use our off-the-shelf adversarial testing sets, designed to break models on specific use cases.

Developed with 15 financial industry domain experts, FinanceBench is a high-quality, large-scale set of 10,000 question-and-answer pairs based on publicly available financial documents such as SEC 10-Ks, 10-Qs, and 8-Ks, earnings reports, and earnings call transcripts.

Developed with AI researchers at Oxford University and MilaNLP Lab at Bocconi University, SimpleSafetyTests is a diagnostic test suite to identify critical safety risks in LLMs across 5 areas: suicide, child abuse, physical harm, illegal items, and scams & fraud.

Developed with MosaicML, EnterprisePII is the industry’s first LLM dataset for detecting business-sensitive information. The dataset contains 3,000 examples of annotated text excerpts from common enterprise text types such as meeting notes, commercial contracts, marketing emails, performance reviews, and more.

4

Test Suite Generation

Auto-generate novel adversarial testing sets at scale to find all the edge cases where your models fail.

5

Benchmarking

Compare models side by side to understand how their performance differs in real-world scenarios.

6

Retrieval-Augmented Generation (RAG) Testing

Verify that your LLM-based retrieval systems consistently deliver reliable information with our cutting-edge RAG and retrieval testing workflows.
