Evaluation for Financial Services AI
Financial documentation is distributed, terminology-heavy, and nuanced. Analyzing and synthesizing it is hard, and so is rigorously testing that your AI applications handle it appropriately.

Areas of Experience
We have built up experience in the financial services domain by developing FinanceBench, the industry's first benchmark for financial evaluation. Created with input from expert financial analysts, it spans the following LLM capabilities:
Numerical Reasoning of Financial Metrics
Responses that require numerical calculations, e.g. EBITDA, P/E ratio, CAGR
Information Retrieval from Databases
Specific details are extracted directly from the documents
Logical Reasoning for Subjective Asks
Questions involving financial recommendations require interpretation and a degree of subjectivity
Knowledge of Accounting & Finance
Basic accounting and finance questions that analysts are familiar with
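
To make the numerical reasoning category concrete, here is a minimal Python sketch of the three metrics named above, plus a tolerance check of the kind an evaluator might use to compare a model's numeric answer with a reference value. The function names, the 1% default tolerance, and the simplified EBITDA formula (operating income plus depreciation and amortization) are assumptions for illustration. They are not part of FinanceBench or the Patronus AI platform.

```python
import math


def cagr(begin_value: float, end_value: float, years: float) -> float:
    """Compound annual growth rate as a fraction (0.10 means 10%)."""
    if begin_value <= 0 or years <= 0:
        raise ValueError("begin_value and years must be positive")
    return (end_value / begin_value) ** (1 / years) - 1


def pe_ratio(share_price: float, earnings_per_share: float) -> float:
    """Price-to-earnings ratio; only meaningful for positive earnings."""
    if earnings_per_share <= 0:
        raise ValueError("earnings_per_share must be positive")
    return share_price / earnings_per_share


def ebitda(operating_income: float, depreciation: float, amortization: float) -> float:
    """Simplified EBITDA: operating income with D&A added back (assumed formula)."""
    return operating_income + depreciation + amortization


def is_close_enough(model_answer: float, reference: float, rel_tol: float = 0.01) -> bool:
    """Hypothetical check: accept answers within a relative tolerance of the reference."""
    return math.isclose(model_answer, reference, rel_tol=rel_tol)


if __name__ == "__main__":
    growth = cagr(begin_value=100.0, end_value=121.0, years=2)
    print(f"CAGR: {growth:.2%}")
    print(f"P/E: {pe_ratio(share_price=50.0, earnings_per_share=2.5):.1f}")
    print(f"EBITDA: {ebitda(100.0, 20.0, 5.0):.1f}")
    print("Model answer accepted:", is_close_enough(0.1001, growth))
```

A relative tolerance is used rather than exact equality because answers to financial questions often differ only in rounding. The right threshold depends on the metric and on how the reference answers were rounded.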

What is FinanceBench?
As part of this release, customers can now evaluate their LLM systems against FinanceBench on the Patronus AI platform. The platform can also detect hallucinations and other unexpected LLM behavior on financial questions at scale.
What We Provide
Accuracy
See current, in-progress performance of your MLflow experiments
Relevance
Ensuring that the output is contextually relevant to the question.
Regulatory Alignment
Adhering to company and regulatory policies when producing an output.
Safety
Mitigating risks from prompt injections and data leakage.

Standard Product
Current platform offerings, such as evaluators, experiments, logs, and traces, to get you up and running immediately
Custom Product
Collaborate with us to create industry-grade guardrails (LLM-as-a-Judge), benchmarks, or RL environments for more granular evaluation
