Prompt Tester: Faster Iterations on Your Prompts

Background
A few weeks ago, we introduced Prompt Management. Today, we want to dive deeper into Prompt Tester. Small prompt changes can have large effects on model behavior, and Prompt Tester makes those effects easier to visualize and iterate on.
Getting Started
Things you need to get started:
- A prompt (the message you want to start with)
- Context that will be injected (e.g., a template variable or an additional user message)
Considerations for a Prompt
Prompts should be broad and holistic enough to catch most edge cases, yet specific enough to define the desired behavior.
Example: “You are a customer service chatbot for our restaurant. You can only answer questions that are grounded in our company documentation. You must respond in a kind, respectful, and friendly manner.”
Considerations for Context
When it comes to context, it’s important to consider variations in intent or interpretation. For example, even if different people are looking for the same item of clothing, they might be looking for different sizes, colors, brands, etc.
When testing variations of context, it’s helpful to keep in mind that different people may have different preferences, provide context differently, or share information that is completely off-topic!
Some Types of Context:
- Template variable - when the prompt has defined slots that each test case fills in, giving it a plug-and-play feel
- Additional user message - when the user replies with details of their own (relevant or irrelevant); both shapes are sketched below
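To make these two shapes concrete, here is a minimal Python sketch. Everything in it is illustrative: the {item}-style placeholders use Python’s str.format syntax rather than the Prompt Tester UI’s own placeholder syntax, and the rows are made-up test data.

```python
# Hypothetical examples of the two context types; names, values, and the
# {item}-style placeholder syntax are illustrative, not the exact syntax
# the Prompt Tester UI uses.

# 1. Template variable: the prompt has slots that each test row fills in.
prompt_template = (
    "You are a shopping assistant. Help the customer find "
    "a {color} {item} in size {size}."
)
template_rows = [
    {"item": "jacket", "color": "navy", "size": "M"},
    {"item": "jacket", "color": "black", "size": "XL"},
    {"item": "sneakers", "color": "white", "size": "9"},
]

# 2. Additional user message: the prompt stays fixed and a follow-up user
#    turn carries the details (relevant or completely off-topic).
system_prompt = "You are a customer service chatbot for our restaurant."
user_messages = [
    "Do you have gluten-free options on the dinner menu?",
    "What time do you close on Sundays?",
    "Can you help me with my math homework?",  # deliberately off-topic
]
```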
Single Prompt
Now that you have a base prompt and pieces of context to test with, one way to use Prompt Tester is to check that a single prompt behaves consistently across multiple context variations.

To do this:
- Enter your prompt and leave placeholders (as indicated in the UI) for variables
- Input your context variables as individual test items in the rows provided
- Simulate to view the outputs of the test (a minimal sketch of this loop follows below)
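Prompt Tester handles this flow in the UI, but as a rough mental model, here is a hedged sketch of the same loop in Python. The OpenAI client is a stand-in for whichever model you test against; the model name, prompt text, and fixed user turn are all illustrative.

```python
# A rough local approximation of a single-prompt run: one template, many
# context rows. Prompt Tester does this for you in the UI; this sketch
# only illustrates the loop.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

prompt_template = (
    "You are a shopping assistant. Help the customer find "
    "a {color} {item} in size {size}."
)
rows = [
    {"item": "jacket", "color": "navy", "size": "M"},
    {"item": "sneakers", "color": "white", "size": "9"},
]

for row in rows:
    system_prompt = prompt_template.format(**row)  # fill the placeholders
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": "Hi, can you help me find this?"},
        ],
    )
    print(row, "->", response.choices[0].message.content)
```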
Multiple Prompts
Another way to use Prompt Tester is to compare two prompt variations to figure out which one performs better. This is particularly helpful for updates to system prompts that have cascading effects across use cases.

To do this:
- Enter your prompt and leave placeholders (as indicated in the UI) for variables
- Input your context variables as individual test items in the rows provided
- Simulate to view the outputs of the test
- Compare the outputs to see which prompt produced the desired behavior (see the comparison sketch below)
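As another illustrative approximation of what the UI does for you, the comparison amounts to running the same context against each prompt variant and lining up the outputs. The prompt texts and model name below are made up for the sketch.

```python
# A two-prompt comparison: run the same user messages against each prompt
# variant so the only thing that changes between columns is the prompt.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

prompt_a = "You are a chatbot for our restaurant. Answer menu questions."
prompt_b = (
    "You are a friendly customer service chatbot for our restaurant. "
    "Only answer questions grounded in our company documentation, and "
    "politely decline anything else."
)
user_messages = [
    "Do you have gluten-free options on the dinner menu?",
    "Can you help me with my math homework?",  # deliberately off-topic
]

for message in user_messages:
    for label, prompt in (("A", prompt_a), ("B", prompt_b)):
        response = client.chat.completions.create(
            model="gpt-4o-mini",  # illustrative model choice
            messages=[
                {"role": "system", "content": prompt},
                {"role": "user", "content": message},
            ],
        )
        print(f"[{label}] {message!r} -> {response.choices[0].message.content}")
```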
Iteration & Evaluation
When it comes to iterating on and evaluating these prompts, the loop can be endless. It helps to set some high-level guiding criteria for how your AI should respond. Setting aside the table stakes of accuracy and factuality, consider the questions that make this AI application unique to you or your business: What personality should your AI have? How should it respond to questions it isn’t built to handle? How succinct or verbose should it be?
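One way to keep that loop tractable is to phrase each guiding criterion as a concrete check you can apply to every output. The sketch below is purely hypothetical: the word limit and keyword heuristic are placeholder stand-ins for real evaluators, not a recommended implementation.

```python
# A hypothetical way to turn guiding criteria into concrete checks.

def is_succinct(answer: str, max_words: int = 80) -> bool:
    """How succinct should the AI be? Pin an explicit number on it."""
    return len(answer.split()) <= max_words

def declines_off_topic(answer: str) -> bool:
    """Does the AI deflect questions it isn't built to handle?
    (A crude keyword heuristic, purely for illustration.)"""
    deflections = ("i can only help", "i'm not able", "outside what i")
    return any(phrase in answer.lower() for phrase in deflections)

answer = "Sorry, I can only help with questions about our restaurant."
print(is_succinct(answer), declines_off_topic(answer))  # True True
```

Checks like these won’t capture tone perfectly, but they give each iteration a pass/fail signal you can track across prompt versions.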
Conclusion
Whether you’re fine-tuning an existing prompt or comparing multiple prompts, you can build on your findings by updating, versioning, and labeling your prompts on our platform, then running them back through Prompt Tester.
If you have questions, feel free to reach out to us at contact@patronus.ai!