Weaviate: Leveraging Patronus AI's Percival to Accelerate Complex AI Agent Development

Introduction

Weaviate, the AI-native vector database, recently explored the capabilities of Patronus’ Percival, our first-of-its-kind AI companion that helps AI teams debug and optimize their agentic workflows. Weaviate built a sophisticated agent system capable of orchestrating Weaviate's Query Agent and Transformation Agent. This endeavor presented a unique set of challenges, particularly in managing asynchronous operations and intricate data flows. Percival significantly enhanced the development and debugging process. As the developer noted, "It is an extremely helpful and interesting tool for building Agent systems. I think the best compliment that I can give as a developer is that this really made me feel empowered to test out a new idea."

The Challenge: Navigating the Complexities of Advanced Agent Systems

Developing robust AI agents, especially those designed to manage multiple specialized sub-agents and handle operations that don't complete instantly (asynchronous processes), is a frontier in AI engineering. Weaviate's team encountered several common, yet significant, hurdles:

  • The Labyrinth of Long Trajectories: AI agents often execute a lengthy series of actions and decisions before an error or unexpected behavior becomes apparent. Manually tracing back through these extensive "thought processes" to find the source of a problem can be time-consuming and inefficient.
  • Orchestrating Asynchronous Operations: Weaviate's goal was to create an intelligent system where one AI agent could direct others—one specialized in querying data (the Query Agent) and another in modifying or enriching data (the Transformation Agent). A key complexity was that these data transformations didn't happen instantaneously. The orchestrating agent needed to initiate a transformation, monitor its progress in the background, and then act on the results once ready. This level of coordination requires sophisticated reasoning from the AI.
  • The Nuances of Prompting and Tool Configuration: Even minor ambiguities in how an AI agent is instructed (its "prompt") or how its tools are defined can lead to significant deviations from the desired behavior. Identifying these subtle misconfigurations often turns into a frustrating cycle of trial and error. 

Their technical objective was ambitious: an AI agent that could understand a complex, multi-stage user request. This involved first querying their Weaviate database to understand main topics in blog content, then initiating an asynchronous job to generate potential user questions based on this content, waiting for this job to complete, and finally, querying the newly enriched data to summarize topics of high interest. (The full Python script for this agent implementation can be found in the Appendix.)
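
Stripped of the agent framework, the coordination the orchestrating agent had to reason through is a classic initiate/poll/act loop. Below is a minimal hand-written sketch of that pattern; the function names are illustrative placeholders, and in the actual system the agent executes these steps itself by calling the transform_weaviate_data, wait_for_seconds, and check_transformation_status tools shown in the Appendix.

import time

def run_job_and_wait(start_job, get_status, poll_interval=60, max_polls=3):
    """Start an asynchronous job, poll until it completes, and return the final status."""
    workflow_id = start_job()          # kick off the transformation
    for _ in range(max_polls):
        time.sleep(poll_interval)      # transformations take minutes, not seconds
        status = get_status(workflow_id)
        if "completed" in str(status).lower():
            return status              # safe to query the newly enriched data
    raise TimeoutError(f"Workflow {workflow_id} did not complete in time")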

Percival's Diagnostic Power: A Comprehensive Approach to Error Detection

To navigate such intricate development landscapes, AI teams need tools that can shed light on an agent's behavior and pinpoint a wide array of potential failure points. Percival is designed for precisely this, offering a systematic way to identify and categorize issues. Percival ingests traces from LLM-based workflows and is capable of detecting over 20 distinct failure modes. These are broadly categorized as:

  • Reasoning Errors: Flaws in the agent's thought process.
    • Hallucinations: Inventing information, either purely text-based or by fabricating tool outputs or capabilities.
    • Information Processing Issues: Including retrieving irrelevant information, using outdated memories, or misinterpreting the outputs of its tools.
    • Decision Making Flaws: Misunderstanding the task at hand or choosing the wrong tool for the job.
    • Output Generation Problems: Errors in formatting its response or failing to follow specific instructions.
  • System Execution Errors: Technical problems with the agent's operational environment.
    • Configuration Issues: Incorrect tool definitions (e.g., a web search tool described as a calculator) or problems with environment setup like missing API keys.
    • API Issues: Problems interacting with external services, such as rate limits, authentication failures, or server errors.
    • Resource Management Problems: Including running out of memory or operations timing out.
  • Planning and Coordination Errors: Difficulties in managing the overall task or its context.
    • Context Management Failures: Such as "forgetting" important earlier information or calling tools excessively due to memory limitations.
    • Task Management Issues: Deviating from the main goal or failing to coordinate sub-tasks effectively.
  • Domain-Specific Errors: Mistakes tied to the unique aspects of the task and the user's specific instructions.

This comprehensive taxonomy allows Percival to provide targeted insights, helping teams understand not just that an agent failed, but why.
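
For readers who prefer code to prose, the categories above can be summarized in a simple nested structure. The sketch below is purely illustrative; it is not Percival's actual API or internal schema.

# Illustrative summary of the failure-mode taxonomy described above.
FAILURE_MODES = {
    "reasoning_errors": [
        "hallucination",           # invented text, tool outputs, or capabilities
        "information_processing",  # irrelevant retrieval, stale memory, misread tool output
        "decision_making",         # misunderstood task or wrong tool choice
        "output_generation",       # formatting errors, unfollowed instructions
    ],
    "system_execution_errors": [
        "configuration",           # bad tool definitions, missing API keys
        "api",                     # rate limits, auth failures, server errors
        "resource_management",     # out-of-memory, timeouts
    ],
    "planning_and_coordination_errors": [
        "context_management",      # forgotten context, excessive tool calls
        "task_management",         # goal drift, poor sub-task coordination
    ],
    "domain_specific_errors": [],  # failures tied to the user's task and instructions
}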

Key Results: Accelerated Development and Enhanced Debugging with Percival

By integrating Percival into their workflow, the Weaviate team saw concrete improvements across their development process:

  • Reduced Debugging Time: The "Generate Insights" feature in Percival allowed the team to quickly understand complex agent traces that would have otherwise required hours of manual log analysis. The developer emphasized, "Overall, the ability to just click “Generate Insights” really unlocks the overhead of these long trajectory Agents."
  • Actionable, Automated Fixes: Percival didn't just flag problems; it offered concrete, actionable solutions. The "suggested prompt fix" capability, combined with an intuitive interface for applying these changes, enabled the team to correct issues quickly.
  • More Reliable and Robust Agents: By helping to catch and rectify subtle errors in tool usage, prompt design, and logical flow, Percival directly contributed to building a more dependable and accurate AI agent system.

Percival in Action: Uncovering and Resolving Critical Issues

The Weaviate team shared specific instances where Percival's insights proved invaluable:

  1. Resolving Task Ambiguity: In one case, the Agent was struggling to understand the task it was being asked to perform. Percival identified the point of confusion in the agent's logic and suggested a clarification to the prompt. The developer was impressed: "I love this particularly for when you have, say, 8+ steps in the Function Calling loop. And popping the suggested prompt fix with the cursor edit interface was a very cool UX that fixed it in one pass!"

Caption: Percival identifying an issue related to how the Agent interprets a task and uses the available tools to complete it.

  2. Clarifying Tool Usage for the AI Model: Percival detected that the underlying AI model was having difficulty correctly understanding how to use a helper function for one of its specialized tools. These "model-tool misunderstandings" can be particularly challenging to debug manually.


Caption: Percival detecting the AI model's confusion regarding tool use and suggesting clearer instructions.

Accelerating Innovation: The Impact on Development Speed

The most noticeable impact of using Percival was the boost in development velocity. "This really made me feel empowered to test out a new idea. I would say it’s a similar feeling to Cursor, but for Agents!" said the developer.

By taking on much of the diagnostic heavy lifting, Percival freed up the Weaviate engineers to concentrate on innovation and the higher-level logic of their sophisticated agent system, rather than getting mired in the painstaking details of debugging.

Getting Started with Percival and Weaviate

Integrating Patronus Percival into your AI development workflow and connecting to Weaviate is straightforward. Here are some starter snippets:

Patronus Percival Starter:

import patronus
from openinference.instrumentation.smolagents import SmolagentsInstrumentor
from smolagents import LiteLLMModel, ToolCallingAgent, tool

# 🔧 Initialize Patronus tracing
patronus.init(integrations=[SmolagentsInstrumentor()])

# 🛠️ Define a simple tool
@tool
def get_weather_api(location: str, date_time: str) -> str:
    return f"Mock weather for {location} at {date_time}"

# 🤖 Create a simple multi-agent setup
def create_agent(model_id):
    weather_agent = ToolCallingAgent(
        tools=[get_weather_api],
        model=LiteLLMModel(model_id),
        name="weather_agent",
        description="Provides weather info"
    )
    manager = ToolCallingAgent(
        model=LiteLLMModel(model_id),
        managed_agents=[weather_agent],
        tools=[],
        add_base_tools=False
    )
    return manager

# 🚀 Run with tracing
@patronus.traced()
def main():
    agent = create_agent("openai/gpt-4o")
    agent.run("What's the weather in Paris tomorrow?")

main()

For more details on getting started with Patronus Percival, see our Quick Start Guide.

Weaviate Starter:

import os
import weaviate
from weaviate.classes.init import Auth
from weaviate.agents.query import QueryAgent

headers = {
    # Provide your required API key(s), e.g. Cohere, OpenAI, etc. for the configured vectorizer(s)
    "X-INFERENCE-PROVIDER-API-KEY": os.environ.get("YOUR_INFERENCE_PROVIDER_KEY", ""),
}

client = weaviate.connect_to_weaviate_cloud(
    cluster_url=os.environ.get("WEAVIATE_URL"),
    auth_credentials=Auth.api_key(os.environ.get("WEAVIATE_API_KEY")),
    headers=headers,
)

# Instantiate a new agent object, and specify the collections to query
qa = QueryAgent(
    client=client, collections=["ecommerce", "financial_contracts", "weather"]
)
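
From there, running a natural-language query is a single call. The question below is only an example; as in the Appendix script, the response can simply be stringified or printed.

# Ask a natural-language question across the configured collections
response = qa.run("Which products had weather-related shipping delays?")
print(response)

client.close()  # close the connection when finished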

To learn more about building with Weaviate Agents, check out the Weaviate Documentation.

Conclusion

Weaviate's experience with Patronus Percival highlights the transformative potential of AI-powered debugging and optimization for teams building the next generation of agentic AI systems. By offering deep insights into agent behavior, automating the detection of a wide range of common and complex errors, and providing actionable suggestions for improvement, Percival acts as an indispensable co-pilot for AI engineers. For organizations like Weaviate, venturing into complex AI applications involving multi-step reasoning and asynchronous processes, Percival offers a clear advantage: faster development cycles, more robust and reliable AI agents, and a greater capacity to innovate and tackle ambitious AI challenges. The ability to rapidly diagnose and fix issues in intricate agent interactions is no longer a bottleneck but a streamlined part of the development lifecycle.

Appendix: Weaviate Orchestration Agent - Full Python Script

import os
import time
from datetime import datetime

from dotenv import load_dotenv
import weaviate

import patronus
from smolagents import LiteLLMModel, ToolCallingAgent, tool
from openinference.instrumentation.smolagents import SmolagentsInstrumentor
from opentelemetry.instrumentation.threading import ThreadingInstrumentor
from opentelemetry.instrumentation.asyncio import AsyncioInstrumentor

from weaviate.agents.query import QueryAgent
from weaviate.agents.transformation import TransformationAgent
from weaviate.agents.classes import Operations
from weaviate.collections.classes.config import DataType

# --- Environment Setup ---
load_dotenv()

# --- Patronus Initialization ---
patronus.init(integrations=[SmolagentsInstrumentor(), ThreadingInstrumentor()])

# --- Helper: Connect to Weaviate ---
def connect_weaviate():
    return weaviate.connect_to_weaviate_cloud(
        cluster_url=os.getenv("WEAVIATE_URL"),
        auth_credentials=weaviate.auth.AuthApiKey(os.getenv("WEAVIATE_API_KEY"))
    )

# --- Tool: Query Weaviate ---
@tool
def ask_weaviate_agent(query: str) -> str:
    """
    Returns answers to questions from knowledge stored in a database.

    Args:
        query: the query to ask the database.
    """
    client = connect_weaviate()
    qa = QueryAgent(client=client, collections=["Blogs"])
    response = qa.run(query)
    client.close()
    return str(response)

# --- Tool: Transform Data in Weaviate ---
@tool
def transform_weaviate_data(instruction: str, operation_type: str, property_name: str, view_properties: list) -> str:
    """
    Starts an asynchronous transformation job that updates or appends properties to the database collection.
    Returns a workflow_id that can be used to check the status of the transformation.

    Args:
        instruction: Instructions for the transformation to perform.
        operation_type: Type of operation to perform - "append" or "update".
        property_name: Name of the property to update or append to.
        view_properties: List of properties to view during transformation. If None, defaults to ["content"].
    """
    client = connect_weaviate()

    view_props = view_properties if view_properties else ["content"]
    if operation_type.lower() == "append":
        operation = Operations.append_property(
            property_name=property_name,
            data_type=DataType.TEXT_ARRAY,
            view_properties=view_props,
            instruction=instruction,
        )
    elif operation_type.lower() == "update":
        operation = Operations.update_property(
            property_name=property_name,
            view_properties=view_props,
            instruction=instruction,
        )
    else:
        client.close()
        return f"Invalid operation type: {operation_type}. Must be 'append' or 'update'."

    agent = TransformationAgent(
        client=client,
        collection="Blogs",
        operations=[operation],
    )
    response = agent.update_all()
    client.close()

    return f"Transformation started. Workflow ID: {response.workflow_id}. Use check_transformation_status tool to monitor progress."

# --- Tool: Check Transformation Status ---
@tool
def check_transformation_status(workflow_id: str) -> str:
    """
    Checks the status of an asynchronous transformation job.
    Please note the Transformation Agent typically takes a few minutes to run.

    Args:
        workflow_id: The ID of the workflow to check, obtained from transform_weaviate_data.
    """
    client = connect_weaviate()
    agent = TransformationAgent(client=client, collection="Blogs", operations=[])
    status = agent.get_status(workflow_id=workflow_id)
    client.close()
    return f"Transformation status: {status}"

# --- Tool: View Schemas ---
@tool
def check_weaviate_schemas() -> str:
    """
    Retrieves and returns all collection schemas from the Weaviate database.
    This provides information about all collections and their properties.

    Returns:
        A string representation of all collection schemas in the database.
    """
    client = connect_weaviate()
    response = client.collections.list_all(simple=True)
    client.close()
    return str(response)

# --- Tool: Wait ---
@tool
def wait_for_seconds(seconds: int) -> str:
    """
    Waits for the specified number of seconds before continuing.
    Useful when waiting for asynchronous operations to complete.

    Args:
        seconds: Number of seconds to wait.
    """
    time.sleep(seconds)
    return f"Waited for {seconds} seconds."

# --- Agent Setup ---
def create_agent(model_id):
    model = LiteLLMModel(model_id, temperature=0., top_p=1.)
    qa_agent = ToolCallingAgent(
        tools=[
            ask_weaviate_agent,
            transform_weaviate_data,
            check_transformation_status,
            check_weaviate_schemas,
            wait_for_seconds
        ],
        model=model,
        max_steps=20,
        name="weaviate_agent",
        description="""
        You are connected to a search engine that lets you search for information contained in the Weaviate Blogs.
        You can also transform data in the database by adding or updating properties.

        Note that transformations run asynchronously. When you call transform_weaviate_data, you'll receive a workflow_id.
        Use the check_transformation_status tool with this workflow_id to monitor progress.
        If a transformation is still running, you can use the wait_for_seconds tool to pause before checking again.
        Please note the Transformation Agent typically takes a few *minutes* to run.
        You can also check the database schemas using the check_weaviate_schemas tool.
        Before calling `transform_weaviate_data`, check if the target property already exists in the collection using `check_weaviate_schemas`. If the property exists, either skip the transformation or update the instruction to modify the existing property instead of creating a new one.
        """
    )
    return qa_agent

# --- Main Execution ---
@patronus.traced()
def main():
    agent = create_agent("openai/gpt-4o")

    complex_query = """
    First, find out what are the main topics covered in the blog content using the ask_weaviate_agent tool.
    Then, based on these topics, use the transform_weaviate_data tool to 'append' a new text array property named 'predicted_user_questions'
    to the 'Blogs' collection. The instruction should be to generate 5 potential user questions
    for each blog chunk based on its content and the identified main topics. For view_properties, use ['content'].
    After initiating the transformation, wait for 60 seconds using wait_for_seconds.
    Then, check the status of the transformation using check_transformation_status with the received workflow_id.
    If the transformation is not yet complete, wait for another 60 seconds and check again. Repeat this wait and check cycle once more if needed.
    Once the transformation is confirmed as complete, use the ask_weaviate_agent tool to query the database
    to find which predicted user questions (from the 'predicted_user_questions' property) are most
    frequently generated across all blog chunks and summarize the top 3 topics that users might be interested in.
    Before starting the transformation, use check_weaviate_schemas to ensure the 'predicted_user_questions' property does not already exist.
    If it exists, modify the plan to 'update' the existing property with new questions instead of appending.
    """

    result = agent.run(complex_query)
    print("\n--- Agent Final Result ---")
    print(result)

if __name__ == "__main__":
    main()