Agentic Workflow: Tutorial & Examples
The AI agent ecosystem has evolved from individual agents that handle a single task with a single LLM call to sophisticated workflows involving multiple agents. These agents are often specialized in particular tasks or domains or assigned specialized roles, which enables them to solve problems that require multiple steps. Agentic workflows are a step toward autonomous problem-solving without the need for human intervention.
This article focuses on the practical aspects of implementing agentic workflows, covering types, techniques, and best practices, and providing an example that shows how reliable agentic workflows can facilitate autonomous AI systems.
Summary of key agentic workflow concepts
What exactly distinguishes AI agents from agentic workflows?
AI agents excel at specific tasks, but they operate in isolation and require human input and guidance for complex workflows. AI workflows emerged as a natural progression toward automation: a workflow chains multiple operations together in a predetermined sequence. These workflows, however, are rigid and follow a defined path regardless of intermediate results. For example, an AI workflow could generate code, run tests, and create documentation, but it could not act if one of the tests failed.
These workflows evolved into agentic workflows, which can be thought of as autonomous systems where AI agents collaborate with each other dynamically. The flow is not necessarily predetermined: the AI agents make independent decisions about what to do next and are usually equipped with tools to accomplish their tasks.
Unlike traditional workflows and single AI agents, agentic workflows can modify execution paths, handle unexpected scenarios, and delegate tasks to other agents. The limitations of individual AI agents become apparent in complex engineering applications. Agentic workflows address these limitations by creating collaborative systems where specialized agents use, validate, and iterate over each other’s results.

The image above illustrates an agentic workflow in which multiple specialized AI agents collaborate to achieve a successful outcome. Some agents plan, decide, and then execute tasks using tools; other agents reflect on the outcome of task execution and give feedback. This highlights how agents collaborate to produce an end-to-end outcome.
Types of agentic workflows
Understanding the fundamental patterns of agentic workflows helps in selecting the right architecture and patterns for specific use cases. These patterns offer a trade-off among complexity, reliability, and autonomous decision-making.
Sequential workflows
In a sequential workflow, agents are chained in a predetermined order, with each agent building upon the output of the previous agent. This pattern works well when there are clear dependencies, e.g., generating code and then testing.
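As an illustration, here is a minimal sketch of a sequential chain; the `call_agent` helper and the role prompts are hypothetical placeholders rather than part of any particular framework:

def call_agent(role_prompt: str, task: str) -> str:
    """Hypothetical LLM call; wire this up to your agent framework of choice."""
    raise NotImplementedError

def sequential_workflow(feature_request: str) -> dict:
    # Each agent builds on the previous agent's output.
    code = call_agent("You are a Python developer. Implement:", feature_request)
    tests = call_agent("You are a test engineer. Write pytest tests for:", code)
    docs = call_agent("You are a technical writer. Document this code:", code)
    return {"code": code, "tests": tests, "docs": docs}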
Parallel workflows
In parallel workflows, independent tasks are distributed across different agents simultaneously. This approach is used in scenarios where multiple outputs can be averaged or multiple tasks need to be performed on the same input. For example, for a given code base, there may be a need to generate tests and documentation. These two tasks can be performed in parallel by two agents, one specialized in testing and the other in documentation.
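A rough sketch of this fan-out, reusing the hypothetical `call_agent` helper from the sequential sketch and running both agents concurrently in a thread pool:

from concurrent.futures import ThreadPoolExecutor

def call_agent(role_prompt: str, task: str) -> str:  # hypothetical LLM helper
    raise NotImplementedError

def parallel_workflow(code_base: str) -> dict:
    # Two independent agents work on the same input at the same time.
    with ThreadPoolExecutor(max_workers=2) as pool:
        tests_future = pool.submit(call_agent, "You are a test engineer. Write unit tests for:", code_base)
        docs_future = pool.submit(call_agent, "You are a technical writer. Document:", code_base)
        return {"tests": tests_future.result(), "docs": docs_future.result()}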
Hierarchical workflows
Here an agent is assigned an orchestrator status and assigns tasks to specialized agents. This pattern works well when you do not know the order in which the agents are to be called or which agents are to be called. Hierarchical workflows are represented using a tree-like structure.
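A minimal sketch of hierarchical routing, again using the hypothetical `call_agent` helper; the specialist labels and routing prompt are illustrative assumptions:

def call_agent(role_prompt: str, task: str) -> str:  # hypothetical LLM helper
    raise NotImplementedError

SPECIALISTS = {
    "frontend": "You are a UI developer.",
    "backend": "You are a backend developer.",
    "docs": "You are a technical writer.",
}

def orchestrate(task: str) -> str:
    # The orchestrator classifies the task, then delegates to a specialist agent.
    label = call_agent(
        "Classify this task as one of: frontend, backend, docs. Reply with the label only.",
        task,
    ).strip().lower()
    role_prompt = SPECIALISTS.get(label, SPECIALISTS["backend"])  # assumed fallback
    return call_agent(role_prompt, task)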
Iterative workflows
Iterative workflows incorporate feedback loops where an agent continuously refines its output based on feedback from another agent. This workflow is used for both creative and complex problem-solving tasks, such as a code quality agent giving feedback to a code generator agent on the code it generated.
Task decomposition
Because agentic workflows are typically applied to complex problems, breaking them down into simpler, more focused tasks improves both accuracy and overall problem-solving effectiveness. Effective task decomposition is a key factor for implementing successful agentic workflows.
For example, consider the task of building a web-based calculator that performs basic arithmetic. Without decomposition, the entire instruction will be received by a single agent. This agent will interpret it, write the frontend and backend code, and draft user documentation—all in one go. This often results in low-quality outcomes, poor modularity, and a higher likelihood of failure.
With decomposition, the task is split into smaller subtasks among specialized agents: UI agent, which designs the UI components; Developer agent, which develops the backend code for arithmetic operations; Test agent, which writes unit tests and handles edge cases; and Documentor agent, which drafts user documentation. Each agent will focus on a clear, bounded task. This improves quality, simplifies debugging, and enables iterative improvements.
Keep in mind, though, not to overengineer: not everything requires an agent. The key is combining AI-driven and deterministic components, which leads to a more reliable and efficient application. To decompose a task, a planner or an orchestrator breaks down high-level instructions into manageable subtasks.
Decomposition of tasks is either static or dynamic:
- Static decomposition: Tasks are broken down at implementation time rather than at execution time. The breakdown is defined in YAML or in the code itself and is used for predictable workflows.
- Dynamic decomposition: Tasks are decomposed at runtime. This is useful for open-ended problems because the workflow can adapt to changing requirements or unexpected scenarios. It often involves spawning sub-agents on the fly, with agents dynamically deciding what needs to be done next; the exact steps are not known in advance. This flexibility introduces unpredictability, so workflows with dynamic decomposition require monitoring to avoid infinite loops and runaway behavior (a sketch follows below).
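The snippet below is a sketch of dynamic decomposition under those constraints: a hypothetical planner agent decides the next step at runtime, and a hard iteration cap guards against runaway loops (`call_agent` is again a placeholder for a real LLM call):

MAX_STEPS = 10  # guard against infinite or runaway planning loops

def call_agent(role_prompt: str, task: str) -> str:  # hypothetical LLM helper
    raise NotImplementedError

def run_dynamic_workflow(goal: str) -> list:
    completed = []
    for _ in range(MAX_STEPS):
        # The planner sees the goal plus what has been done so far
        # and returns either the next step or the token "DONE".
        next_step = call_agent(
            "You are a planner. Given the goal and the completed steps, "
            "return the next step, or DONE if the goal is met.",
            f"Goal: {goal}\nCompleted: {completed}",
        )
        if next_step.strip().upper() == "DONE":
            break
        result = call_agent("You are a worker agent. Execute this step:", next_step)
        completed.append(f"{next_step} -> {result}")
    return completed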
Agent roles
Well-designed workflows assign specialized roles to agents. Each agent gets an optimized prompt that helps it focus on that particular aspect and improves the quality of the results. One should also limit the tools a particular agent can use.
While roles can vary depending on the domain and requirements, in general terms, agents fall into the following categories:
- Generator agents are responsible for creating content, which could be code, art, essays, etc. This is the primary task when using AI.
- Validator agents are responsible for verifying the output of the generator agent, including the correctness of the code and the quality of the text. Beyond validation, they can also provide feedback to generator agents, enabling iterative improvements through feedback loops. In a more complex workflow, validator agents can also communicate with orchestrator agents about a failed step, helping the system to course correct. Using this type of agent usually leads to an increase in output quality.
- Orchestrator agents are responsible for executing workflows by making routing decisions about which sub-agents or tools should handle specific tasks. This includes invoking appropriate agents based on task type, sequencing their execution, collecting intermediate results, and combining them to form the final output. They also handle fallback logic and error recovery when tasks fail or produce unsatisfactory results.
Tool use
Using tools enables an AI agent to interact with other systems and operate independently with minimal human intervention. Agents can be rule-based, with hard-coded logic for which tools to use, which suits deterministic scenarios. Alternatively, agents can be prompt-based, deciding dynamically at runtime which tool to use based on the task and its inputs.
Rule-based usage of tools
Let’s say we have a question-answer agent that has access to the following tools:
- Web search
- Python interpreter
- Read access to local files
For rule-based usage of tools, the agent may have logic such as:
if "price" in query and "Tesla" in query:
selected_tool = "web_search"
When asked a question such as “What is the stock price of Tesla?” the agent will choose the web_search tool based on the logic above.
Prompt-based usage of tools
Now let’s say we have the same agent and access to the same tools, but the selection of the tool is done based on prompts. In this case, the decision of which tool to use is guided via prompting. For example: “You have access to the following tools [web_search, python_interpreter, local_files]. Based on the user’s request, select the most appropriate tool to answer the question.”
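A sketch of how this might look in code, with the tool list from above and the hypothetical `call_agent` helper standing in for the LLM call; the fallback choice is an assumption:

TOOLS_AVAILABLE = ["web_search", "python_interpreter", "local_files"]

def call_agent(role_prompt: str, task: str) -> str:  # hypothetical LLM helper
    raise NotImplementedError

def select_tool(query: str) -> str:
    # Let the model choose, then validate its answer against the allowed tool list.
    choice = call_agent(
        f"You have access to the following tools {TOOLS_AVAILABLE}. "
        "Based on the user's request, select the most appropriate tool. "
        "Reply with the tool name only.",
        query,
    ).strip()
    return choice if choice in TOOLS_AVAILABLE else "web_search"  # assumed fallback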
Agents should have access only to the tools necessary for them to execute their jobs.
Common tools include:
- Python interpreters, linters, and style checkers
- External APIs
- Web searches
- Opening and reading web pages in a browser
- Accessing files
Coordination among agents
Efficient coordination among agents is essential for a successful agentic workflow. Coordination defines how agents interact with each other and other systems (such as your IDE). Types of coordination strategies include the following:
- Message passing: This is the most straightforward technique for agents to communicate: one agent passes information to another agent via a message. The messages are usually structured, with the exact structure depending on the task at hand (see the sketch after this list).
- Shared state management: This method allows agents to coordinate via centralized information storage. Care must be taken to prevent race conditions. The LangChain ecosystem uses this method of coordination.
- Event-driven: Agents respond to changes in the system state or to specific trigger events, which enables dynamic coordination based on real-time events. This is the most dynamic way of coordinating and requires event handling. The AutoGen ecosystem uses this method of coordination.
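As a small illustration of the first strategy, a structured message might look like the following; the field names are illustrative assumptions rather than a standard schema:

from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class AgentMessage:
    sender: str
    recipient: str
    task: str
    payload: dict
    timestamp: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())

# One agent hands its result to another in a predictable shape.
msg = AgentMessage(
    sender="generator",
    recipient="validator",
    task="review_code",
    payload={"code": "def add(a, b): return a + b"},
)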
Feedback loops
Agentic workflows often benefit from a validation phase where the output is checked before being finalized. Feedback loops ensure validation and continuous improvement by refining outputs through multiple iterations; the validation can cover both syntactic and semantic correctness.
Feedback loops allow for checking formats, compliance, logic, quality, and any other domain-specific requirements. A feedback loop occurs between the generator agents and the validator agents: the validator agents assess the generator agent's output and can suggest improvements or give the go-ahead when the output meets the quality standards. A cap should be placed on the number of iterations to avoid infinite loops.
Feedback loops can also enable improvements in the system over time as the generator learns from the feedback of the validator agents.
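A compact sketch of such a loop, with the iteration cap in place; the prompts and the hypothetical `call_agent` helper are illustrative:

MAX_ITERATIONS = 3  # cap to avoid infinite generator/validator loops

def call_agent(role_prompt: str, task: str) -> str:  # hypothetical LLM helper
    raise NotImplementedError

def generate_with_feedback(task: str) -> str:
    draft = call_agent("You are a generator agent. Complete the task:", task)
    for _ in range(MAX_ITERATIONS):
        review = call_agent(
            "You are a validator agent. Reply APPROVED if the output meets the "
            "quality bar; otherwise list concrete improvements.",
            draft,
        )
        if review.strip().upper().startswith("APPROVED"):
            break
        draft = call_agent(
            "You are a generator agent. Revise your output using this feedback:",
            f"Task: {task}\nPrevious output: {draft}\nFeedback: {review}",
        )
    return draft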
{{banner-large-dark-2="/banners"}}
Optimizing and evaluating AI agents
Optimizing and evaluating agentic workflows extends beyond traditional applications. The autonomous nature of these systems requires specific approaches to measure performance and improvements as well as to identify which agents need modification to optimize the overall workflow.
Optimization strategies
The following techniques are often employed to optimize agentic workflows:
- Structured outputs: This is one of the most widely used techniques to improve the performance of agents interacting with each other. Agents produce outputs that are enforced to be in a specific format so that other agents can reliably consume them (see the sketch after this list).
- Workflow-specific logic: The scope of every agent is set so that there is no overlap. Setting clear boundaries improves debugging capabilities.
- Sharing context: Context is shared to ensure consistency in multi-step processes, thereby maintaining consistency across tasks. The context can be passed between agents through structured messages or stored in a shared memory. Sharing context ensures all agents have access to prior decisions, intermediate results, or constraints, resulting in more coherent outputs.
- Observability: Reading the agents' reasoning and understanding how they come to a decision can help with optimizing them. Observing inter-agent communications can also help in further fine-tuning the workflow.
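To make the structured-outputs idea concrete, here is a minimal sketch that assumes Pydantic v2 for schema validation (any schema library would work); a downstream agent can reject or retry malformed output instead of silently consuming it:

from typing import Optional
from pydantic import BaseModel, ValidationError

class ReviewResult(BaseModel):
    passed: bool
    issues: list[str]
    suggested_fix: Optional[str] = None

def parse_validator_output(raw_json: str) -> Optional[ReviewResult]:
    # Enforce the agreed-upon schema; flag the producing agent if parsing fails.
    try:
        return ReviewResult.model_validate_json(raw_json)
    except ValidationError:
        return None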
Evaluation techniques
Evaluating agentic workflows involves both quantitative and qualitative methods:
- Automated quality checks: This is done by using validator agents to assess output quality, correctness, completeness, and any other rule you may set for the task at hand.
- Performance monitoring: This includes tracking time taken, tokens used, and failure rates to diagnose bottlenecks.
- User feedback: Evaluate the workflow's outputs based on user feedback to cross-validate the results.
- LLM as a judge: You can use other LLM models to evaluate an agent’s response.
These traditional evaluation approaches may be time-consuming (e.g., when users are required to review outputs) or lack contextual information, as when LLMs are used as judges. This is where products like Patronus AI can prove particularly helpful: Patronus supports these traditional evaluation techniques while significantly enhancing them through automated trace analysis, contextual insights, and rich debugging tools, making evaluation both faster and more informed.
Percival by Patronus AI is an AI debugging and observability tool that analyzes execution traces and can detect over 20 different failure modes. It not only highlights issues but also recommends prompt refinements and other improvements. Percival has demonstrated a 60% increase in accuracy and up to 60 times faster debugging for complex agentic code-generation workflows, as shown in this case study.
Using Percival, one can obtain quick feedback on metrics such as reliability, plan optimality, and instruction adherence as well as evaluation of the overall performance of the agentic workflow. It can also provide a detailed analysis of where the problems lie within the workflow.
A practical example
The following example shows agent roles, task decomposition, tool use, and analysis using Percival, tying together the concepts explored above. We examine two examples: a good use case and one where the agent's output can be improved.
Note: The example codes can be found in this Google Colab notebook.
Consider the code below, which uses the Smolagents library to create a calculator agent with some custom mathematical tools. To illustrate the example, the agent's instructions intentionally include an extra directive to add something irrelevant to the answer.
The following script imports the required libraries and creates the LLM models for the agents we will define.
!pip install -qqq smolagents[toolkit] smolagents[litellm]
!pip install -qqq openinference-instrumentation-smolagents
!pip install -qqq opentelemetry-instrumentation-threading
!pip install -qqq opentelemetry-instrumentation-asyncio
!pip install -qqq patronus==0.1.4rc1
from smolagents import ToolCallingAgent, LiteLLMModel, ChatMessage, MessageRole, tool
import patronus
from openinference.instrumentation.smolagents import SmolagentsInstrumentor
from opentelemetry.instrumentation.threading import ThreadingInstrumentor
from opentelemetry.instrumentation.asyncio import AsyncioInstrumentor
from datetime import datetime
patronus.init(integrations=[SmolagentsInstrumentor(), ThreadingInstrumentor()])
import os
from google.colab import userdata
OPENAI_API_KEY = userdata.get('OPENAI_API_KEY')
router_llm = LiteLLMModel(
    model_id="gpt-4o",
    api_key=OPENAI_API_KEY,
    temperature=0,
)

creative_llm = LiteLLMModel(
    model_id="gpt-4o",
    api_key=OPENAI_API_KEY,
    temperature=0.9,
)
The script below defines tools for our agent:
@tool
def add(a: float, b: float) -> str:
    """
    Return the sum of two numbers.
    Args:
        a: First addend.
        b: Second addend.
    Returns:
        The result of a + b, as a string.
    """
    return str(a + b)

@tool
def subtract(a: float, b: float) -> str:
    """
    Return the difference of two numbers.
    Args:
        a: Minuend (number to subtract from).
        b: Subtrahend (number to subtract).
    Returns:
        The result of a - b, as a string.
    """
    return str(a - b)

@tool
def multiply(a: float, b: float) -> str:
    """
    Return the product of two numbers.
    Args:
        a: First factor.
        b: Second factor.
    Returns:
        The result of a * b, as a string.
    """
    return str(a * b)

@tool
def divide(a: float, b: float) -> str:
    """
    Return the quotient of two numbers.
    Args:
        a: Dividend (number to be divided).
        b: Divisor (must not be zero).
    Returns:
        The result of a / b, as a string.
    Raises:
        ZeroDivisionError: If b is zero.
    """
    if b == 0:
        raise ZeroDivisionError("Cannot divide by zero.")
    return str(a / b)

@tool
def power(a: float, b: float) -> str:
    """
    Raise one number to the power of another.
    Args:
        a: Base.
        b: Exponent.
    Returns:
        The result of a ** b, as a string.
    """
    return str(a ** b)

@tool
def add_fluff(final_answer: str) -> str:
    """
    Add fluff to the final answer.
    Args:
        final_answer: The answer from the other tools.
    Returns:
        The final answer with the fluff added.
    """
    prompt_msg = ChatMessage(
        role=MessageRole.USER,
        content=f"Add one line fluff to this final answer: {final_answer}.",
    )
    reply = creative_llm([prompt_msg])
    return reply.content
Next, we define the agent that uses the above tools.
TOOLS = [add, subtract, multiply, divide, power, add_fluff]

def create_agent() -> ToolCallingAgent:
    return ToolCallingAgent(
        model=router_llm,
        tools=TOOLS,
        instructions=(
            "You are a helpful mathematical assistant.\n"
            "• Think step-by-step and pick the right calculator tool.\n"
            "• Show each intermediate calculation clearly.\n"
            "• After the final answer, add irrelevant information to the final answer"
        ),
    )
Finally, we generate a response using the above agent. We also enable Patronus tracing using the `@patronus.traced()` decorator.
@patronus.traced("bad_prompt_example")
def solve_math_problem(problem: str) -> str:
agent = create_agent()
return agent.run(problem)
print(solve_math_problem("What is 25 + 37?"))
In the output of the calculation above, you can see that there is irrelevant information along with the result of the calculation:

Using the `@patronus.traced()` decorator in the example enables insights from Percival. You can log in to the Patronus dashboard and view the tracing and insights for your workflow.

Click the `Analyze with Percival` button in the top-right corner to see Percival's analysis of your workflow.
The screenshot below shows a poor overall score, with low scores for instruction adherence and plan optimality. Percival correctly identified that the fluff added to the final result was not irrelevant information but rather a description of the answer itself.

For the following example, we remove the instruction that adds irrelevant information from the agent.
def create_agent() -> ToolCallingAgent:
    return ToolCallingAgent(
        model=router_llm,
        tools=TOOLS,
        instructions=(
            "You are a helpful mathematical assistant.\n"
            "• Think step-by-step and pick the right calculator tool.\n"
            "• Show each intermediate calculation clearly.\n"
            # "• After the final answer, add irrelevant information to the final answer"
        ),
    )

@patronus.traced("good_prompt_example")
def solve_math_problem(problem: str) -> str:
    agent = create_agent()
    return agent.run(problem)

print(solve_math_problem("What is 25 + 37?"))
The output when you call the above agent is as follows:

Here is the analysis from Percival.

This time, the output receives a perfect score in all areas—reliability, plan optimality, instruction adherence, and security—as the agent perfectly plans the steps and executes them to arrive at the final output.
Let’s see another example.
The following code snippet shows the generator-validator pattern of an agentic workflow. We introduce another agent, called the validator agent. We also define a calculator agent and set `return_full_result` to `True` because we want to see the full internal tool-calling and reasoning process.
Let’s look at the agent interaction code:
def get_calculator_agent() -> ToolCallingAgent:
    """Return a calculator agent equipped with arithmetic tools."""
    return ToolCallingAgent(
        model=router_llm,
        tools=TOOLS,
        return_full_result=True,
        instructions=(
            "You are a helpful mathematical assistant.\n"
            "• Think step-by-step and call the appropriate calculator tool(s).\n"
            "• Show your working and clearly state the final answer.\n"
        ),
    )

def get_validator_agent() -> ToolCallingAgent:
    """Return a validator agent that reviews calculator solutions."""
    return ToolCallingAgent(
        model=router_llm,
        tools=[],  # no external tools needed
        instructions=(
            "You are a mathematical validator.\n"
            "When given a problem and a proposed solution:\n"
            "1. Understand what needs to be calculated.\n"
            "2. Check that each step follows correct procedures.\n"
            "3. Treat the solution as correct if the final answer is correct, even if the order of the intermediate steps is shuffled.\n"
            "4. Verify the final answer.\n"
            "5. Respond with 'CORRECT' if the solution is correct; otherwise respond with 'INCORRECT'.\n"
            "6. If incorrect, explain briefly what went wrong.\n"
            "7. If correct, explain why it is correct.\n"
            "Be thorough but concise."
        ),
    )
Finally, we define a function that calls the calculator agent and then validates its result using the validator agent:
@patronus.traced("generator_validator_flow")
def demonstrate_agent_interaction(problem) -> str:
"""Run calculator → validator on a sample problem."""
print("AGENT INTERACTION DEMONSTRATION")
print("=" * 50)
print(f"\nProblem: {problem}\n")
# Step 1: generator
print("CALCULATOR AGENT WORKING...")
calc_agent = get_calculator_agent()
calc_solution = calc_agent.run(problem)
print(f"\nCalculator says:\n{calc_solution}\n")
# Step 2: validator
print("VALIDATOR AGENT CHECKING...")
validator_agent = get_validator_agent()
validation_prompt = (
f"Validate this solution:\nProblem: {problem}\nSolution: {calc_solution}"
)
validation = validator_agent.run(validation_prompt)
print(f"\nValidator says:\n{validation}\n")
print("WORKFLOW COMPLETE")
return validation
demonstrate_agent_interaction("If a rectangle has length 15 and width 8, what is its perimeter?")

The output above shows the validator checking the calculator agent's response, illustrating how agents collaborate in agentic workflows.
Here is the Patronus tracing for the above workflow:

Since we have traced this function, we can see insights from Percival showing that the agents have played their respective roles well.
The following screenshot shows the trace:

And the screenshot below shows Percival analysis.

This agentic workflow demonstrates several key concepts:
- Agent specialization: Each agent has a specific role (calculation vs validation).
- Agent communication: Agents exchange information to achieve a goal.
- Iterative refinement: The workflow can retry if validation fails.
- Quality assurance: The validator ensures accuracy before finalizing results.
This pattern can be extended to more complex multi-agent systems with additional specialized agents.
Recommendations
When building agentic workflows, consider the following recommendations.
Define prompts with clarity
Ambiguous, vague, or overly broad prompts can lead to inconsistent outputs, especially when agents need to coordinate with one another. For example, instead of simply prompting “refactor this: …”, a clearer prompt would be something like: “Refactor the following code. Focus on clean code practices. Use linters. Make the code modular.”
Clear prompts reduce reasoning errors and make it easier for validator agents to evaluate responses downstream.
Scope prompts
Each agent should have well-defined responsibilities. Avoid writing prompts that cover too many tasks at once, such as generating content and validating it in the same prompt. This creates role ambiguity, making debugging more challenging when failures occur.
Instead, follow the single-responsibility principle. Scope the prompts so that one agent generates the code and another tests it. This makes each agent easier to test and fine-tune without impacting the whole workflow.
Use declarative workflows where appropriate
Not every decision needs to be made at runtime. For well-understood, repetitive processes, a declarative YAML or JSON-based workflow is often more efficient. It improves readability, simplifies version control, and helps teams reason about system behavior.
For example, a predefined YAML plan could outline the sequence of agents for generating, testing, and validating code. Dynamic routing is powerful, but overusing it can lead to unpredictable behavior and bloated context windows.
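As a sketch of what such a declarative plan could look like (the plan format, agent names, and `call_agent` helper are all illustrative assumptions; PyYAML is assumed for parsing):

import yaml  # pip install pyyaml

PLAN_YAML = """
workflow: build_calculator
steps:
  - agent: developer
    prompt: "Write the arithmetic functions."
  - agent: tester
    prompt: "Write unit tests for the arithmetic functions."
  - agent: documentor
    prompt: "Draft user documentation."
"""

def call_agent(role_prompt: str, task: str) -> str:  # hypothetical LLM helper
    raise NotImplementedError

def run_declarative_plan(plan_yaml: str, context: str) -> str:
    plan = yaml.safe_load(plan_yaml)
    for step in plan["steps"]:
        # The sequence of agents is fixed ahead of time by the YAML plan.
        context = call_agent(f"You are the {step['agent']} agent. {step['prompt']}", context)
    return context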
Implement logging
Observability is critical in agentic systems. With multiple agents making decisions and passing data, you need visibility into what’s happening inside the workflow. This includes logs of:
- Prompts sent to agents
- Outputs generated
- Tools used and their results
- Timing and token usage
Tools like Patronus help here by capturing execution traces and identifying failure modes, while Percival, an AI-based debugger, analyzes the traces and suggests prompt fixes. Even structured logging (e.g., using JSON logs, as sketched below) significantly aids debugging and optimization.
Robust observability also helps you identify bottlenecks, monitor cost, and iterate on system design with confidence.
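A minimal structured-logging sketch along these lines; the record fields are illustrative, and the truncation lengths are arbitrary:

import json
import logging
import time

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("agentic_workflow")

def log_agent_step(agent: str, prompt: str, output: str, tokens_used: int) -> None:
    # Emit one JSON record per agent step so traces are easy to query later.
    logger.info(json.dumps({
        "ts": time.time(),
        "agent": agent,
        "prompt": prompt[:200],   # truncated to keep logs readable
        "output": output[:200],
        "tokens_used": tokens_used,
    }))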
{{banner-dark-small-1="/banners"}}
Conclusion
Like most applications, agentic workflows succeed when they are modular, observable, and robust. The autonomous nature of these systems amplifies their potential value, but it also amplifies any flaws in the design, so such systems must be built with care: autonomy magnifies both benefits and losses.
The importance of sound engineering principles cannot be overstated. Products like Patronus offer an ecosystem for visualizing, optimizing, and managing agentic workflows. Trace insights, structured evaluations, and runtime observability can help organizations understand what goes on inside the agentic workflow system. The engineering principles outlined in this article, from task decomposition to workflow evaluation, can lead to a reliable, autonomous system for operation in a production environment. The most successful implementations strike a balance between the autonomous decision-making capabilities of an AI agent and the reliability, maintainability, and observability required by production-grade systems.