Stop Building Flaky Agents: Use Workflows & Memory
Deterministic patterns for non-deterministic models.
LLMs have this weird property: they’re brilliant at understanding nuance but terrible at following recipes. Give GPT-4 a vague problem and it’ll reason through possibilities. Give it a precise sequence of steps, and it might skip step 3 because step 5 “felt more relevant.”
This isn’t a bug in the model. It’s a fundamental characteristic of probabilistic systems trying to solve deterministic problems.
I’ve watched teams struggle with this mismatch. They build an agent to handle customer refunds, give it a dozen tools, and expect it to reliably execute a business process. Sometimes it works perfectly. Sometimes it hallucinates approvals that never happened. Sometimes it gets stuck asking for the same information three times.
The solution isn’t better prompts. It’s knowing when to stop asking the LLM to “think” and start telling it to “obey.”
When Deterministic Beats Creative
Think about what happens when you need to process a support ticket. Real-world business logic looks something like:
- Fetch the ticket details from the database
- Check if the user is eligible for a refund (policy rules)
- Verify the transaction exists and hasn’t been refunded already
- Calculate the refund amount
- Process the payment reversal
- Update the ticket status
- Send confirmation email
You could hand this to an LLM as a tool-calling exercise. In my experience, that’s asking for trouble. The model might decide steps 2 and 3 are “basically the same thing” and skip one. Or it might process the refund before checking eligibility because the user seemed upset.
Workflows exist for exactly this scenario. They’re not exciting, but that’s the point.
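To make that concrete, here is a minimal sketch of the refund process built with the same `createStep`/`createWorkflow` primitives covered in the next section. The `getTicket`, `checkRefundPolicy`, and `reverseCharge` helpers are hypothetical stand-ins for your ticketing, policy, and payment services; the point is the shape, not the specifics.

```typescript
import { createWorkflow, createStep } from '@mastra/core/workflows';
import { z } from 'zod';

// Hypothetical service helpers -- swap in your own ticketing/policy/payment code.
import { getTicket, checkRefundPolicy, reverseCharge } from './services';

const loadTicket = createStep({
  id: 'load-ticket',
  inputSchema: z.object({ ticketId: z.string() }),
  outputSchema: z.object({ ticketId: z.string(), userId: z.string(), transactionId: z.string() }),
  execute: async ({ inputData }) => getTicket(inputData.ticketId),
});

const checkEligibility = createStep({
  id: 'check-eligibility',
  inputSchema: z.object({ ticketId: z.string(), userId: z.string(), transactionId: z.string() }),
  outputSchema: z.object({ transactionId: z.string(), eligible: z.boolean(), amount: z.number() }),
  execute: async ({ inputData }) => checkRefundPolicy(inputData),
});

const processRefund = createStep({
  id: 'process-refund',
  inputSchema: z.object({ transactionId: z.string(), eligible: z.boolean(), amount: z.number() }),
  outputSchema: z.object({ status: z.string() }),
  execute: async ({ inputData }) => {
    // The refund only runs if the previous step said so -- no interpretation involved.
    if (!inputData.eligible) return { status: 'denied' };
    await reverseCharge(inputData.transactionId, inputData.amount);
    return { status: 'refunded' };
  },
});

export const refundWorkflow = createWorkflow({
  id: 'refund-workflow',
  inputSchema: z.object({ ticketId: z.string() }),
  outputSchema: z.object({ status: z.string() }),
})
  .then(loadTicket)
  .then(checkEligibility)
  .then(processRefund);

refundWorkflow.commit();
```

Eligibility always runs before the payment reversal, every time, because the pipeline says so. No amount of user frustration in the prompt can reorder it.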
Building a Weather Activity Planner
Here’s a practical example that shows the pattern. We need hard, factual weather data paired with creative activity suggestions. The weather fetch should never be creative, but the suggestions should be.
```typescript
import { createWorkflow, createStep } from '@mastra/core/workflows';
import { Agent } from '@mastra/core/agent';
import { openai } from '@ai-sdk/openai';
import { z } from 'zod';

// Step 1: Fetch weather data (Deterministic)
const fetchWeather = createStep({
  id: 'fetch-weather',
  description: 'Fetches weather forecast for a given city',
  inputSchema: z.object({
    city: z.string(),
  }),
  outputSchema: z.object({
    location: z.string(),
    temperature: z.number(),
    conditions: z.string(),
    precipitationChance: z.number(),
  }),
  execute: async ({ inputData }) => {
    // ... (fetch logic) ...
    // `name`, `weather`, and `getWeatherCondition` come from the elided fetch logic above
    return {
      location: name,
      temperature: weather.current.temperature_2m,
      conditions: getWeatherCondition(weather.current.weather_code),
      precipitationChance: weather.daily.precipitation_probability_mean[0],
    };
  },
});

// Step 2: Agent suggests activities (Creative)
const activityPlanner = new Agent({
  id: 'activity-planner-agent',
  name: 'Activity Planner',
  instructions: `You are a local activities expert. Based on weather conditions, suggest 3-5 appropriate activities.
  - For rain (>50% precipitation), prioritize indoor activities
  - For extreme temperatures, consider climate-appropriate options
  - Always include one adventurous and one relaxing option`,
  model: openai('gpt-5'),
});

const planActivities = createStep({
  id: 'plan-activities',
  description: 'Uses AI to suggest activities based on weather',
  inputSchema: z.object({
    location: z.string(),
    temperature: z.number(),
    conditions: z.string(),
    precipitationChance: z.number(),
  }),
  outputSchema: z.object({
    activities: z.string(),
  }),
  execute: async ({ inputData }) => {
    const prompt = `Weather in ${inputData.location}: ${inputData.temperature}°C...`;
    const response = await activityPlanner.generate(prompt);
    return { activities: response.text };
  },
});

// The Pipeline
export const activityPlannerWorkflow = createWorkflow({
  id: 'activity-planner',
  inputSchema: z.object({ city: z.string() }),
  outputSchema: z.object({ activities: z.string() }),
})
  .then(fetchWeather)
  .then(planActivities);

activityPlannerWorkflow.commit();
```

The LLM never touches the weather API. It gets ground-truth data as input, then does what it’s actually good at: making contextual suggestions. If you flip this around and let the agent fetch weather data, you’ll eventually get a sunny forecast when it’s actually raining.
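If you want to see the pipeline run end to end, something like the snippet below works. Treat the run API as a sketch: it has shifted across Mastra versions (`createRunAsync` in recent releases, `createRun` before that), so check the docs for the version you’re on.

```typescript
// A sketch of invoking the workflow directly -- run API names vary by Mastra version.
const run = await activityPlannerWorkflow.createRunAsync();
const result = await run.start({ inputData: { city: 'Seattle' } });

if (result.status === 'success') {
  console.log(result.result.activities);
}
```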
When to consider workflows:
- You have a known sequence of steps that must happen in order
- You need observability at each stage (logs, metrics, timing)
- You need retry logic for flaky external APIs (see the sketch after this list)
- Business rules can’t be “interpreted” - they must be followed exactly
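The retry point deserves a quick sketch. Even if you don’t lean on any framework-level retry configuration, you can wrap the flaky call inside a step’s `execute` with plain TypeScript. The URL and attempt counts below are placeholders, not part of the weather example.

```typescript
// Hand-rolled retry with exponential backoff for a flaky external API.
const fetchWithRetry = async (url: string, attempts = 3): Promise<Response> => {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      const res = await fetch(url);
      if (res.ok) return res;
      lastError = new Error(`HTTP ${res.status}`);
    } catch (err) {
      lastError = err;
    }
    // Wait 500ms, 1s, 2s, ... between attempts.
    await new Promise((resolve) => setTimeout(resolve, 2 ** i * 500));
  }
  throw lastError;
};
```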
The Context Window Problem Nobody Talks About
There’s this pattern I keep seeing. Someone builds a chatbot. It works great during testing. Then in production, users have longer conversations and suddenly the bot gets lost.
The developer looks at the logs and realizes they’re sending the entire conversation history with every request. All 47 messages. They’re burning through tokens and context space for information that’s mostly irrelevant.
Worse, there’s a phenomenon researchers call “lost in the middle” where models perform worse when relevant information is buried in a long context. The model literally can’t see the forest for the trees.
Sending full conversation history feels safe. You’re giving the model “all the information.” But you’re actually making it harder for the model to focus on what matters.
Working Memory vs. Long-Term Storage
Mastra’s memory system gives you both. Working memory keeps recent messages in the context window. Semantic recall searches historical messages when the current query seems related.
```typescript
import { Agent } from '@mastra/core/agent';
import { Memory } from '@mastra/memory';
import { LibSQLStore } from '@mastra/libsql';
import { openai } from '@ai-sdk/openai';

export const memoryAgent = new Agent({
  id: 'memory-agent',
  name: 'Memory Agent',
  instructions: 'You are a helpful assistant with perfect recall of our conversations.',
  model: openai('gpt-5'),
  memory: new Memory({
    storage: new LibSQLStore({
      id: 'memory-agent-store',
      url: 'file:../mastra.db',
    }),
    options: {
      lastMessages: 20, // Keep last 20 messages in context
      semanticRecall: {
        enabled: true, // Use embeddings to find old stuff
        topK: 5,
        threshold: 0.7,
      },
    },
  }),
});
```

Here’s how this plays out in practice. A user asks: “What was that Italian restaurant you recommended last month?”
Without semantic recall, the agent sees the last 20 messages. The restaurant recommendation was message 487 of 506. It’s gone. The agent says “I don’t have that information.”
With semantic recall:
- The query gets embedded: [0.234, -0.567, 0.891, ...]
- The embedding gets compared against historical messages
- Message 487 (“I’d recommend Trattoria Bella - their carbonara is incredible”) scores 0.89 similarity (see the sketch after this list)
- That message gets injected into the current context
- The agent responds: “I recommended Trattoria Bella. Their carbonara is what caught my attention.”
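That similarity score is just cosine similarity between embedding vectors. Here’s a toy version of the comparison; the vectors are shortened for illustration, since real embeddings have hundreds or thousands of dimensions.

```typescript
// Cosine similarity between two embedding vectors: 1.0 means identical direction.
// Scores above the configured threshold (0.7 in the agent above) count as a match.
const cosineSimilarity = (a: number[], b: number[]): number => {
  const dot = a.reduce((sum, value, i) => sum + value * b[i], 0);
  const magnitude = (v: number[]) => Math.sqrt(v.reduce((sum, value) => sum + value * value, 0));
  return dot / (magnitude(a) * magnitude(b));
};

// Toy vectors standing in for the query and a historical message.
const queryEmbedding = [0.234, -0.567, 0.891];
const messageEmbedding = [0.21, -0.49, 0.93];
console.log(cosineSimilarity(queryEmbedding, messageEmbedding) > 0.7); // true
```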
The agent appears to have perfect memory while only using a fraction of the context window. This isn’t just clever engineering - it’s functionally necessary once conversations extend beyond a few dozen messages.
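To get that per-user recall, you scope memory to a thread and a resource when calling the agent. The IDs below are made up, and the exact option shape (`memory: { thread, resource }` in recent releases versus older `threadId`/`resourceId` options) depends on your Mastra version.

```typescript
// Scope memory to one conversation (thread) and one user (resource).
const response = await memoryAgent.generate(
  'What was that Italian restaurant you recommended last month?',
  {
    memory: {
      thread: 'conversation-123', // a single conversation
      resource: 'user-42',        // the user across all their conversations
    },
  },
);

console.log(response.text);
```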
Coordination Through Agent Networks
Sometimes you need both structure and flexibility. Pure workflows are too rigid. Pure agents are too unpredictable.
Agent networks give you a coordinator that decides which specialized agent or workflow to invoke based on the task. Think of it as a smart load balancer for AI capabilities.
```typescript
export const coordinatorAgent = new Agent({
  id: 'coordinator-agent',
  name: 'Research Coordinator',
  instructions: `You are a network of researchers and writers.
  - Use researchAgent for gathering facts
  - Use writingAgent for producing final content
  - Use weatherTool for current weather data
  - Use activityPlannerWorkflow for location-based planning

  Always produce comprehensive, well-structured responses.`,
  model: openai('gpt-5'),

  // Available primitives
  agents: { researchAgent, writingAgent },
  workflows: { activityPlannerWorkflow },
  tools: { weatherTool },

  // Network requires memory
  memory: new Memory({
    storage: new LibSQLStore({ id: 'network-store', url: 'file:../network.db' }),
  }),
});
```

When you query this network, the coordinator analyzes the request and routes accordingly:
- “I need facts about X” triggers the research agent
- “Plan a weekend in Seattle” runs the activity planner workflow
- “Write a report on Y” engages the writing agent
This pattern scales better than trying to cram everything into a single mega-agent. Specialized agents develop focused expertise. The coordinator handles routing. Each piece does what it’s good at.
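Exercising the coordinator is just a normal agent call; the routing happens inside its reasoning, not in your code. A rough sketch, with made-up prompts and memory IDs:

```typescript
// Same call shape, different prompts -- the coordinator picks the right primitive.
const facts = await coordinatorAgent.generate('I need facts about the Seattle coffee scene', {
  memory: { thread: 'research-1', resource: 'user-42' },
});

const weekend = await coordinatorAgent.generate('Plan a weekend in Seattle', {
  memory: { thread: 'research-1', resource: 'user-42' },
});

console.log(facts.text);
console.log(weekend.text);
```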
Putting It Together
Real production AI systems need architecture, not just prompts. You’re building distributed systems where some nodes happen to be LLMs.
Workflows give you guarantees when you need things to happen exactly right. Memory gives you context without burning through your token budget. Agent networks let you compose complexity from simpler parts.
None of this is glamorous. But after watching enough “fully autonomous agents” fail in production, I’ve come to appreciate boring reliability over exciting unpredictability.
Your mileage may vary, but in my experience, the systems that actually ship and stay running are the ones that treat LLMs as components in a larger architecture rather than magic boxes that solve everything.
Resources
Read the Series
- LLM Routing
- Security & Guardrails
- MCP & Tool Integrations
- Workflows & Memory (This Post)