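Stop Building Unreliable Agents: Use Workflows and Memory

Deterministic Patterns for Non-Deterministic Models

LLMs have a strange property: they excel at understanding nuance but are terrible at following recipes. Give GPT-4 a vague question and it will reason through the possibilities. Give it a precise sequence of steps and it may skip step 3 because step 5 "seemed more relevant."

This is not a bug in the model. It is a fundamental consequence of asking a probabilistic system to solve a deterministic problem.

I have watched teams struggle with this mismatch. They build an agent to handle customer refunds, give it a dozen tools, and expect it to execute a business process reliably. Sometimes it works perfectly. Sometimes it hallucinates an approval that never existed. Sometimes it gets stuck and asks for the same information three times.

The solution is not better prompting. It is knowing when to stop letting the LLM "think" and start making it "obey."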

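When Determinism Beats Creativity

Consider what happens when a support ticket is processed. The real business logic looks roughly like this:

1. Fetch the ticket details from the database
2. Check whether the user is eligible for a refund (policy rules)
3. Verify that the transaction exists and has not already been refunded
4. Calculate the refund amount
5. Process the payment reversal
6. Update the ticket status
7. Send a confirmation email

You could hand these to an LLM as a tool-calling exercise. In my experience, that is asking for trouble. The model may decide that steps 2 and 3 are "basically the same thing" and skip one of them. Or it may process the refund before checking eligibility because the user seems upset.

Workflows exist for exactly this scenario. They are not exciting, and that is the point.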
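To make the ordering guarantee concrete, here is a minimal sketch of the seven refund steps as plain sequential code, with no LLM involved. Everything in it is hypothetical: the in-memory `tickets` and `transactions` maps stand in for a database, the 30-day refund window is an assumed policy rule, and the payment reversal and confirmation email are stubbed.

```typescript
// Hypothetical in-memory stand-ins for the ticket database and payment records.
type Ticket = {
  id: string;
  transactionId: string;
  daysSincePurchase: number;
  status: 'open' | 'refunded';
};
type Transaction = { id: string; amount: number; refunded: boolean };

const tickets = new Map<string, Ticket>([
  ['T-1', { id: 'T-1', transactionId: 'TX-1', daysSincePurchase: 10, status: 'open' }],
]);
const transactions = new Map<string, Transaction>([
  ['TX-1', { id: 'TX-1', amount: 49.99, refunded: false }],
]);

const REFUND_WINDOW_DAYS = 30; // assumed policy rule, for illustration only

function processRefund(ticketId: string): { refunded: number; log: string[] } {
  const log: string[] = [];

  // 1. Fetch the ticket details
  const ticket = tickets.get(ticketId);
  if (!ticket) throw new Error(`Ticket ${ticketId} not found`);
  log.push('fetch-ticket');

  // 2. Check eligibility against the policy rule
  if (ticket.daysSincePurchase > REFUND_WINDOW_DAYS) {
    throw new Error('Not eligible: outside the refund window');
  }
  log.push('check-eligibility');

  // 3. Verify the transaction exists and has not been refunded
  const tx = transactions.get(ticket.transactionId);
  if (!tx) throw new Error(`Transaction ${ticket.transactionId} not found`);
  if (tx.refunded) throw new Error('Transaction already refunded');
  log.push('verify-transaction');

  // 4. Calculate the refund amount (full refund in this sketch)
  const amount = tx.amount;
  log.push('calculate-amount');

  // 5. Process the payment reversal (stubbed: only flips a flag)
  tx.refunded = true;
  log.push('reverse-payment');

  // 6. Update the ticket status
  ticket.status = 'refunded';
  log.push('update-ticket');

  // 7. Send the confirmation email (stubbed)
  log.push('send-confirmation');

  return { refunded: amount, log };
}
```

Because each step is an ordinary statement, there is no way for step 5 to run before step 2, and a second call for the same ticket fails at step 3 instead of issuing a duplicate refund.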

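Building a Weather Activity Planner

Here is a practical example of the pattern. We need hard, factual weather data combined with creative activity suggestions. The weather fetch must never be creative; the suggestions should be.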

import { createWorkflow, createStep } from '@mastra/core/workflows';
import { Agent } from '@mastra/core/agent';
import { openai } from '@ai-sdk/openai';
import { z } from 'zod';

// Step 1: Fetch weather data (Deterministic)
const fetchWeather = createStep({
  id: 'fetch-weather',
  description: 'Fetches weather forecast for a given city',
  inputSchema: z.object({
    city: z.string(),
  }),
  outputSchema: z.object({
    location: z.string(),
    temperature: z.number(),
    conditions: z.string(),
    precipitationChance: z.number(),
  }),
  execute: async ({ inputData }) => {
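    // ... (fetch logic) ...
    // NOTE: the coordinates are hard-coded to 52.52, 13.41 (Berlin); a real step would geocode inputData.city first.
    const weather = await fetch(`https://api.open-meteo.com/v1/forecast?latitude=52.52&longitude=13.41&current=temperature_2m,weather_code&daily=precipitation_probability_mean`).then(r => r.json());

    return {
      location: inputData.city,
      temperature: weather.current.temperature_2m,
      // getWeatherCondition is not defined in this snippet; it is assumed to map a weather code to a readable label.
      conditions: getWeatherCondition(weather.current.weather_code),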
      precipitationChance: weather.daily.precipitation_probability_mean[0],
    };
  },
});

// Step 2: Agent suggests activities (Creative)
const activityPlanner = new Agent({
  id: 'activity-planner-agent',
  name: 'Activity Planner',
  instructions: `You are a local activities expert. Based on weather conditions, suggest 3-5 appropriate activities.
    - For rain (>50% precipitation), prioritize indoor activities
    - For extreme temperatures, consider climate-appropriate options
    - Always include one adventurous and one relaxing option`,
  model: openai('gpt-5'),
});

const planActivities = createStep({
  id: 'plan-activities',
  description: 'Uses AI to suggest activities based on weather',
  inputSchema: z.object({
    location: z.string(),
    temperature: z.number(),
    conditions: z.string(),
    precipitationChance: z.number(),
  }),
  outputSchema: z.object({
    activities: z.string(),
  }),
  execute: async ({ inputData }) => {
    const prompt = `Weather in ${inputData.location}: ${inputData.temperature}°C...`;
    const response = await activityPlanner.generate(prompt);
    return { activities: response.text };
  },
});

// The Pipeline
export const activityPlannerWorkflow = createWorkflow({
  id: 'activity-planner',
  inputSchema: z.object({ city: z.string() }),
  outputSchema: z.object({ activities: z.string() }),
})
  .then(fetchWeather)
  .then(planActivities);

activityPlannerWorkflow.commit();

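The LLM never touches the weather API. It only receives real data as input and then does what it is actually good at: making contextual suggestions. Reverse this and let the agent fetch the weather itself, and sooner or later you will get a sunny forecast while it is actually raining.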
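The workflow calls a `getWeatherCondition` helper that the article does not show. A minimal sketch of what it might look like follows, assuming the `weather_code` field carries the WMO weather interpretation codes that Open-Meteo documents. The labels and the coarse grouping are my own choices, not the author's implementation.

```typescript
// Hypothetical helper: maps a WMO weather code to a coarse, human-readable label.
function getWeatherCondition(code: number): string {
  if (code === 0) return 'Clear sky';
  if (code >= 1 && code <= 3) return 'Mainly clear to overcast';
  if (code === 45 || code === 48) return 'Fog';
  if (code >= 51 && code <= 57) return 'Drizzle';
  if (code >= 61 && code <= 67) return 'Rain';
  if (code >= 71 && code <= 77) return 'Snow';
  if (code >= 80 && code <= 82) return 'Rain showers';
  if (code === 85 || code === 86) return 'Snow showers';
  if (code >= 95 && code <= 99) return 'Thunderstorm';
  // Anything outside the documented ranges is reported as unknown rather than guessed.
  return 'Unknown';
}
```

A lookup like this is deliberately dull: the same code always produces the same label, which is the property the deterministic step depends on.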

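When to consider a workflow:

You have a known sequence of steps that must run in order
You need observability at every stage (logs, metrics, latency)
You need retry logic for unreliable external APIs
Business rules must be followed strictly, not "interpreted"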
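As an illustration of the retry point, here is a minimal sketch of a generic retry wrapper with exponential backoff. `withRetry` is a hypothetical helper written for this example, not part of Mastra, and the default attempt count and delays are arbitrary.

```typescript
// Hypothetical helper (not a Mastra API): retries an async operation with exponential backoff.
async function withRetry<T>(
  operation: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 100,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      if (attempt < maxAttempts) {
        // Double the wait after each failure: baseDelayMs, 2x, 4x, ...
        const delay = baseDelayMs * 2 ** (attempt - 1);
        await new Promise<void>((resolve) => setTimeout(resolve, delay));
      }
    }
  }
  // Every attempt failed: surface the last error to the caller.
  throw lastError;
}
```

Wrapping only the external call, for example the weather fetch, keeps retries out of the LLM's hands: the model never has to decide whether to try again.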

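The Context Window Problem Nobody Talks About

I see this pattern all the time. Someone builds a chatbot, and it works well in testing. Then it goes to production, conversations get longer, and the bot suddenly loses the thread.

The developers check the logs and find they are sending the full conversation history (all 47 messages) with every request. A large share of the tokens and context space goes to information that is mostly irrelevant.

Worse, researchers have observed a phenomenon known as "lost in the middle": model performance degrades when the relevant information is buried in the middle of a long context. The model literally cannot see the forest for the trees.

Sending the full conversation history looks safe. You are giving the model "everything." In practice, it only makes it harder for the model to focus on what matters.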
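A rough sketch of the difference between sending everything and sending a recent window is shown here. The four-characters-per-token estimate is a crude heuristic I am assuming for illustration, not a real tokenizer, and the 47 identical messages are synthetic.

```typescript
type ChatMessage = { role: 'user' | 'assistant'; content: string };

// Crude assumption: roughly 4 characters per token. A real tokenizer will differ.
const estimateTokens = (messages: ChatMessage[]): number =>
  Math.ceil(messages.reduce((sum, m) => sum + m.content.length, 0) / 4);

// Keep only the most recent `limit` messages.
function lastMessages(history: ChatMessage[], limit: number): ChatMessage[] {
  // slice(-0) would return the whole array, so handle non-positive limits explicitly.
  return limit <= 0 ? [] : history.slice(-limit);
}

// Synthetic 47-message history, each message 40 characters long.
const history: ChatMessage[] = Array.from({ length: 47 }, (_, i) => ({
  role: i % 2 === 0 ? ('user' as const) : ('assistant' as const),
  content: 'x'.repeat(40),
}));

const windowed = lastMessages(history, 10);
console.log(estimateTokens(history), estimateTokens(windowed)); // 470 100
```

The window cuts the estimated cost to roughly a fifth here, at the price of forgetting everything older than the last ten messages.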

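Working Memory vs. Long-Term Storage

Mastra's memory system provides both. Working memory keeps the most recent messages in the context window. Semantic recall searches older messages when they appear relevant to the current query.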

import { Agent } from '@mastra/core/agent';
import { Memory } from '@mastra/memory';
import { LibSQLStore } from '@mastra/libsql';
import { openai } from '@ai-sdk/openai';

export const memoryAgent = new Agent({
  id: 'memory-agent',
  name: 'Memory Agent',
  instructions: 'You are a helpful assistant with perfect recall of our conversations.',
  model: openai('gpt-5'),
  memory: new Memory({
    storage: new LibSQLStore({
      id: 'memory-agent-store',
      url: 'file:../mastra.db',
    }),
    options: {
      lastMessages: 20,  // Keep last 20 messages in context
      semanticRecall: {
        enabled: true,  // Use embeddings to find old stuff
        topK: 5,
        threshold: 0.7,
      },
    },
  }),
});

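Here is how it plays out in practice. A user asks: "What was that Italian restaurant you recommended last month?"

Without semantic recall, the agent sees only the last 20 messages. The restaurant recommendation was message 487 of 506, so it has already been dropped. The agent replies: "I don't have that information."

With semantic recall enabled:

The query is embedded as a vector: [0.234, -0.567, 0.891, ...]
That vector is compared against the stored historical messages
Message 487 ("I recommend Trattoria Bella. Their carbonara is fantastic.") scores a similarity of 0.89
That message is injected into the current context
The agent replies: "I recommended Trattoria Bella. Their carbonara was the highlight I called out."

The agent appears to have perfect memory while using only a small slice of the context window. This is more than clever engineering: once conversations run past a few dozen messages, it becomes a functional necessity.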
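The retrieval steps above can be sketched with plain cosine similarity. This is a toy illustration, not Mastra's implementation: the three-dimensional embeddings are hand-made stand-ins for real embedding vectors, and `recall` is a hypothetical function whose `topK` and `threshold` parameters mirror the option names used in the configuration.

```typescript
type StoredMessage = { id: number; text: string; embedding: number[] };

// Cosine similarity: dot product divided by the product of the vector lengths.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((x, i) => {
    const y = b[i] ?? 0; // treat missing dimensions as zero
    dot += x * y;
    normA += x * x;
    normB += y * y;
  });
  return normA === 0 || normB === 0 ? 0 : dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every stored message, drop weak matches, and return the best topK.
function recall(
  query: number[],
  store: StoredMessage[],
  topK: number,
  threshold: number,
): StoredMessage[] {
  return store
    .map((message) => ({ message, score: cosineSimilarity(query, message.embedding) }))
    .filter((result) => result.score >= threshold)
    .sort((x, y) => y.score - x.score)
    .slice(0, topK)
    .map((result) => result.message);
}

// Toy store with hand-made embeddings (assumptions for illustration only).
const store: StoredMessage[] = [
  { id: 12, text: 'It looks like rain this weekend.', embedding: [0.0, 1.0, 0.0] },
  { id: 300, text: 'The deploy finished without errors.', embedding: [0.1, 0.0, 0.9] },
  { id: 487, text: 'I recommend Trattoria Bella.', embedding: [0.9, 0.1, 0.0] },
];

// A query vector pointing in roughly the same direction as message 487.
const recalled = recall([1, 0, 0], store, 5, 0.7);
```

With these toy vectors, only message 487 clears the 0.7 threshold, so it is the only historical message that would be injected into the context.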

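Coordinating with Agent Networks

Sometimes you need both structure and flexibility. Pure workflows are too rigid; pure agents are too unpredictable.

An agent network provides a coordinator that decides, based on the task, which specialized agent or workflow to call. Think of it as an intelligent load balancer for AI capabilities.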

export const coordinatorAgent = new Agent({
  id: 'coordinator-agent',
  name: 'Research Coordinator',
  instructions: `You are a network of researchers and writers.
    - Use researchAgent for gathering facts
    - Use writingAgent for producing final content
    - Use weatherTool for current weather data
    - Use activityPlannerWorkflow for location-based planning

    Always produce comprehensive, well-structured responses.`,
  model: openai('gpt-5'),

  // Available primitives (researchAgent, writingAgent, and weatherTool are assumed to be defined elsewhere)
  agents: { researchAgent, writingAgent },
  workflows: { activityPlannerWorkflow },
  tools: { weatherTool },

  // Network requires memory
  memory: new Memory({
    storage: new LibSQLStore({ id: 'network-store', url: 'file:../network.db' }),
  }),
});

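When you send a query to this network, the coordinator analyzes the request and routes it:

"I need facts about X" triggers researchAgent
"Plan a weekend in Seattle" runs activityPlannerWorkflow
"Write a report about Y" calls writingAgent

This pattern scales better than cramming everything into a single giant agent. Specialized agents develop focused expertise, the coordinator handles routing, and each component does what it is best at.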
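To show the shape of the routing step, here is a small sketch that uses keyword rules as a stand-in for the coordinator. In the actual network an LLM makes this decision; the rules, the `chooseTarget` and `dispatch` functions, and the string-returning handlers are all hypothetical and exist only to illustrate the dispatch structure.

```typescript
type Target = 'researchAgent' | 'writingAgent' | 'activityPlannerWorkflow';
type Handler = (request: string) => string;

// Hypothetical stand-ins for the real agents and workflow.
const handlers: Record<Target, Handler> = {
  researchAgent: (request) => `[researchAgent] ${request}`,
  writingAgent: (request) => `[writingAgent] ${request}`,
  activityPlannerWorkflow: (request) => `[activityPlannerWorkflow] ${request}`,
};

// Rule-based stand-in for the coordinator's choice (an LLM decides this in the real network).
function chooseTarget(request: string): Target {
  const text = request.toLowerCase();
  if (/\b(plan|weekend|trip)\b/.test(text)) return 'activityPlannerWorkflow';
  if (/\b(write|report|draft)\b/.test(text)) return 'writingAgent';
  // Default: treat anything else as a request for facts.
  return 'researchAgent';
}

// Route the request to exactly one specialized handler.
function dispatch(request: string): string {
  return handlers[chooseTarget(request)](request);
}
```

The useful property is that each handler stays small and testable on its own, whatever mechanism ends up making the routing decision.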

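Putting It All Together

Real production AI systems need architecture, not just prompts. You are building a distributed system in which some of the nodes happen to be LLMs.

Workflows provide guarantees when strict ordering is required. Memory systems provide context without exhausting the token budget. Agent networks let you compose complex capabilities from simpler parts.

None of this is glamorous. But after watching enough "fully autonomous agents" fail in production, I have come to value boring reliability over exciting unpredictability.

Your mileage may vary, but in my experience the systems that actually ship and keep running are the ones that treat the LLM as a component in a larger architecture, not as a magic box that solves everything.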

Resources

Read the series

LLM routing
Security and guardrails
MCP and tool integration
Workflows and memory (this article)