Most AI agents are mid. There, I said it.
Everyone's building chat wrappers with function calling, calling it "agentic AI," and wondering why their OpenAI bill looks like a phone number. Meanwhile, Vercel's Nico Albanese just dropped a masterclass showing how they built a Deep Research clone in 218 lines of code that actually ships.
I watched the whole thing, and honestly? The difference between toy projects and production systems comes down to six patterns that nobody talks about. Let's fix that.
Why Your AI Agent Probably Sucks
Real talk: most AI agent tutorials teach you the wrong things. They show you how to call generateText() and wire up a tool, then call it a day. But production systems need to handle:
- Cost - Your token bill shouldn't require VC funding
- Latency - Users won't wait 30 seconds for a response
- Reliability - Agents that fail gracefully, not catastrophically
- Scale - Patterns that work for 1 user and 10,000 users
Vercel's Deep Research implementation handles all of this. Here's how.
The Six Patterns That Change Everything
1. MaxSteps: Let the Model Decide When It's Done
Here's the pattern that blew my mind: instead of orchestrating complex loops manually, just set maxSteps and let the agent figure it out.
import { generateText } from 'ai';
import { openai } from '@ai-sdk/openai';

const result = await generateText({
  model: openai('gpt-4o-mini'),
  prompt: "What's 10 + 5?",
  tools: { addNumbers, getWeather },
  maxSteps: 5  // Magic happens here
});

What this does: if the model generates a tool call (not text), the SDK automatically sends the tool result back to the model and triggers another generation. It keeps looping until either:
- The model generates plain text (it's done)
- You hit maxSteps (safety limit)
Why this matters: You're not writing conditional logic for every possible path. You're setting constraints and letting the model navigate. That's the difference between brittle code and resilient systems.
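For reference, the addNumbers and getWeather tools in that snippet aren't shown here. A minimal sketch of what addNumbers could look like with the AI SDK's tool() helper (my version, not the video's exact code):

import { tool } from 'ai';
import { z } from 'zod';

// Hypothetical tool definition; the video defines its own versions of these.
const addNumbers = tool({
  description: 'Add two numbers together',
  parameters: z.object({
    a: z.number().describe('First number'),
    b: z.number().describe('Second number'),
  }),
  // The SDK validates the model's arguments against the Zod schema, runs
  // execute, and feeds the returned value back into the next generation step.
  execute: async ({ a, b }) => a + b,
});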
2. Token Optimization: Don't Make LLMs Repeat Themselves
This one saved them probably 50% on API costs. Check this pattern:
// ❌ BAD: Making the LLM parse its own output
const evaluate = tool({
  parameters: z.object({ 
    searchResult: z.string() // LLM has to regenerate entire search result
  }),
  execute: async ({ searchResult }) => {
    // Now you have it in the params
  }
});
// ✅ GOOD: Use local variables
let pendingSearchResults = [];
const evaluate = tool({
  parameters: z.object({}), // No params needed
  execute: async () => {
    const result = pendingSearchResults.pop(); // Just grab it
    // Evaluate the result
  }
});

Why this matters: Search results can be 10k+ tokens. Making the model regenerate text that already exists in context? That's:
- Slower (more tokens to generate)
- More expensive (2-3x cost: input + output tokens)
- Error-prone (hallucination risk)
Keep state in your function scope. Don't abuse the parameter system.
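The same "keep it lean" thinking applies to what your search tool returns in the first place. A sketch, assuming a hypothetical searchApi helper in place of a real provider, that strips everything the model doesn't need before it ever reaches the context window:

import { tool } from 'ai';
import { z } from 'zod';

// Hypothetical raw search call; swap in whatever search provider you use.
async function searchApi(
  query: string
): Promise<Array<{ title: string; url: string; content: string; favicon: string }>> {
  // ... call your provider here
  return [];
}

const searchWeb = tool({
  description: 'Search the web for a given query',
  parameters: z.object({ query: z.string() }),
  execute: async ({ query }) => {
    const raw = await searchApi(query);
    // Keep only what the model needs; favicons, scores, and other metadata
    // just burn input tokens on every subsequent step.
    return raw.map(({ title, url, content }) => ({ title, url, content }));
  },
});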
3. Feedback Loops: Self-Correcting Agents
This pattern is genuinely clever. When searching for relevant sources, they built a feedback mechanism directly into tool results:
const evaluateRelevance = tool({
  parameters: z.object({}), // pattern 2: no params, read from local state
  execute: async () => {
    const result = pendingResults.pop();
    const { object: evaluation } = await generateObject({
      schema: z.enum(['relevant', 'irrelevant']),
      // ... check if result is useful
    });

    if (evaluation === 'irrelevant') {
      return "Search results are irrelevant. Please search again with a more specific query.";
    }

    finalResults.push(result);
    return "Results are relevant and have been saved.";
  }
});

When maxSteps triggers the next loop, the model sees "irrelevant, try again" and adjusts its search query. No manual orchestration needed.
Why this matters: Your agent learns from failure within the same session. It's not just executing steps - it's adapting based on feedback. That's actual agentic behavior.
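Here's a sketch of how searchAndProcess (the helper used in pattern 5 below) might wire this together, assuming the evaluateRelevance tool above, a searchWeb tool like the one sketched earlier, and shared pendingResults / finalResults arrays in the same module:

import { generateText } from 'ai';
import { openai } from '@ai-sdk/openai';

// The agentic search loop: search, evaluate, retry until something relevant
// lands in finalResults or the step budget runs out.
async function searchAndProcess(query: string) {
  finalResults.length = 0; // start each query with a clean slate
  await generateText({
    model: openai('gpt-4o-mini'),
    system:
      'Search the web, then evaluate the results. If they are irrelevant, ' +
      'search again with a more specific query.',
    prompt: `Research this topic: ${query}`,
    tools: { searchWeb, evaluateRelevance },
    maxSteps: 5, // pattern 1: the loop can retry, but not forever
  });
  return [...finalResults];
}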
4. Zod .describe(): Inline LLM Documentation
This is such an elegant pattern for guiding structured outputs. The example below uses a deliberately absurd instruction just to show how literally the model follows it:
const schema = z.object({
  definitions: z.array(z.string())
    .describe("Use as much jargon as possible. Should be completely incoherent.")
});

That .describe() becomes part of the schema the LLM sees. No need to stuff instructions in your prompt - document expectations right where they're used.
Why this matters: Prompts get messy fast. Keeping guidance close to the data structure = more maintainable code.
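A more realistic use of the same trick is the query-generation step. This is my sketch of a generateSearchQueries helper (the name matches the call in pattern 5; the implementation is assumed):

import { generateObject } from 'ai';
import { openai } from '@ai-sdk/openai';
import { z } from 'zod';

// .describe() rides along with the schema, so the guidance lives right next
// to the field it governs instead of in the prompt.
async function generateSearchQueries(prompt: string, n: number) {
  const { object } = await generateObject({
    model: openai('gpt-4o-mini'),
    prompt: `Generate search queries to research: ${prompt}`,
    schema: z.object({
      queries: z
        .array(z.string())
        .length(n)
        .describe('Short, keyword-style web search queries, one sub-topic each.'),
    }),
  });
  return object.queries;
}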
5. Depth/Breadth Recursion: Controlled Exploration
For the Deep Research clone, they use two parameters to control how deep the rabbit hole goes:
async function deepResearch(prompt: string, depth = 2, breadth = 3) {
  if (depth === 0) return accumulatedResearch;
  
  // Generate `breadth` number of search queries
  const queries = await generateSearchQueries(prompt, breadth);
  
  // For each query, search and analyze; results and learnings accumulate into accumulatedResearch (pattern 6)
  for (const query of queries) {
    const results = await searchAndProcess(query);
    const learnings = await generateLearnings(results);
    
    // Recursively explore follow-up questions
    for (const followUp of learnings.followUpQuestions) {
      await deepResearch(followUp, depth - 1, breadth);
    }
  }
}

Depth: How many levels deep to explore (Initial → Follow-ups → Follow-ups of follow-ups)
Breadth: How many parallel paths at each level
Why this matters: Two simple parameters control exponential exploration. Assuming each search surfaces one follow-up question, depth=2, breadth=3 means 3 + 9 = 12 total searches; depth=3, breadth=3 means 3 + 9 + 27 = 39. Dial it based on your use case and budget.
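That arithmetic generalizes to a tiny helper (hypothetical, not from the video) you can use to sanity-check a configuration before spending tokens:

// Rough search-count estimate for the recursion above, assuming each search
// result produces `followUps` follow-up questions.
function estimateSearches(depth: number, breadth: number, followUps = 1): number {
  let total = 0;
  let atThisLevel = breadth; // level 1: the initial queries
  for (let level = 0; level < depth; level++) {
    total += atThisLevel;
    atThisLevel *= followUps * breadth; // each result fans out again
  }
  return total;
}

estimateSearches(2, 3); // 3 + 9 = 12
estimateSearches(3, 3); // 3 + 9 + 27 = 39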
6. Accumulated State: Global Research Store
Instead of threading state through parameters, they maintain a global research object that builds up through recursion:
let accumulatedResearch = {
  query: "",
  queries: [],
  searchResults: [],
  learnings: [],
  completedQueries: []
};
// Each recursive call updates this
accumulatedResearch.learnings.push(newLearning);
accumulatedResearch.queries = activeQueries;

At the end, dump everything to a reasoning model (o3-mini) to synthesize the final report.
Why this matters: Clean separation between execution (recursive exploration) and synthesis (final report). The agent doesn't need to "remember" everything - you're building a knowledge graph as you go.
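The synthesis step can be as simple as serializing that object into a single prompt for the reasoning model. A sketch (generateReport is my name for it; the video's actual prompt is more elaborate):

import { generateText } from 'ai';
import { openai } from '@ai-sdk/openai';

// Hand the accumulated research to a reasoning model for the final write-up.
async function generateReport(research: typeof accumulatedResearch) {
  const { text } = await generateText({
    model: openai('o3-mini'),
    system: 'You are a researcher. Write a detailed report based only on the research provided.',
    prompt: `Research:\n${JSON.stringify(research, null, 2)}`,
  });
  return text;
}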
The Deep Research Architecture
Here's how it all fits together:
- Generate queries - Turn user prompt into 3-5 search queries
- Search + validate - For each query, search web and use agentic loop to find relevant results (feedback pattern)
- Extract learnings - Pull insights + follow-up questions from each result
- Recurse - Take follow-ups, generate new queries, repeat until depth=0
- Synthesize - Feed accumulated research to reasoning model for final report
Each step uses one of the six patterns above. None of them are complex individually, but composed together? You get a production-grade research system.
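Stitched together, the top level is just a couple of calls (a sketch using the deepResearch and generateReport helpers above; the prompt is a placeholder):

// End-to-end: explore recursively, then hand everything to the reasoning model.
async function main() {
  const prompt = 'Your research question here';
  await deepResearch(prompt, 2, 3); // fills accumulatedResearch as it recurses
  const report = await generateReport(accumulatedResearch);
  console.log(report);
}

main().catch(console.error);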
What Makes This Production-Ready
Let's be real about what separates this from toy projects:
Cost control:
- Token optimization saves 50%+ vs naive implementation
- maxSteps prevents runaway loops
- Trimming unnecessary data (favicons, etc.) from tool results
Reliability:
- Feedback loops make it self-correcting
- Depth/breadth params provide predictable bounds
- No silent failures - everything logs
Maintainability:
- Zod schemas are self-documenting
- Clear separation of concerns (search, validate, learn, synthesize)
- Type safety throughout (TypeScript + Zod)
Your Move
Here's what you should do today:
- Audit your tool results - Are you passing unnecessary tokens? Cut the fluff.
- Replace manual loops with maxSteps - Let the model decide the path.
- Add feedback loops - Make your agents self-correcting.
- Use .describe() everywhere - Document schemas inline.
The video source is on YouTube (AI Engineer channel, ~60min). The patterns are simple. The impact is massive.
Now go build something that doesn't suck. 🚀
Want to see the full implementation? Check out Nico's masterclass on the AI Engineer YouTube channel.