Most AI agents are mid. There, I said it.
Everyone's building chat wrappers with function calling, calling it "agentic AI," and wondering why their OpenAI bill looks like a phone number. Meanwhile, Vercel's Nico Albanese just dropped a masterclass showing how they built a Deep Research clone in 218 lines of code that actually ships.
I watched the whole thing, and honestly? The difference between toy projects and production systems comes down to six patterns that nobody talks about. Let's fix that.
Why Your AI Agent Probably Sucks
Real talk: most AI agent tutorials teach you the wrong things. They show you how to call generateText() and wire up a tool, then call it a day. But production systems need to handle:
- Cost - Your token bill shouldn't require VC funding
- Latency - Users won't wait 30 seconds for a response
- Reliability - Agents that fail gracefully, not catastrophically
- Scale - Patterns that work for 1 user and 10,000 users
Vercel's Deep Research implementation handles all of this. Here's how.
The Six Patterns That Change Everything
1. MaxSteps: Let the Model Decide When It's Done
Here's the pattern that blew my mind: instead of orchestrating complex loops manually, just set maxSteps and let the agent figure it out.
import { generateText } from 'ai';
import { openai } from '@ai-sdk/openai';

const result = await generateText({
  model: openai('gpt-4o-mini'),
  prompt: "What's 10 + 5?",
  tools: { addNumbers, getWeather },
  maxSteps: 5  // Magic happens here
});

What this does: if the model generates a tool call (not text), the SDK automatically sends the tool result back to the model and triggers another generation. It keeps looping until either:
- The model generates plain text (it's done)
- You hit maxSteps (safety limit)
Why this matters: You're not writing conditional logic for every possible path. You're setting constraints and letting the model navigate. That's the difference between brittle code and resilient systems.
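For reference, the addNumbers and getWeather tools in that snippet aren't shown here. A minimal sketch of what addNumbers could look like with the AI SDK's tool() helper (my version, not the video's exact code):

import { tool } from 'ai';
import { z } from 'zod';

// Hypothetical tool definition; the video defines its own versions of these.
const addNumbers = tool({
  description: 'Add two numbers together',
  parameters: z.object({
    a: z.number().describe('First number'),
    b: z.number().describe('Second number'),
  }),
  // The SDK validates the model's arguments against the Zod schema, runs
  // execute, and feeds the returned value back into the next generation step.
  execute: async ({ a, b }) => a + b,
});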
2. Token Optimization: Don't Make LLMs Repeat Themselves
This one saved them probably 50% on API costs. Check this pattern:
// ❌ BAD: Making the LLM parse its own output
const evaluate = tool({
  parameters: z.object({ 
    searchResult: z.string() // LLM has to regenerate entire search result
  }),
  execute: async ({ searchResult }) => {
    // Now you have it in the params
  }
});
// ✅ GOOD: Use local variables
let pendingSearchResults = [];
const evaluate = tool({
  parameters: z.object({}), // No params needed
  execute: async () => {
    const result = pendingSearchResults.pop(); // Just grab it
    // Evaluate the result
  }
});

Why this matters: Search results can be 10k+ tokens. Making the model regenerate text that already exists in context? That's:
- Slower (more tokens to generate)
- More expensive (2-3x cost: input + output tokens)
- Error-prone (hallucination risk)
Keep state in your function scope. Don't abuse the parameter system.
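The same "keep it lean" thinking applies to what your search tool returns in the first place. A sketch, assuming a hypothetical searchApi helper in place of a real provider, that strips everything the model doesn't need before it ever reaches the context window:

import { tool } from 'ai';
import { z } from 'zod';

// Hypothetical raw search call; swap in whatever search provider you use.
async function searchApi(
  query: string
): Promise<Array<{ title: string; url: string; content: string; favicon: string }>> {
  // ... call your provider here
  return [];
}

const searchWeb = tool({
  description: 'Search the web for a given query',
  parameters: z.object({ query: z.string() }),
  execute: async ({ query }) => {
    const raw = await searchApi(query);
    // Keep only what the model needs; favicons, scores, and other metadata
    // just burn input tokens on every subsequent step.
    return raw.map(({ title, url, content }) => ({ title, url, content }));
  },
});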
3. Feedback Loops: Self-Correcting Agents
This pattern is genuinely clever. When searching for relevant sources, they built a feedback mechanism directly into tool results:
const evaluateRelevance = tool({
  parameters: z.object({}), // pattern 2: no params, read from local state
  execute: async () => {
    const result = pendingResults.pop();
    const { object: evaluation } = await generateObject({
      schema: z.enum(['relevant', 'irrelevant']),
      // ... check if result is useful
    });

    if (evaluation === 'irrelevant') {
      return "Search results are irrelevant. Please search again with a more specific query.";
    }

    finalResults.push(result);
    return "Results are relevant and have been saved.";
  }
});

When maxSteps triggers the next loop, the model sees "irrelevant, try again" and adjusts its search query. No manual orchestration needed.
Why this matters: Your agent learns from failure within the same session. It's not just executing steps - it's adapting based on feedback. That's actual agentic behavior.
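Here's a sketch of how searchAndProcess (the helper used in pattern 5 below) might wire this together, assuming the evaluateRelevance tool above, a searchWeb tool like the one sketched earlier, and shared pendingResults / finalResults arrays in the same module:

import { generateText } from 'ai';
import { openai } from '@ai-sdk/openai';

// The agentic search loop: search, evaluate, retry until something relevant
// lands in finalResults or the step budget runs out.
async function searchAndProcess(query: string) {
  finalResults.length = 0; // start each query with a clean slate
  await generateText({
    model: openai('gpt-4o-mini'),
    system:
      'Search the web, then evaluate the results. If they are irrelevant, ' +
      'search again with a more specific query.',
    prompt: `Research this topic: ${query}`,
    tools: { searchWeb, evaluateRelevance },
    maxSteps: 5, // pattern 1: the loop can retry, but not forever
  });
  return [...finalResults];
}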
4. Zod .describe(): Inline LLM Documentation
This is such an elegant pattern for guiding structured outputs. The example below uses a deliberately absurd instruction just to show how literally the model follows it:
const schema = z.object({
  definitions: z.array(z.string())
    .describe("Use as much jargon as possible. Should be completely incoherent.")
});

That .describe() becomes part of the schema the LLM sees. No need to stuff instructions in your prompt - document expectations right where they're used.
Why this matters: Prompts get messy fast. Keeping guidance close to the data structure = more maintainable code.
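A more realistic use of the same trick is the query-generation step. This is my sketch of a generateSearchQueries helper (the name matches the call in pattern 5; the implementation is assumed):

import { generateObject } from 'ai';
import { openai } from '@ai-sdk/openai';
import { z } from 'zod';

// .describe() rides along with the schema, so the guidance lives right next
// to the field it governs instead of in the prompt.
async function generateSearchQueries(prompt: string, n: number) {
  const { object } = await generateObject({
    model: openai('gpt-4o-mini'),
    prompt: `Generate search queries to research: ${prompt}`,
    schema: z.object({
      queries: z
        .array(z.string())
        .length(n)
        .describe('Short, keyword-style web search queries, one sub-topic each.'),
    }),
  });
  return object.queries;
}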
5. Depth/Breadth Recursion: Controlled Exploration
For the Deep Research clone, they use two parameters to control how deep the rabbit hole goes:
async function deepResearch(prompt: string, depth = 2, breadth = 3) {
  if (depth === 0) return accumulatedResearch;
  
  // Generate `breadth` number of search queries
  const queries = await generateSearchQueries(prompt, breadth);
  
  // For each query, search and analyze; results and learnings accumulate into accumulatedResearch (pattern 6)
  for (const query of queries) {
    const results = await searchAndProcess(query);
    const learnings = await generateLearnings(results);
    
    // Recursively explore follow-up questions
    for (const followUp of learnings.followUpQuestions) {
      await deepResearch(followUp, depth - 1, breadth);
    }
  }
}

Depth: How many levels deep to explore (Initial → Follow-ups → Follow-ups of follow-ups)
Breadth: How many parallel paths at each level
Why this matters: Two simple parameters control exponential exploration. Assuming each search surfaces one follow-up question, depth=2, breadth=3 means 3 + 9 = 12 total searches; depth=3, breadth=3 means 3 + 9 + 27 = 39. Dial it based on your use case and budget.
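That arithmetic generalizes to a tiny helper (hypothetical, not from the video) you can use to sanity-check a configuration before spending tokens:

// Rough search-count estimate for the recursion above, assuming each search
// result produces `followUps` follow-up questions.
function estimateSearches(depth: number, breadth: number, followUps = 1): number {
  let total = 0;
  let atThisLevel = breadth; // level 1: the initial queries
  for (let level = 0; level < depth; level++) {
    total += atThisLevel;
    atThisLevel *= followUps * breadth; // each result fans out again
  }
  return total;
}

estimateSearches(2, 3); // 3 + 9 = 12
estimateSearches(3, 3); // 3 + 9 + 27 = 39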
6. Accumulated State: Global Research Store
Instead of threading state through parameters, they maintain a global research object that builds up through recursion:
let accumulatedResearch = {
  query: "",
  queries: [],
  searchResults: [],
  learnings: [],
  completedQueries: []
};
// Each recursive call updates this
accumulatedResearch.learnings.push(newLearning);
accumulatedResearch.queries = activeQueries;

At the end, dump everything to a reasoning model (o3-mini) to synthesize the final report.
Why this matters: Clean separation between execution (recursive exploration) and synthesis (final report). The agent doesn't need to "remember" everything - you're building a knowledge graph as you go.
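The synthesis step can be as simple as serializing that object into a single prompt for the reasoning model. A sketch (generateReport is my name for it; the video's actual prompt is more elaborate):

import { generateText } from 'ai';
import { openai } from '@ai-sdk/openai';

// Hand the accumulated research to a reasoning model for the final write-up.
async function generateReport(research: typeof accumulatedResearch) {
  const { text } = await generateText({
    model: openai('o3-mini'),
    system: 'You are a researcher. Write a detailed report based only on the research provided.',
    prompt: `Research:\n${JSON.stringify(research, null, 2)}`,
  });
  return text;
}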
The Deep Research Architecture
Here's how it all fits together:
- Generate queries - Turn user prompt into 3-5 search queries
- Search + validate - For each query, search web and use agentic loop to find relevant results (feedback pattern)
- Extract learnings - Pull insights + follow-up questions from each result
- Recurse - Take follow-ups, generate new queries, repeat until depth=0
- Synthesize - Feed accumulated research to reasoning model for final report
Each step uses one of the six patterns above. None of them are complex individually, but composed together? You get a production-grade research system.
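Stitched together, the top level is just a couple of calls (a sketch using the deepResearch and generateReport helpers above; the prompt is a placeholder):

// End-to-end: explore recursively, then hand everything to the reasoning model.
async function main() {
  const prompt = 'Your research question here';
  await deepResearch(prompt, 2, 3); // fills accumulatedResearch as it recurses
  const report = await generateReport(accumulatedResearch);
  console.log(report);
}

main().catch(console.error);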
What Makes This Production-Ready
Let's be real about what separates this from toy projects:
Cost control:
- Token optimization saves 50%+ vs naive implementation
- maxSteps prevents runaway loops
- Trimming unnecessary data (favicons, etc.) from tool results
Reliability:
- Feedback loops make it self-correcting
- Depth/breadth params provide predictable bounds
- No silent failures - everything logs
Maintainability:
- Zod schemas are self-documenting
- Clear separation of concerns (search, validate, learn, synthesize)
- Type safety throughout (TypeScript + Zod)
Your Move
Here's what you should do today:
- Audit your tool results - Are you passing unnecessary tokens? Cut the fluff.
- Replace manual loops with maxSteps - Let the model decide the path.
- Add feedback loops - Make your agents self-correcting.
- Use .describe() everywhere - Document schemas inline.
The video source is on YouTube (AI Engineer channel, ~60min). The patterns are simple. The impact is massive.
Now go build something that doesn't suck. 🚀
Want to see the full implementation? Check out Nico's masterclass on the AI Engineer YouTube channel.