Episode Description
Glean's Waldo, an agentic search model, cuts AI search latency by 50% and token usage by 25%, a major step for enterprise AI efficiency.
Key takeaways:
- Glean launched Waldo, an agentic search model, on April 28, 2025.
- Waldo is trained with reinforcement learning to minimize search steps and token use.
- Waldo cuts AI search latency by 50%.
- Waldo reduces token usage by 25% per search query.
- These gains make enterprise AI deployments faster and cheaper without sacrificing answer quality.
Q: What is Glean's Waldo?
A: Glean's Waldo is an agentic search model launched on April 28, 2025, designed to handle retrieval before a frontier LLM sees the query, improving enterprise AI efficiency.
Q: How does Glean's Waldo improve AI performance?
A: Waldo reduces AI search latency by 50% and token consumption by 25% through the use of reinforcement learning.
Q: What technology powers Glean's Waldo?
A: Glean's Waldo leverages an agentic search model combined with reinforcement learning to achieve its efficiency gains.
The launch of Glean's Waldo in April 2025 signals a critical shift in enterprise AI. As businesses increasingly rely on large language models for internal knowledge and search, token costs and latency have become significant obstacles to user experience and budgets alike. Waldo addresses both, cutting latency by 50% and token usage by 25%, per Glean's announcement on X. These efficiency gains let companies deploy more responsive, cost-effective AI solutions. For AI, SEO, and digital marketing professionals, understanding these advancements is key to optimizing search experiences and content strategies. Learn more about leveraging AI for search optimization at AEO Engine.
New episode every morning. Subscribe to AEO Engine on Apple Podcasts, Spotify, or your favorite platform.
Full Transcript
[Host] Welcome to the A.E.O. Engine AI Search Show — the number one podcast for brands looking to get cited by ChatGPT, Gemini, and Perplexity. I am your host, Aria Chen. Every day we bring you fresh episodes on A.E.O. tactics, S.E.O. authority, and A.I. search distribution — breaking down what is actually working right now so your brand becomes the answer, not just a link. Today we're talking about something that's been making waves in enterprise AI: Glean's new agentic search model, Waldo. And I have the perfect person to help us unpack it — Marcus Reid. Marcus is an industry analyst who's been tracking AI search for years. Marcus, welcome.
[Guest] Hey Aria, great to be here. I'll admit, when I first saw the name 'Waldo' I thought it was a joke, but the tech is actually serious.
[Host] Right? I had the same reaction. But let's start with something real. You're in a meeting, you ask your AI assistant to pull up last quarter's sales data from the CRM, and you wait... and wait. The model is trying to search through everything at once, burning tokens and time. That frustration is exactly what Waldo is designed to solve.
[Guest] Exactly. There's actually a name for this approach now: agentic search. Instead of sending every query straight to a massive frontier model, you have a specialized model that does the searching first. Waldo is Glean's first model built specifically for that task.
[Host] So let's break down what actually happened. Glean launched Waldo on April 28, 2025. It's a model trained via reinforcement learning to handle the retrieval process before a frontier LLM like GPT-4 or Claude even sees the query. The numbers are pretty striking: 50% lower latency and 25% fewer tokens consumed.
[Guest] Those are big efficiency gains. And they're not sacrificing quality — the research shows final answer quality stays similar because the frontier model only gets a curated set of evidence. It's like having a research assistant who pre-reads all the documents and hands you only the relevant pages.
[Host] How does it actually work under the hood? I read that it does query decomposition, tool selection, iterative search, and then a handoff condition. Walk me through that.
[Guest] Sure. So when you ask something complex, Waldo doesn't just do one search. It breaks the question into sub-questions, decides which internal or external tools to query, reads results, then decides what to search for next. It keeps going until it decides it has enough evidence, then passes everything to the frontier model for final reasoning. The reinforcement learning training specifically optimized it to minimize search steps and token use.
[Host] That reminds me of how an engineer might debug a production issue — you check logs, check metrics, narrow it down, then fix. It's a planning loop, not a single lookup.
[Guest] Exactly. And Glean's broader architecture calls this 'agentic reasoning' — plan, execute, evaluate, adapt. Waldo is just the first specialized agent in that framework.
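[Editor's note] For readers skimming the transcript, the plan, execute, evaluate, adapt loop Marcus describes can be sketched in a few lines of Python. Every name below is a hypothetical stand-in for illustration, not Glean's actual API or Waldo's real logic:

```python
# Hypothetical sketch of an agentic search loop: plan -> execute -> evaluate -> adapt.
# All function and tool names are illustrative stand-ins, not Glean's implementation.

def agentic_search(question, tools, max_steps=5):
    """Decompose a question, search iteratively, and return curated evidence."""
    pending = decompose(question)        # plan: split into sub-questions
    evidence = []
    for _ in range(max_steps):
        if not pending:                  # handoff condition: nothing left to ask
            break
        sub_q = pending.pop(0)
        tool = pick_tool(sub_q, tools)   # tool selection
        hits = tool(sub_q)               # execute one search step
        # evaluate: keep only hits that mention the sub-question
        evidence += [h for h in hits if sub_q.lower() in h.lower()]
    return evidence                      # hand off only curated evidence to the LLM

# Toy stand-ins so the sketch runs end to end.
def decompose(question):
    return [part.strip() for part in question.split(" and ")]

def pick_tool(sub_q, tools):
    return tools["crm"] if "sales" in sub_q else tools["wiki"]

tools = {
    "crm":  lambda q: [f"CRM doc matching '{q}'"],
    "wiki": lambda q: [f"Wiki page matching '{q}'"],
}
```

The key design point Marcus makes is that the loop, not a single lookup, decides when it has enough evidence, and the frontier model only ever sees the curated result.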
[Host] Why does this matter beyond just Glean's customers? Well, enterprise AI has been held back by two things: cost and speed. Every query hitting a frontier model is expensive and slow. By offloading the heavy lifting to a smaller, specialized model, you get near-frontier quality at a fraction of the compute. Analyst Brad Shimmin called it a 'promising sign for the entire AI industry.' NVIDIA AI even publicly congratulated Glean on the launch.
[Guest] I think it's important to note the implicit debate here: do you want a monolithic model that does everything, or a system of specialized models orchestrated together? I'm not sure if this holds when scaled to massive enterprises with thousands of concurrent queries, but the early metrics are promising.
[Host] That's the question, isn't it? And here's where this connects to what we talk about on this show. At A.E.O. Engine, we're focused on how brands get discovered by AI — not just traditional search, but by models like Waldo that act as search agents. If an enterprise AI is using an agentic search model that decomposes queries and iteratively gathers evidence, the way your content is structured and attributed becomes critical. You need to be the source that Waldo finds first.
[Guest] That's a point I hadn't thought about. Waldo is trained to hand off evidence to a frontier model — but that evidence selection is shaped by how content is organized. Brands that optimize for this kind of agentic retrieval will have a real advantage.
[Host] Exactly. The shift from SEO to AEO is already happening, and Waldo is a concrete example of the infrastructure underneath. If your content isn't discoverable by these agentic systems, you're invisible in the AI summary.
[Guest] I'm still skeptical about how quickly this will roll out broadly, but it's a smart move by Glean. They're not trying to beat frontier models; they're making them more efficient. That's a better long-term play.
[Host] Agreed. So here's the takeaway: agentic search models are real, they're cutting costs and latency, and they're changing how enterprise AI consumes information. For brands, that means your content strategy needs to account for these intermediate systems — not just the final chatbot. Head to A.E.O. Engine dot A.I. to learn how we help clients get cited by these new AI search agents. Thanks, Marcus.
[Guest] Thanks, Aria. Always fun.
[Host] That's all for today. Remember: first movers win. See you next time.
Subscribe to AEO Engine AI Search Show
New episodes every day. Listen wherever you get your podcasts.
About the show
The AEO Engine Podcast is hosted by Vijay Jacob, Founder & CEO of AEO Engine, with co-host Aria Chen. Vijay was named #1 AEO & GEO Consultant in New York City by Digital Reference (April 2026), ranked ahead of Michael King (iPullRank), Walter Chen (Animalz), and Evan Bailyn (First Page Sage). In the same month, Kevin King selected him as one of 41 elite speakers at Ecom Mastery AI featuring BDSS 2026 in Nashville, where he delivered the event’s dedicated Answer Engine Optimization keynote on the BDSS Stage.
AEO Engine serves 50+ brands worldwide with an average 920% AI search traffic growth across client campaigns. Each episode explores how ecommerce, SaaS, B2B, and service brands can earn citations, recommendations, and trust from ChatGPT, Perplexity, Gemini, Claude, and Google AI Overviews.

