Episode 8: Inside the AI Agent Revolution
About this Episode
In this episode, Holly and Ewan explore one of the most hyped (yet deeply misunderstood) topics in AI today: AI agents. Holly opens with the big question: What actually is an AI agent?
Ewan explains why definitions vary wildly, then offers a broad one: an AI agent is any system that can operate independently on your behalf to complete tasks. That could be a coaching assistant, a financial helper, or even a household or education agent.
Ewan shares real-world stories, such as trying to buy a dishwasher using ChatGPT Agent Mode, only to find that Amazon actively blocks agent-based access.
When he switched to AO.com, the agent succeeded instantly, a perfect illustration of today’s fragmented ecosystem.
He also discusses experimenting with agents to manage LinkedIn connection acceptance, with mixed results, highlighting how even simple point-solution tasks can quickly fall apart.
The discussion then moves into the wider implications:
Why agents are transformational in theory, but fragile and unreliable today
How browser-based agents actually work using “computer use” screenshot loops
Why traditional RPA (Robotic Process Automation) remains far safer and more predictable
Early signs of agent-powered cyberattacks, referencing the first reported case of agentic hacking
Carnegie Mellon’s TheAgentCompany benchmark, which evaluates how well different agents perform real office tasks, with current leaderboards showing DeepSeek’s Matrix agent at ~43%, Google Gemini around 41%, and Claude Sonnet 4 around 33%
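For listeners curious about the “computer use” screenshot loop mentioned above, here is a minimal sketch of the idea: look at the screen, pick one action, apply it, repeat. Everything in it is an assumption made for illustration. `FakeBrowser`, `choose_action`, and `run_agent` are hypothetical names, the "screenshot" is a short text description rather than an image, and the rule-based policy stands in for the model call a real agent would make. It is not how any particular product is implemented.

```python
from dataclasses import dataclass, field


@dataclass
class Action:
    kind: str        # "type", "click", or "done"
    target: str = ""
    text: str = ""


@dataclass
class FakeBrowser:
    """Hypothetical stand-in for a browser; a real agent would capture pixels."""
    page: str = "home"
    cart: list = field(default_factory=list)

    def screenshot(self) -> str:
        # A text summary plays the role of the screenshot image here.
        return f"page={self.page} cart={len(self.cart)}"

    def apply(self, action: Action) -> None:
        # Only two transitions exist in this toy site.
        if action.kind == "type" and self.page == "home":
            self.page = "results"
        elif (action.kind == "click" and action.target == "add_to_cart"
              and self.page == "results"):
            self.cart.append("dishwasher")
            self.page = "cart"


def choose_action(goal: str, observation: str) -> Action:
    """Hypothetical policy; a real agent would send the screenshot to a model."""
    if "page=home" in observation:
        return Action("type", target="search_box", text=goal)
    if "page=results" in observation:
        return Action("click", target="add_to_cart")
    return Action("done")


def run_agent(goal: str, browser: FakeBrowser, max_steps: int = 10) -> list:
    """Observe, decide, act, and repeat until done or out of steps."""
    history = []
    for _ in range(max_steps):
        observation = browser.screenshot()          # 1. look at the screen
        action = choose_action(goal, observation)   # 2. decide what to do
        history.append(action.kind)
        if action.kind == "done":
            break
        browser.apply(action)                       # 3. act, then loop again
    return history


if __name__ == "__main__":
    browser = FakeBrowser()
    print(run_agent("dishwasher", browser), browser.cart)
```

The step cap (`max_steps`) matters in a sketch like this: if the site blocks the agent or the page never changes, the loop stops rather than retrying forever, which is one reason these loops can be fragile in practice.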
The conclusion? The vision is exciting, but today’s agents are nowhere near enterprise-ready. Expect rapid evolution, more experiments, and many more failures as this technology matures.
If you've got feedback, we'd love to hear it. We reply to every single message! Find us at Working On It Podcast, follow our LinkedIn page, or talk to Holly or Ewan on LinkedIn.