Episode 8: Inside the AI Agent Revolution
Season 1 Nov 20, 2025

00:26:07 23.92 MB

About this Episode

In this episode, Holly and Ewan explore one of the most hyped (yet deeply misunderstood) topics in AI today: AI agents. Holly opens with the big question: What actually is an AI agent?

Ewan explains why definitions vary wildly, but broadly defines an AI agent as any system that can operate independently on your behalf to complete tasks. That could be a coaching assistant, a financial helper, or even a household or education agent.

Ewan shares real-world stories, such as trying to buy a dishwasher using ChatGPT Agent Mode, only to find that Amazon actively blocks agent-based access.

When he switched to AO.com, the agent succeeded instantly: a perfect illustration of today’s fragmented ecosystem.

He also discusses experimenting with agents to manage LinkedIn connection acceptance, with mixed results, highlighting how even simple point-solution tasks can quickly fall apart.

The discussion then moves into the wider implications:

  • Why agents are transformational in theory, but fragile and unreliable today

  • How browser-based agents actually work using “computer use” screenshot loops

  • Why traditional RPA (Robotic Process Automation) remains far safer and more predictable

  • Early signs of agent-powered cyberattacks, referencing the first reported case of agentic hacking

  • The Carnegie Mellon “Agent Company” benchmark, which evaluates how well different agents perform real office tasks, with current leaderboards showing DeepSeek’s Matrix agent at ~43%, Google Gemini at around 41%, and Claude Sonnet 4 at around 33%
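
For listeners curious about the “computer use” screenshot loop mentioned above, here is a minimal Python sketch of the general observe, decide, act pattern. It is an illustration only, not how any particular product is built: `FakeBrowser`, `fake_model`, `Action`, and `run_screenshot_loop` are all hypothetical names, the "screenshot" is just a text string, and a real agent would send an actual image to a model and drive a real browser.

```python
from dataclasses import dataclass


@dataclass
class Action:
    kind: str         # "click" or "done" in this toy example
    target: str = ""  # hypothetical label of the thing to click


def run_screenshot_loop(take_screenshot, choose_action, perform, max_steps=10):
    """Repeat: look at the screen, ask the model what to do, do it."""
    history = []
    for _ in range(max_steps):          # hard cap so a confused agent cannot loop forever
        screen = take_screenshot()      # observe
        action = choose_action(screen, history)  # decide
        history.append(action)
        if action.kind == "done":
            break
        perform(action)                 # act
    return history


class FakeBrowser:
    """Stand-in for a real browser; it only tracks which page is open."""

    def __init__(self):
        self.page = "search"

    def screenshot(self):
        return f"[screenshot of {self.page} page]"

    def perform(self, action):
        if action.kind == "click" and action.target == "add to basket":
            self.page = "basket"


def fake_model(screen, history):
    """Stand-in for the model call: stop once the basket page is visible."""
    if "basket" in screen:
        return Action("done")
    return Action("click", "add to basket")


browser = FakeBrowser()
steps = run_screenshot_loop(browser.screenshot, fake_model, browser.perform)
print([step.kind for step in steps])
```

Every step depends on the model correctly reading a fresh screenshot, which is one reason these agents can be fragile: a site that blocks automated access, or a page that looks slightly different, breaks the loop.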

The conclusion? The vision is exciting, but today’s agents are nowhere near enterprise-ready. Expect rapid evolution, more experiments, and many more failures as this technology matures.

If you've got feedback, we'd love to hear it. We reply to every single message! Find us at Working On It Podcast, follow our LinkedIn Page, or talk to Holly or Ewan on LinkedIn.
