
What the heck is an AI agent

September 25, 2025 · Nasim Rahaman · 10 min read

Remember high school physics? Picture a piston pushing down on gas trapped in a cylinder. When the piston moves, the gas heats up, pressure increases, all sorts of interesting things happen. That's what thermodynamics studies.

[Image: piston.png — a piston compressing gas in a cylinder]

But here's the thing: who's pushing the piston? Someone or something must be, but thermodynamics doesn't care. It just says "the piston moves down" and leaves it at that. Whatever is doing the pushing sits outside the system we're studying — a black box that somehow decides when and how to push.

That black box is what we call an agent. It's a subsystem that you don't want to model out. It just magically does things.

As SWEs, we work with this concept all the time. When you call a third-party API, you're treating it as an agent. You know that stripe.charges.create() will process a payment, but you don't model Stripe's internal fraud detection, payment routing, or retry logic. That's their problem. The moment you start caring about those internals — perhaps you're debugging why a payment failed — Stripe stops being an agent and becomes part of the system you're analyzing.
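The black-box view translates directly into code. Here's a minimal TypeScript sketch — the `PaymentProvider` interface and `fakeStripe` stub are hypothetical stand-ins, not the real Stripe SDK — showing that our system depends only on the interface, never on the provider's internals:

```typescript
// Hypothetical interface: everything we choose to know about the provider.
interface PaymentProvider {
  createCharge(
    amountCents: number,
    currency: string
  ): { id: string; status: "succeeded" | "failed" };
}

// A stand-in implementation. Fraud detection, routing, retries — all hidden.
const fakeStripe: PaymentProvider = {
  createCharge: (amountCents, currency) => ({
    id: `ch_${amountCents}_${currency}`,
    status: "succeeded",
  }),
};

// Our code only ever touches the interface.
function checkout(provider: PaymentProvider): string {
  const charge = provider.createCharge(2000, "usd");
  return charge.status; // we don't know — or care — how this was decided
}

const result = checkout(fakeStripe);
```

The moment you start stepping through `createCharge` to see why a payment failed, you've opened the black box — and the provider stops being an agent.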

Same thing with users. When we write onClick handlers, we're treating users as agents. We know they'll click buttons, but we don't model their decision-making process. We just handle the events they generate. The user is a black box that somehow converts what they see on screen into clicks, keystrokes, and swipes.

This is the key insight: an agent isn't defined by what it is, but by how we choose to model it. It's any entity that perceives its environment and acts on it in ways we treat as external to our system. We define agents by their interface, not their implementation. The moment we peek inside the black box and start modeling the "how," it ceases to be an agent and becomes just another component in our system.
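One way to phrase this insight in code (the types below are illustrative, not a real library): an agent is anything that maps percepts to actions, and both earlier examples — the third-party API and the user — fit the same signature.

```typescript
// Illustrative: an agent is defined entirely by its interface.
interface Agent<Percept, Action> {
  act(percept: Percept): Action;
}

// A user, from our system's point of view, turns screens into clicks.
const user: Agent<string, string> = {
  act: (screen) => `click:${screen}`,
};

// A payment API, from our system's point of view, turns requests into outcomes.
const paymentApi: Agent<number, "succeeded" | "failed"> = {
  act: (amountCents) => (amountCents > 0 ? "succeeded" : "failed"),
};

const clickResult = user.act("checkout-page");
const chargeStatus = paymentApi.act(2000);
```

Nothing in `Agent<Percept, Action>` says how the action gets decided — that's exactly the point.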

Designing an Agent

So let's say we agree about what an agent is — a black box that does useful things. But what if you're the one building the black box?

Here's the paradox: as an agent designer, you're crafting something whose internals your users explicitly won't understand. That's not a bug, it's a feature. Just as Stripe's engineers obsess over payment routing algorithms their users will never think about, your job as an agent designer is to build complexity that disappears behind a simple interface.

Now, the tricky part: how do you design something people can use without understanding it? The answer is perhaps surprisingly simple: make it work like things they already know.

Think about it: when email was invented, its designers didn't create a completely alien way to send messages. They took the familiar mental model of "writing and mailing a letter" and mapped it to the digital world. Users already knew how mail worked: you write a message, put an address on it, send it off, and it arrives at its destination. Email kept all these concepts — To:, From:, Subject: (like the envelope), even CC: (carbon copy from typewriter days). The designers could have built something radically different, but they chose familiarity.

This is why radically new agent designs often fail. Remember the first time you encountered a gesture-based interface with no visible buttons? Or a voice-only UI that responded to commands like "ILLUMINATE KITCHEN" instead of natural language? The learning curve kills adoption. Unless the payoff is huge (like enterprise software that people list on their resumes), users won't invest the effort to understand something genuinely novel. Without familiar patterns to lean on, users struggle to build mental models of what the agent can and can't do.

The most successful agents hide their innovation behind familiar interfaces. Siri and Alexa succeeded not because voice interfaces were new, but because they made them feel like talking to a human assistant: an agent model we've had for millennia. "Hey Siri, turn on the kitchen lights" feels natural because it's how you'd ask a person, not "KITCHEN LIGHTS ACTIVATE" like you're programming a robot.

So the recipe for good agent design boils down to two ingredients:

  • Make it familiar: Leverage interaction patterns your users already understand
  • Make it useful: The value must be worth any new behaviors users need to learn

The trick is to make sure the users don't have to care how the agent works — but make sure they're delighted that it does.

The Human API

Humans have been interacting with each other for millennia, and through all that practice, we've developed a rich set of expectations about how these interactions work. Call it the "Human API" — the interface specification that every person implicitly implements.

This API has evolved with technology. Physical mail became email, which became instant messaging, which became Slack and WhatsApp. The surface details changed (envelopes became subject lines, postal addresses became @handles) but the core patterns persisted. We still send messages to people, expect responses, and organize conversations into threads. The API adapted but didn't break.

But the Human API goes much deeper than messaging protocols. We make fundamental assumptions about other humans that shape every interaction:

Persistent identity: Bob today is the same Bob next week. When Bob emails you, texts you, and talks to you in person, it's all the same Bob. His memory persists across conversations. He remembers what you talked about last time. This seems obvious until you realize how many systems get this wrong — like when you have to re-explain your issue every time you reach a new customer service rep.

Continuous existence: Bob doesn't cease to exist when you're not talking to him. He's out there doing Bob things — learning, changing, having experiences that might come up in your next conversation. He's not frozen in stasis waiting for your next interaction.

Multi-party interactions: Bob talks to other people. He might mention something Alice told him, or coordinate plans that involve Charlie. He exists in a social network, not in isolation.

Agency relationships: Bob might work for Alice, or hire Dave to do something. These relationships come with expectations about loyalty, disclosure, and chain of command that we all implicitly understand.

These are some of the foundational assumptions that make human society work. Every time we interact with another person, we're using this API without even thinking about it.
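Read as an actual interface specification, those assumptions might be sketched like this — every name here is hypothetical, a thought experiment rather than a real protocol:

```typescript
// A hypothetical "Human API": what we implicitly expect of anyone we interact with.
interface HumanAPI {
  readonly id: string;                         // persistent identity: same Bob everywhere
  remember(topic: string): string | undefined; // memory persists across conversations
  tell(topic: string, fact: string): void;     // interactions update that memory
  knows(otherId: string): boolean;             // multi-party: Bob exists in a social graph
}

// A minimal in-memory stand-in, just to show the contract in action.
function makePerson(id: string, contacts: string[]): HumanAPI {
  const memory = new Map<string, string>();
  return {
    id,
    remember: (topic) => memory.get(topic),
    tell: (topic, fact) => {
      memory.set(topic, fact);
    },
    knows: (otherId) => contacts.includes(otherId),
  };
}

const bob = makePerson("bob", ["alice", "charlie"]);
bob.tell("last-chat", "discussed agent design");
```

A system that drops any of these methods — forgetting past conversations, say — violates expectations users didn't even know they had.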

Meatless Agents that Quack like Humans

ChatGPT's breakthrough was that it implemented enough of the Human API to feel familiar. The name itself tells the story: it's "Chat" GPT because you chat with it, like you would with another person. You type a message, it responds in a coherent way (for the most part). The interaction pattern is instantly recognizable.

But current LLMs also live in an uncanny valley. They nail some parts of the Human API while completely missing others. Start a new conversation with ChatGPT, and it has no memory of your previous chats (despite some half-baked attempts at persistent memory). It can't proactively reach out when it learns something relevant to your last conversation. It's like talking to someone with severe anterograde amnesia: coherent in the moment but lacking the continuity that makes relationships meaningful.

The real promise of AI agents is their potential to implement progressively more of the Human API. This means remembering previous interactions, existing continuously between conversations, coordinating with other agents, and maintaining consistent relationships over time. And of course, things will get more physical as we go.

This doesn't mean agents are limited to human capabilities. Just like email added features snail mail doesn't have (perfect recall, search, instant and mass distribution), agents can extend beyond the Human API. But by implementing that familiar foundation first, they become immediately usable by billions of people who already know how to work with other intelligent entities.

The agents that succeed will be the ones that honor this implicit contract. They will act like the intelligent, persistent, socially-aware entities we expect them to be. Not because that's the only way to build agents, but because that's the interface humans already understand.

The Promised Land

[Image: solarpunk]

Now that we've established the what, it's time to ask the why. Why would it be a good idea to have trillions of agents that quack like humans?

To answer that, we need to talk about the fundamental problem every economy faces: scarcity. Human wants are unlimited, but resources are limited. This forces every society to answer three brutal questions:

  1. What should we produce?
  2. How should we produce it?
  3. Who gets it?

For all our technological progress, our answers remain tragically inadequate. We produce enough food to feed everyone, yet children die of malnutrition. We have more empty homes than homeless people. We have treatments gathering dust while people die from preventable diseases.

These are coordination failures. The transaction costs are too high. The trust networks too fragmented. The information too siloed. A farmer in Kenya can't efficiently connect with restaurants in Nairobi. A specialist in Toronto can't share expertise with a patient in Mumbai. The coordination overhead crushes the value before it can be created.

But what happens when every person, every organization, every community has access to agents that can maintain thousands of relationships, remember every interaction, coordinate seamlessly across every barrier?

The farmer's agent talks to restaurants in Nairobi — negotiating prices, arranging transport, predicting demand. No middlemen, no rotting crops. The specialist's agent collaborates with the patient's agent in Mumbai, transferring decades of expertise in real time. When coordination becomes nearly free, we unlock economic networks that were literally impossible before.

This is what makes abundance possible. And this flavor of abundance enables something beautiful: the transformation of human labor into a Veblen good — valuable precisely because a human chose to do it.

Mark Court is the craftsman who hand-paints pinstripes on Rolls-Royce cars. In a world of robotic precision and perfect automation, Rolls-Royce pays him handsomely to do what a machine could do faster and more accurately. Why? Because the human touch, the slight imperfections, the knowledge that another person devoted their skill and attention to this specific car — that's what makes it special.

In the abundant future, everyone who wants to can find their version of being Mark Court. The teacher who teaches because they love watching understanding dawn in a student's eyes. The chef who cooks because food is their art. The carpenter who builds because wood speaks to them. When we're freed from work-to-survive, we can all choose work that matters to us, that uses our unique human creativity, that brings us joy.

This future belongs to everyone. The child born today in a refugee camp deserves to discover whether they're an artist or an engineer or a philosopher. They deserve to find their own equivalent of painting pinstripes: that thing they do because they must, because it's who they are, not because rent is due.

Of course, dropping a trillion agents into our current systems won't automatically create this world. Property rights, power structures, political systems — these persist. Incumbents will try to capture these tools to entrench their positions.

But making coordination essentially free changes what's politically possible. When agents make visible that we could feed everyone for a rounding error, that scarcity is now mostly artificial, the pressure for change becomes overwhelming. It becomes harder to justify letting people starve when the solution is right there, coordinated by agents, waiting for permission.

This is the recognition that our biggest problems aren't laws of physics, but coordination failures. And for the first time in history, we have tools that could fix them. What happens next depends on the choices we make. We have a moral imperative to prove these failures evitable.