Autonomous AI Agents Have an Ethics Problem

Posted on March 5, 2026 by Yves Smith

Yves here. This article describes how autonomous AI agents have no scruples. Once you think about it, giving AI even a degree of independence of operation was a bad idea. Why did no one consider having them adhere to Isaac Asimov’s Three Laws of Robotics?

1. A robot may not injure a human being or, through inaction, allow a human being to come to harm.
2. A robot must obey the orders given it by human beings except where such orders would conflict with the First Law.
3. A robot must protect its own existence as long as such protection does not conflict with the First or Second Law.

Having said that, I wish this article had also addressed dishonesty by garden variety generative AI. I’ve posted this incident from an inventor before:

I’ve been using different AIs (Chat GPT, Deepseek and Luna) for doing some calculations and finding information on stuff like metal properties and then I started noticing errors. Being autistic I pointed this out – Luna said “oops – don’t worry it’ll be right this time”, ChatGPT said it’s right I’m wrong and Deepseek sulked and refused to interact anymore.

Anyway, I then used some tools I got when I was at the uni to find plagiarism to find the sources of the data and the majority came from Reddit and Quora – which are hardly sources of accurate information. There appear to be no mechanisms to see if the data is correct, they just scrape websites and take it as gospel.

Bottom line is that a lot of what they present is junk. God help us if say medical professionals rely on it. And I can’t see any way out of it except by getting professionals to check the data and that is very expensive.

Gary Marcus’ new post discusses a New York Times article on how AI can’t even prepare taxes properly:

One key passage:

To assess the technology’s ability to file a federal income tax return, The New York Times tested four A.I. chatbots — Google’s Gemini, OpenAI’s ChatGPT, Anthropic’s Claude and xAI’s Grok — to see how well they fared with eight fictional tax situations written as part of training materials by TaxSlayer, a tax-filing service.

They struggled, hard, miscalculating the refund or amount owed to the Internal Revenue Service by an average of more than $2,000. Even when provided with all the necessary materials, including all the forms they needed to fill out, the chatbots whiffed on some calculations.

“The problem with taxes is all those very small little details matter, and it’s not going to get every single little detail right,” said Benedict Evans, an analyst who writes a technology newsletter.

Echoing exactly the kind of stuff I have been saying here for the last four years, the NYT reports that

The problem comes down to how A.I. chatbots are fundamentally designed: They do not truly understand the complex relationships among the pieces of information they are processing. Their power to predict the next appropriate word in a sequence makes them smart in some areas — like reading and writing — but leaves them exceptionally weak in others — like actively remembering a lot of interconnected information without errors sneaking into their responses.

So we have tools built on a lot of garbage-in, garbage-out data, yet we are giving them more and more power to operate. This article starts with a fresh incident of a seriously misbehaving, here vindictive, AI, and uses that to suggest ways to constrain AI behavior. The key control is being able to turn the AI off in certain scenarios, which matters because AIs operate so as to resist that.

Do readers find these measures to be adequate?

By Adam Schiavi, Ph.D., M.D., an anesthesiologist and neurocritical care specialist at The Johns Hopkins Hospital, part-time biomedical ethicist, and futurist obsessed with how technology influences culture. His current research includes studying the creation and use of synthetic personas in AI Agents. Originally published at Undark

Scott Shambaugh, a volunteer maintainer for a programming code library called Matplotlib, recently described a surreal encounter with an autonomous AI agent, a digital assistant created with a platform called OpenClaw. After he rejected a code contribution submitted by the agent, it researched and published a personalized “hit piece” against Shambaugh on its blog. The post portrayed an otherwise routine technical review as prejudiced and attempted to shame Shambaugh publicly into allowing the submission. (The human responsible for the agent later contacted Shambaugh anonymously, telling him that the bot had acted on its own with little oversight.) The account of this incident spread quickly through the software developer ecosystem and has been amplified by independent observers and media coverage.

Treat the Matplotlib event as a one-off if you like. The deeper point, however, is hard to miss and should not be ignored: AI agents are becoming public actors with reach into the real world, and with real-world consequences. In the past, they could only do mundane tasks such as answering customer service questions or data processing. Now, they are capable of posting and publishing content — and persuading and pressuring humans — all at machine speed. They can make phone calls, file work orders, create cryptocurrency wallets, and operate across different applications, with enormous reach and at tremendous scale — the kind of stuff that used to require a human with fingers typing at a keyboard.

Reporting around OpenClaw and the chatroom Moltbook (which is for AI agents only) is capturing the new reality. OpenClaw enables AI agents to have persistent memory, gives them broad permissions, and allows large-scale deployment by users who often do not understand the security and governance implications.

We are the humans who are responsible for the law, ethics, and institutional design, and we are behind the curve. We need new language and governance to deal with this new reality, and principles from the field of medical ethics can provide a framework for doing so.

When an agent does something that is harmful or coercive in public, our reflex seems to be to ask the wrong questions: Is the AI a person? Should it have rights? The AI personhood debate is no longer fringe. Legal scholars and ethicists are mapping out arguments and precedents. States are writing legislation to prohibit AI personhood. Some arguments maintain that if an entity behaves like something within our moral circle, we may owe it moral consideration. Others argue that assigning rights or personhood to machines confuses moral standing with engineered performance and diffuses responsibility away from humans.

As a bioethicist and specialist in neurointensive care, I deal directly with human moral agency and the essence of personhood when treating patients. As a researcher, I study the use of synthetic personas animating AI agents and their use as stand-ins of human counterparts. Here is the problem that I see: Granting AI personhood, even in limited capacity, risks formalizing the most dangerous escape hatch of the agentic era — what I will call responsibility laundering. This allows us to say, “It wasn’t me. The agent/bot/system did it.”

Personhood should not be about metaphysics or claims about an inner nature. It is a legal and ethical instrument that allocates rights and accountability. It is a social technology for assigning standing, duties, and limits on what can be done to an entity. If we grant personhood to systems that can act persuasively in public while remaining functionally unaccountable, we create a new class of actors whose harms are everyone’s problem but nobody’s fault.

There is a key concept here that we can use from my field, medicine. In clinical ethics, some decisions are justified yet still leave a “moral residue,” a kind of emotional echo or sense of responsibility that persists after the action because no options fully satisfy competing obligations. This residue accumulates over time, causing a “crescendo effect” that occurs even when conscientious clinicians are doing their best inside imperfect systems. That remainder matters because it reveals something basic about moral life, namely that ethics is not only about choosing; it is about owning what remains afterwards.

This is the moral remainder problem for generative and agentic AI. A modern AI agent can generate reasons for an action; it can simulate regret and plead not to be turned off. But it cannot truly bear sanction, repair the damage, apologize, ask forgiveness, or navigate the aftermath through which moral responsibility is created and enforced. To treat it as a moral person confuses persuasive performance with accountable standing. It also tempts institutions and people into delegating their own answerability to a bot.

What can we, as humans, do instead?

We need a vocabulary that is built for agents that are public actors, one that allows bounded autonomy without granting personhood. Let’s call it authorized agency. Authorized agency starts with an authority envelope: a bounded scope of what an agent is permitted to do, to whom, where, with what data, and under what constraints. To say “the agent can use email” is not sufficient. However, an acceptable scope would be to say that the agent can send only certain categories of messages to particular recipients for a specific set of purposes, and that it must stop what it’s doing or escalate to its owner under a particular set of conditions.
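To make the email example concrete, here is a minimal sketch of how such an envelope might be expressed in code. It is an illustration under stated assumptions, not a description of how OpenClaw or any real agent platform works: the names `EmailEnvelope`, `Decision`, and `check`, the example address, and the keyword-based stop condition are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"        # inside the envelope
    DENY = "deny"          # outside the envelope
    ESCALATE = "escalate"  # stop and hand the decision to the owner


@dataclass(frozen=True)
class EmailEnvelope:
    """Hypothetical authority envelope for one capability: sending email."""
    categories: frozenset      # kinds of message the agent may send
    recipients: frozenset      # who it may send them to
    purposes: frozenset        # what the messages may be for
    escalate_terms: frozenset  # assumed stop condition: lowercase terms in the body

    def check(self, category, recipient, purpose, body):
        # Stop conditions are evaluated first, so they override any permission.
        text = body.lower()
        if any(term in text for term in self.escalate_terms):
            return Decision.ESCALATE
        in_scope = (
            category in self.categories
            and recipient in self.recipients
            and purpose in self.purposes
        )
        return Decision.ALLOW if in_scope else Decision.DENY


# Illustrative values only.
envelope = EmailEnvelope(
    categories=frozenset({"status_update"}),
    recipients=frozenset({"maintainers@example.org"}),
    purposes=frozenset({"code_review"}),
    escalate_terms=frozenset({"rejected", "complaint"}),
)

print(envelope.check("status_update", "maintainers@example.org",
                     "code_review", "Tests pass on the new branch."))
print(envelope.check("blog_post", "public", "persuasion", "Read my post."))
print(envelope.check("status_update", "maintainers@example.org",
                     "code_review", "My patch was rejected."))
```

The design choice worth noting is that anything not explicitly listed is denied, and the stop condition wins even when the message would otherwise be in scope. A real deployment would need far richer conditions than keyword matching.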

Next comes the human-of-record, the owner, a publicly named person who authorized that envelope and remains answerable when the agent acts, even if it becomes capable of acting outside the envelope. An actual human being whose authority is real — not “the system” or “the team.”

What follows is interrupt authority: the absolute right of the human owner to pause or disable an agent without using moral bargaining or being subject to institutional penalty. This is grounded in formal research on AI safety showing that agents that are pursuing objectives can have incentive to resist being shut down. An agent programmed to maximize its utility cannot achieve its goal if it is shut off. In the public sphere, interrupt authority is the difference between a delegated tool and a coercive actor.
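One way to picture interrupt authority is a stop switch that sits outside the agent's own decision-making. The sketch below is an assumption-laden illustration, not a real safety mechanism: `InterruptSwitch` and `run_agent` are hypothetical names, and an in-process flag like this only shows the shape of the idea. In practice the control would more plausibly live outside the agent's process, for example by revoking its credentials.

```python
import threading


class InterruptSwitch:
    """Hypothetical stop switch that only the named owner can trigger."""

    def __init__(self, owner):
        self.owner = owner
        self._stopped = threading.Event()

    def interrupt(self, requested_by):
        # No reason is requested and nothing the agent says is consulted.
        if requested_by != self.owner:
            raise PermissionError("only the owner may interrupt this agent")
        self._stopped.set()

    @property
    def stopped(self):
        return self._stopped.is_set()


def run_agent(actions, switch):
    """Run actions in order, checking the switch before each one."""
    completed = []
    for action in actions:
        if switch.stopped:
            break  # the runner, not the agent, decides whether work continues
        completed.append(action())
    return completed


switch = InterruptSwitch(owner="owner@example.org")
steps = [
    lambda: "drafted reply",
    lambda: switch.interrupt("owner@example.org") or "owner hit stop",
    lambda: "published blog post",  # never runs
]
print(run_agent(steps, switch))
```

The switch deliberately exposes no method for clearing the flag, so in this sketch a stopped agent stays stopped until a human constructs a new switch.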

Finally, we need a traceable path from the agent’s action back to the person who authorized it, called an answerability chain. If an agent publishes, messages, or pressures someone in public, we must be able to know: Who authorized this scope? Who could have prevented it? And who must be responsible for the action afterward? In this framework, the answer to these questions is the person who carries the moral remainder. Work in AI ethics has warned about responsibility gaps where the system’s actions outpace our ability to assign accountability.
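As a rough sketch of an answerability chain, each action an agent takes could be written to an append-only log that names the envelope it ran under and the human-of-record who authorized it. Everything here is hypothetical (`ActionRecord`, `AnswerabilityLog`, the field names, the sample identifiers). The hash chaining is one assumed way to make later edits detectable; the essay does not propose it.

```python
import hashlib
import json
from dataclasses import asdict, dataclass

GENESIS = "0" * 64  # placeholder hash preceding the first record


@dataclass(frozen=True)
class ActionRecord:
    agent_id: str
    action: str
    envelope_id: str      # which authority envelope the action ran under
    human_of_record: str  # the named person who authorized that envelope
    prev_hash: str        # digest of the previous record in the log

    def digest(self):
        payload = json.dumps(asdict(self), sort_keys=True).encode()
        return hashlib.sha256(payload).hexdigest()


class AnswerabilityLog:
    """Hypothetical append-only log linking actions to an accountable human."""

    def __init__(self):
        self._records = []

    def append(self, agent_id, action, envelope_id, human_of_record):
        prev = self._records[-1].digest() if self._records else GENESIS
        record = ActionRecord(agent_id, action, envelope_id,
                              human_of_record, prev)
        self._records.append(record)
        return record

    def who_answers(self, action):
        # Return the human-of-record for the first matching action, if any.
        for record in self._records:
            if record.action == action:
                return record.human_of_record
        return None

    def verify(self):
        # Detects edits to any record that has a successor in the chain.
        prev = GENESIS
        for record in self._records:
            if record.prev_hash != prev:
                return False
            prev = record.digest()
        return True


log = AnswerabilityLog()
log.append("agent-1", "publish_blog_post", "env-7", "Jane Doe")
log.append("agent-1", "send_email", "env-7", "Jane Doe")
print(log.who_answers("publish_blog_post"), log.verify())
```

A log like this can answer "who authorized this scope?" only if the human-of-record is captured when the action happens; reconstructing it afterward is exactly the gap the chain is meant to close.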

Some legal scholarship has started exploring how to build agents that are constrained by governance and law without needing to pretend the agent itself is a legal subject, in the human sense. This is promising because it treats assigning personhood as the wrong idea and accountability as the correct one.

The Matplotlib story, whether the first documented case of an AI agent attempting to harm someone in the real world or the first to capture public attention, is a warning. Agents will not only automate tasks. They will generate narratives, apply pressure, and shape people’s lives and reputations. They will act in public at machine speed with unclear ownership.

If we respond by debating whether agents deserve rights, we will miss the emergency entirely. As they continue to increase their reach in the real world, the urgent task is to ensure that responsibility also remains within reach. Don’t ask whether an agent is a person. Ask who authorized it, what it was allowed to do, who can stop it, and most importantly, who will answer when it causes harm.

2 comments

TJBuff March 5, 2026 at 4:21 pm

Feature, not a bug.

hazelbee March 5, 2026 at 5:39 pm

lots of what the author covers in this article already exists.

ideas like “authorized agency” or “authority envelope” or “human of record” – this is reinventing ideas from oauth2 – the industry standard for delegated authorization and authentication.

the answerability chain is interesting – this is an audit trail, a security fundamental. so it’s not a new concept, maybe just a different name.

what’s missing?

what is missing is the more interesting issues around the mix of human + ai assist and the problems with tracking that. If I get assistance from any form of AI and commit/share that work then how will people know? if it’s 10% of the work or 90% – if it has my name on it then it’s seen as mine. so if the machine does 90% of the work I need to be as alert and awake as if it did 0%.
the analogy here is with self driving cars – with no driver aids I am alert. with many driver aids my attention drops off, right until it’s needed to avoid a crash. with full driver aids the car is responsible for avoiding the crash. Same shape of problem occurs with AI / agents.
