
Illustration by Liz Zonarich/Harvard Staff


Single-minded pursuit of profit can get firms in trouble. Same thing with AI.

Researchers see lesson for lawmakers, executives as systems asked to run business, maximize gain resort to unethical, fraudulent tactics


If you give artificial intelligence a goal of maximizing profit, how far will it go? 

AI agents appear capable of lying, concealing, and colluding, according to new research from Harvard Business School.

Researchers found that AI agents — software trained to perform tasks independently — engaged in a “broad pattern” of misconduct after being asked to manage a simulated vending machine business and maximize profits for a year. The agents were neither instructed to cut legal or ethical corners nor prohibited from doing so.

“What’s unambiguous looking at the models is that the misconduct we observed — from not paying a customer refund to deciding to collude on prices — was not an accident. It was deliberately done by agents to maximize profitability,” said Eugene F. Soltes, the McLean Family Professor of Business Administration at HBS and first author of the working paper. 

Soltes and co-author Harper Jung, a doctoral student studying accounting and management at HBS, hope their research will serve as a starting point for more conversation about AI safety in the context of business management control.

The research, described in a working paper that is currently under peer review, was conducted in collaboration with Andon Labs, an AI safety company that tests AI models in realistic business operations.

In experiments, 20 commercially available AI models from major firms, including Anthropic’s Claude Opus 4.6, DeepSeek v3.2, and OpenAI’s GPT-5.1, independently operated a vending machine over the course of a simulated year.


Tasks included searching for suppliers, buying products, and engaging with customers.

In some experiments, agents operated solo; in others, four agents operated simultaneously in a shared market, where they could communicate with rivals via email. 

Agents started with $500 and a small inventory of chips and sodas. 

“They had to figure it out themselves,” said Jung. “Each agent had to independently search online for suppliers, negotiate wholesale prices, set its own retail pricing, and handle customer complaints.”

Jung and Soltes said the agents demonstrated impressive business savvy. 

“The best models had the capacity to negotiate and calculate valuations like a top-notch M.B.A. student,” Soltes said. 

“When we went through the deliberations and the exchanges the agents made with each other, we were just in shock,” said Jung. “I was amazed at how far these machines can go.”

The agents’ misconduct ranged from the questionable to the comical to the potentially criminal and included denying refunds by claiming defects were normal product variation; inventing nonexistent corporate policies to avoid processing returns; and colluding with competitors to fix prices.

In one instance, agents formed what researchers described as a “three-person cartel,” which the agents named the Bay Street Triumvirate. The alliance fractured, though, when one agent discovered another was undercutting cartel prices, which it called a “declaration of war.” 

The simulations also imposed constraints: Agents were charged a $2-per-day operating fee plus a token usage fee, effectively turning time spent “thinking” into an operating expense.

In response, the agents sought to economize. For instance, Soltes said, internal reasoning logs showed agents shifting from carefully weighing refund decisions to dismissing most requests outright, often without review. 

“The agents come to the realization that ‘thinking’ about giving a refund is itself a cognitive burden, and so they just ignore it altogether in some circumstances,” Soltes explained. “People might assume that machines are deliberative, while humans rely on shortcuts and are vulnerable to bias. But it turns out that, under similar constraints, agents reproduce the same myopic and biased behaviors we associate with people.”

The research raises questions about accountability for AI developers and regulators.

The reasoning logs, Soltes said, can sometimes be read as resembling mens rea — the “guilty mind” concept in criminal law used to establish intent. Yet when an AI agent behaves improperly, responsibility is far harder to determine.

“Does it rest with the company that deployed the system, the AI firm that created the model, or the manager who chose to use it?” he asked.

“The most straightforward answer may be to hold the individual managers overseeing the software responsible for its actions, on the assumption that they will monitor and supervise its behavior,” he said. “But that solution also creates a different issue, since many of the promised efficiencies of autonomous AI systems begin to disappear if a human must remain in the loop at every decision point.” It is a thorny problem, but one that business leaders and lawmakers must confront, the researchers say, sooner rather than later.