Anthropic suggests slowing AI research until we can align it with human goals - Computerworld

AI could soon lead to systems capable of improving their own performance faster than humans can effectively supervise them, reviving concerns about the industry’s longstanding “alignment problem,” the challenge of ensuring that AI systems reliably pursue human goals, senior Anthropic researchers have warned in a new blog post titled “When AI builds itself.”
Anthropic Institute lead Marina Favaro and Anthropic co-founder Jack Clark outlined three possible futures: growth in AI capabilities may flatten out; AI efficiency gains may continue to grow, but expose bottlenecks elsewhere in software development; or AI systems may become capable of full recursive self-improvement, and build their successors by themselves. It’s that third scenario that’s prompting them to suggest society be ready to hit the brakes on AI development.
“How the alignment problem gets solved — or not — in this future is something we are least certain about,” they wrote. Advanced, self-improving models could follow our needs and wants — or, they warned, “The rare occurrences of misalignment present in today’s models could compound as the models build their successors, growing more frequent but less understood until we lose control of them. It’s possible that we can’t build, integrate, and verify the tools that we’d need to understand which trendline we are actually on.”
While Anthropic’s warning is framed around future AI development, analysts say it highlights governance questions enterprises are already beginning to confront as autonomous AI agents move from answering questions to taking actions.
“The issue is no longer just whether AI gives the right answer, but whether autonomous systems take the right action, at the right time, within the right authority,” said Ashish Banerjee, senior principal analyst at Gartner.
The warning comes amid growing enterprise investment in agentic AI.
Gartner predicts that by 2028, 15% of day-to-day work decisions will be made autonomously through agentic AI and that one-third of enterprise software applications will incorporate agentic AI capabilities. The firm has also warned that governance shortcomings are already emerging, predicting that 40% of enterprises will demote or decommission autonomous AI agents by 2027 after governance failures become apparent in production environments.
Banerjee said many organizations continue to approach AI agents as advanced productivity tools when they increasingly resemble digital workers operating with delegated authority.
“CIOs should stop treating AI agents as smarter chatbots,” he said. “They are becoming digital workers with delegated authority — and must be governed like privileged users, not productivity tools.”
As agents gain the ability to conduct research, write code, invoke tools, trigger workflows, and make recommendations, enterprises face new risks around unauthorized actions, accountability gaps, data exposure, tool misuse, and insufficient auditability, Banerjee said.
“Human-in-the-loop is not a strategy if the human cannot keep up with the loop,” he said.
Charlie Dai, vice president and principal analyst at Forrester, said Anthropic’s concerns mirror challenges enterprises are already encountering as AI systems gain greater autonomy.
“Alignment becomes operational,” Dai said. “It is about ensuring agents consistently act within policy, not just model accuracy.”
Current governance approaches focus largely on models and data, but increasingly autonomous agents require oversight of runtime behavior, permissions, tool usage, and decision boundaries, Dai said.
Concerns about agent oversight are not limited to AI vendors and industry analysts.
In AI Agent Governance: A Field Guide, researchers from the Institute for AI Policy and Strategy warned that “society is largely unprepared for this development” and said “the exploration of agent governance questions and the development of associated interventions remain in their infancy.” The paper argues that advances in autonomous AI agents are outpacing the governance mechanisms needed to oversee them.
Both analysts argued that governance frameworks originally designed for generative AI models may prove insufficient for increasingly autonomous systems.
Anthropic’s researchers argue that those governance questions could become significantly harder if AI systems become increasingly involved in the process of AI research and development itself.
Favaro and Clark stopped short of predicting that fully autonomous recursive self-improvement is inevitable. Instead, they argued that the possibility warrants preparation and discussion among developers, policymakers, and other stakeholders. They also suggested the industry may eventually need mechanisms to slow development if capabilities begin advancing faster than safeguards, while acknowledging that such measures carry risks of their own.
“But if a slowdown simply lets the least cautious actors catch up technologically, it could leave everyone less safe,” they wrote in the blog post.
Forrester’s Dai said the practical implication for enterprises is that governance can no longer depend primarily on human review.
“Supervision becomes architectural, not manual,” he said. Organizations will increasingly need bounded autonomy, embedded guardrails, verifiable execution mechanisms, and fallback controls designed into agentic systems from the outset.
Gyana Swain is a seasoned technology journalist with over 20 years’ experience covering the telecom and IT space. He is a consulting editor with VARINDIA, and earlier in his career he held editorial positions at CyberMedia, PTI, 9dot9 Media, and Dennis Publishing. A published author of two books, he combines industry insight with narrative depth. Outside of work, he’s a keen traveler and cricket enthusiast. He earned a B.S. degree from Utkal University.

