AI Week in Review 26.06.27 - Substack - News Bunkers

OpenAI announced the anticipated GPT-5.6 as a suite of three models (Sol, Terra, Luna), but released them as a limited preview for “trusted partners” rather than as a full public release. Following pressure from the Trump administration, GPT-5.6 access is restricted to select partners pending a review by government agencies that is intended to produce a new AI security framework.
GPT-5.6 joins Anthropic’s Mythos 5 and Fable 5 as frontier AI models not accessible to the public because their high-level capabilities pose a cybersecurity risk. This government interference is a worrying trend that could greatly hamper AI progress until a smooth framework for safe public AI releases is worked out. OpenAI notes:
We are taking this short-term step because we believe it is the strongest path to broader availability in the coming weeks, while we work with the Administration to develop the cyber Executive Order framework and a repeatable process for future model releases.
The GPT-5.6 series, comprising the flagship Sol model, the balanced Terra model and the lower-cost Luna model, offers enhanced performance in coding, biology, and cybersecurity. For example, GPT-5.6 sets a new state of the art on the TerminalBench 2.1 coding benchmark, with GPT-5.6 Sol beating Claude Mythos 5, GPT-5.6 Terra on the level of Claude Fable 5, and GPT-5.6 Luna on the level of GPT-5.5.
OpenAI calls GPT-5.6 Sol “our most capable model yet for cybersecurity.” To limit misuse, they implemented a new safety stack featuring real-time misuse classifiers and trained GPT-5.6 Sol to recognize cyber vulnerabilities for cyber defenders but not to generate exploits. OpenAI claims the model does not cross the Cyber Critical threshold under their AI safety Preparedness Framework.
GPT-5.6 is priced competitively per 1M tokens across the three model sizes: Sol costs $5 input / $30 output; Terra is $2.50 input / $15 output; and Luna is $1 input / $6 output. GPT-5.6 also introduces more predictable prompt caching, including explicit cache breakpoints and a 30-minute minimum cache life. Sol adds a “max” reasoning setting and an “ultra” mode that coordinates subagents for complicated work, while Luna is a cost-competitive model that matches GPT-5.5 and Claude Opus 4.8 on coding.
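To make the price tiers concrete, here is a minimal sketch that estimates the cost of a single request from the per-1M-token prices quoted above. The function name `estimate_cost` and the example token counts are illustrative assumptions, and the sketch ignores any prompt-caching discount, since no cached-token price is given here.

```python
# Prices in USD per 1M tokens (input, output), as quoted in the text above.
PRICES = {
    "sol": (5.00, 30.00),
    "terra": (2.50, 15.00),
    "luna": (1.00, 6.00),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one request (hypothetical helper)."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Assumed workload: 200K input tokens and 50K output tokens.
for name in PRICES:
    cost = estimate_cost(name, 200_000, 50_000)
    print(f"{name}: ${cost:.2f}")
# sol: $2.50
# terra: $1.25
# luna: $0.50
```

At these list prices each tier costs half or less of the tier above it for the same workload, and output tokens cost six times as much as input tokens on every tier.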
GPT-5.6 will be a compelling frontier AI model lineup when it is available via a rollout through ChatGPT, Codex and the API.
Anthropic introduced a feature called Claude Tag that enables users to interact with Claude directly within Slack channels and that works proactively instead of only responding when pinged. Claude Tag will break assignments into stages and tackle them offline before returning completed work to the Slack thread. It can surface relevant information, follow up on stale threads, and share context at the channel level so teammates can continue work from the same conversation.
Anthropic bills it as “a new way for teams to work with Claude” and Andrej Karpathy called it “the 3rd major redesign of LLM UIUX.” However, the persistent workplace agent model is not unique but applies the OpenClaw agent paradigm within an enterprise context. Unlike a private chatbot or personal OpenClaw, the work-channel agent acts like a co-worker; it can collaborate with multiple people, use internal company tools, and learn and retain relevant organizational context. Claude Tag beta is initially available to Claude Team and Enterprise customers.
Sakana AI has launched Sakana Fugu, a multi-agent orchestration system that routes user prompts to the most suitable underlying AI models. One API call routes work across multiple models and roles, combining thinker, worker, and verifier behaviors to improve results. The standard Fugu version balances performance and latency, while a Fugu Ultra variant is designed to handle complex, multi-step problems. Benchmark data shows that the Fugu orchestrator performs on par with or outperforms existing high-tier models like Fable 5 and Mythos in certain tests.
Google has added computer use as a built-in tool in Gemini 3.5 Flash, to support agents that can interact across platforms. The model processes continuous screenshots to execute click, scroll, and typing actions seamlessly across varied software environments. Developers can utilize this feature via the Gemini API and Gemini Enterprise Agent Platform to create agents capable of acting across browser, mobile, and desktop environments.
Liquid AI released its smallest AI language model yet, LFM2.5-230M, touting the tiny 230M parameter LLM as ‘built to run anywhere.’ The model supports a 32K context window and uses the LFM2 architecture to enable efficient agentic workflows and high-speed data extraction on edge devices, generating over 200 tokens per second on a Galaxy S25 Ultra. It is open-weights and available via Hugging Face under a dual-use commercial license.
OpenAI updated GPT-5.5 Instant to improve intent recognition and instruction following. According to OpenAI, the update is “more fun to talk to” with more natural responses, and it enhances the model’s ability to handle complex constraints, improving its shopping and local recommendations. This model update is the default model available in ChatGPT, while developers can access it via the `gpt-5.5` model.
Mistral AI released OCR 4, their fourth-generation document intelligence model, which provides state-of-the-art document extraction and supports RAG pipelines and knowledge search in 170 languages. OCR 4 generates structured document representations including bounding boxes, block-type classification, and per-word confidence scores. With pricing starting at $4 per 1,000 pages via Mistral API, Amazon SageMaker, and Microsoft Foundry, it is suitable for low-cost, enterprise-oriented OCR.
Alibaba’s Qwen team released Qwen-AgentWorld, a language world model designed to predict how an environment changes from an agent’s actions across seven domains, including Search, Terminal, and Web. Predicting outcomes from an agent’s action can help train agents, and training agents within these controlled simulations produced performance gains exceeding real-environment training and improved scores on previously unseen benchmarks. A Qwen-AgentWorld Technical Paper was released to explain how the AgentWorld model was trained and used.
Unsloth created a 1-bit GLM 5.2 quantization to run a much smaller version of the frontier GLM 5.2 model locally on a 256GB Mac Studio. The 1-bit quantized build shrinks the footprint to roughly 200GB in GGUF format. A 2-bit quantization retains ~82% accuracy while shrinking the model size from 1.51TB to 238GB (an 84% reduction).
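The size figures above can be checked with a few lines of arithmetic. This is a minimal sketch that assumes decimal units (1.51TB = 1,510GB) and takes the 256GB of Mac Studio memory as an upper bound on what can be loaded; the helper name `size_reduction` is illustrative, and the check ignores memory needed for the OS, KV cache, and activations.

```python
def size_reduction(original_gb: float, quantized_gb: float) -> float:
    """Fraction by which the quantized model is smaller than the original."""
    return 1 - quantized_gb / original_gb


FULL_GB = 1510   # 1.51TB expressed in GB, assuming decimal units
MAC_RAM_GB = 256  # memory of the Mac Studio mentioned above

for label, quant_gb in [("2-bit", 238), ("1-bit", 200)]:
    pct = round(size_reduction(FULL_GB, quant_gb) * 100)
    fits = quant_gb < MAC_RAM_GB
    print(f"{label}: {quant_gb}GB, {pct}% smaller, below 256GB: {fits}")
# 2-bit: 238GB, 84% smaller, below 256GB: True
# 1-bit: 200GB, 87% smaller, below 256GB: True
```

Both builds come in under 256GB on paper, but the 1-bit build leaves considerably more headroom for context and runtime overhead.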
ByteDance previewed the Seedance 2.5 video generation model at a conference in Beijing, highlighting major upgrades to the video generation platform. The upcoming model can natively generate a single segment of full audio-video up to 30 seconds long, double the length possible in previous versions. It supports up to 50 distinct text, image, audio, or video reference assets to provide users with finer control over motion and editing.
Krea AI announced Krea 2 as an open-weights text-to-image generation model for public download and customization. Krea 2 is a 12B diffusion-transformer image model with two versions: Raw is the base checkpoint, useful for further fine-tuning, and Turbo is the post-trained version for direct use. Making Krea 2 open weight allows developers to run the model locally and fine-tune it on specific styles or custom datasets, in the same way the community has used Stable Diffusion and Flux models.
Baidu released Unlimited-OCR, an open-source 3B mixture-of-experts OCR model designed for long-horizon document parsing. This document model can parse 40 or more pages in a single forward pass by using a sliding attention window and a constant-size KV cache to avoid memory and latency growth on long documents. This makes it a practical self-hosted alternative for local OCR rather than sending documents to a large general-purpose API.
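The general idea behind a sliding attention window with a constant-size KV cache can be illustrated with a toy example. This is a sketch of the generic technique only, not Baidu's implementation: the class `SlidingKVCache`, the window size, and the string placeholders standing in for key/value tensors are all assumptions made for illustration.

```python
from collections import deque


class SlidingKVCache:
    """Toy KV cache that keeps only the most recent `window` entries."""

    def __init__(self, window: int):
        # deque with maxlen silently evicts the oldest item when full,
        # so memory stays bounded no matter how long the input is.
        self.keys = deque(maxlen=window)
        self.values = deque(maxlen=window)

    def append(self, key, value) -> None:
        self.keys.append(key)
        self.values.append(value)

    def __len__(self) -> int:
        return len(self.keys)


cache = SlidingKVCache(window=4)
for t in range(10):  # simulate processing 10 tokens
    cache.append(f"k{t}", f"v{t}")

print(len(cache))        # 4
print(list(cache.keys))  # ['k6', 'k7', 'k8', 'k9']
```

Because each new token attends only to the entries still in the window, per-token memory and compute stay constant as the document grows, which is the property that makes long multi-page inputs tractable.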
OpenAI made Codex Remote broadly available across ChatGPT plans, allowing users to initiate, continue and supervise coding work on connected Windows or Mac computers from the ChatGPT mobile application.
OpenAI expanded its Daybreak cybersecurity platform, introducing a full version of GPT-5.5-Cyber and new Codex Security capabilities designed to move from merely discovering vulnerabilities toward developing, testing and deploying patches.
OpenAI announced Codex Record and Replay, a workflow capture feature for desktop automation. Codex records clicks and actions, then generates an editable workflow file it can replay, which provides more adaptive software automation than relying on fixed pixel coordinates.
Google finds that reasoning can help language models retrieve facts they already know. Google researchers studied why chain-of-thought generation sometimes improves answers to straightforward factual questions that seemingly require no multi-step reasoning. Their analysis identified two mechanisms: reasoning tokens give the model additional opportunities for internal computation, while generating related information can prime the model to retrieve the correct fact, suggesting that reasoning improves memory access as well as formal problem-solving.
Google accelerated Gemini Nano with frozen multi-token prediction. Google Research described a technique that adds multi-token prediction to an existing on-device model without retraining the entire network. By freezing Gemini Nano’s main parameters and training lightweight prediction components to anticipate several tokens at once, the researchers increased generation speed on Pixel devices while avoiding the cost and complexity of rebuilding the underlying model.
OpenAI and Broadcom announced the Jalapeño inference processor, a custom “Intelligence Processor” chip that the two companies co-designed for LLM inference. Developed in nine months with AI assistance, the Jalapeño chip is built specifically around the serving patterns of LLM inference, aiming to deliver significantly better performance per watt than current systems and to reduce OpenAI’s cost of serving models. Because this is an inference chip rather than a training chip, OpenAI will still rely on Nvidia GPUs for training models.
Reflection AI signed a multibillion-dollar SpaceX compute agreement, where Reflection AI will pay SpaceX $150 million per month for access to Nvidia GB300 systems at the Colossus 2 data center. The contract could be worth as much as $6.3 billion through 2029.
Google DeepMind is losing key researchers to Anthropic. Gemini researchers Jonas Adler and Alexander Pritzel left Google for Anthropic, following the recent exits by Noam Shazeer and DeepMind director John Jumper.
Meta halted worker tracking for AI training due to privacy fears. Meta had started tracking workers’ computer usage to create AI training data just two months ago, but the program provoked an employee backlash and led to data being accessible outside of the company.
Patronus AI announced a $50 million Series B funding round. The startup utilizes simulated digital environments to stress-test the reliability of AI agents performing complex, multi-step tasks.
General Intuition raised $320 million at a $2.3 billion valuation. The startup is developing agentic models and world models by leveraging action-labeled gameplay data from Medal to train for spatial-temporal reasoning in simulation and robotics.
Adobe acquired Topaz Labs to integrate Topaz AI models for video and image enhancement into its Firefly AI app and various creative suites.
OpenAI called for shared institutions to govern advanced models. In a policy paper, OpenAI argued that governments need stronger technical institutions capable of evaluating frontier models, protecting sensitive systems and developing common standards for increasingly powerful AI. The proposal emphasizes evaluation capacity, international cooperation and repeatable release procedures as governments become more involved in reviewing models with advanced scientific or cybersecurity capabilities.
The Atlantic launched a searchable public database called the AI Watchdog to track copyrighted material used to train AI music algorithms. The free tool allows musicians and content creators to search for specific artists or channels to see if their intellectual property was utilized by platforms like Suno or Udio. The database reveals that training datasets contain extensive quantities of content from prominent musical artists as well as hundreds of videos from popular independent YouTubers.