AI Week in Review 26.06.13 – Substack


Fable 5’s capabilities exceed those of any model we’ve ever made generally available. It is state-of-the-art on nearly all tested benchmarks of AI capability … The longer and more complex the task, the larger Fable 5’s lead over our other models.
Anthropic launched Claude Fable 5, its first public Mythos-class model, alongside the highly restricted Claude Mythos 5 model. Fable 5 is easily the most intelligent AI model yet released, with SOTA benchmarks on knowledge work (1932 on GDPval-AA), agentic coding (80.3% on SWE-Bench Pro, 88% on TerminalBench 2.1), reasoning (59% on Humanity’s Last Exam), and top positions across several capability leaderboards, including Agent Arena.
These models excel at long-horizon agentic work; use cases like Riley Brown re-implementing the whole Lovable interface in 2 prompts bear this out.
Positioning Fable 5 as a premier autonomous agentic system made safe for general use, Anthropic introduced significant safeguards, redirecting high-risk cyber, biology, chemistry, and model-distillation requests away from the frontier model to Opus 4.8. They shared details in their Claude Fable 5 and Claude Mythos 5 System Card.
But Anthropic went further in the original Fable 5, silently sabotaging Fable 5 on requests that might relate to competitive AI development. This hidden output degradation led to significant backlash and criticism about trust and evaluation integrity. Anthropic then reversed course and changed the behavior so flagged requests visibly fall back to Opus 4.8 with explicit reasons for API users.
But for now, Fable 5 is gone. Anthropic abruptly suspended access to Claude Fable 5 and Claude Mythos 5 after it received a U.S. government export-control directive barring access by foreign nationals. Because the order applied broadly to foreigners, including foreign-national Anthropic employees, the company said it had to disable the models for all customers.
In his recent “Policy on the AI Exponential” blog post, Dario Amodei advocated for an Advanced AI Framework for overseeing models that would allow Government to block AI model releases. However, Anthropic insists that the Government’s ban on Fable 5 is based on a misunderstanding of its model’s risks due to a report of a jailbreak. They say, “We believe this is a misunderstanding and are working to restore access as soon as possible.”
Frontier AI models, like airplanes, should be required to go through technical testing and auditing, and their release should be blocked or reversed as a threat to public safety if they do not meet high standards of safety – Dario Amodei
Apple used WWDC26 to unveil their next generation of Apple Intelligence and a rebuilt Siri AI.
Apple announced its third generation of Apple Foundation Models, a family of five custom models developed in collaboration with Google. The AFM 3 models include on-device models and cloud models:
AFM 3 Core, a 3B dense model for on-device use.
AFM 3 Core Advanced, a 20B-parameter sparse MoE for multimodal on-device tasks.
AFM 3 Cloud, a server-side workhorse model for speed and performance.
AFM 3 Cloud Image, for image generation and editing in photo-editing and Image Playground.
AFM 3 Cloud Pro, for demanding agentic and reasoning use cases.
The AFM 3 framework is designed to power contextual, multi-platform AI experiences across the Apple ecosystem, leveraging both on-device hardware and Apple’s secure private cloud servers.
To support custom AI, Apple is also introducing Core AI, a new framework for running custom AI models on Apple silicon and Apple devices.
Apple presented the new Siri AI as far more capable at using personal context, app actions, and on-screen information, and more deeply integrated across Apple’s products – iPhone, iPad, Mac, Apple Watch, and Vision Pro. The new Siri AI adds web-based world knowledge and Visual Intelligence, and it is built around App Intents and App Schemas so apps can expose content and actions in natural language. This helps the new Siri AI handle multi-step requests like finding specific photos, organizing emails, and taking actions across apps.
Apple updated Image Playground with their newest AI model, AFM 3 Cloud Image, so it is capable of producing improved photorealistic and stylized graphics, more in line with competitive image generation tools.
Google launched DiffusionGemma, an experimental 26B Mixture of Experts (MoE) model with 3.8B active parameters, released under the open-source Apache 2.0 license. DiffusionGemma uses text diffusion to generate 256-token text blocks simultaneously, which yields up to six times faster local generation speeds (over 1,000 tokens per second on a single H100). The speed makes it ideal for interactive workflows like in-line editing of code and documents, but its overall output quality is lower than that of the 26B Gemma 4 model.
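To put those numbers in perspective, here is a rough back-of-envelope sketch in Python. The 256-token block size and the 1,000 tokens-per-second figure come from the announcement as reported above; the function name is hypothetical, and the estimate assumes throughput stays constant over the whole response, which real workloads may not match.

```python
import math

def estimate_generation(num_tokens, block_size=256, tokens_per_second=1000.0):
    """Rough estimate only: assumes constant throughput (an assumption,
    not a measured property of DiffusionGemma)."""
    # Number of 256-token blocks needed to cover the requested output.
    blocks = math.ceil(num_tokens / block_size)
    # Wall-clock time at the reported throughput.
    seconds = num_tokens / tokens_per_second
    return blocks, seconds

blocks, seconds = estimate_generation(1000)
print(f"{blocks} blocks, about {seconds:.1f} s")
```

Under these assumptions, a 1,000-token edit spans four blocks and lands in about a second, which is the kind of latency that makes in-line editing feel interactive.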
Google introduced Gemini 3.5 Live Translate, a speech-to-speech model for near real-time voice translation in more than 70 languages. Gemini 3.5 Live Translate features a single continuously streaming audio model rather than a stitched pipeline of speech recognition, translation, and text-to-speech components. This model preserves tone, pacing, and expressiveness while translating with sub-second latency. This update is rolling out to Google Translate and Google Meet for live translation, with developer integrations available via the Gemini Live API.
Moonshot AI released Kimi K2.7-Code, an open-source update on the Kimi K2 1T parameter MoE architecture that claims a 30% reduction in thinking-token usage compared to its K2.6 predecessor. K2.7-Code is an open-weights model available on HuggingFace and via the Kimi Code platform. While Moonshot AI reports performance gains on internal benchmarks, independent evaluations on KernelBench-Hard showed regressions in specific GPU kernel optimization tasks. Elliot Arledge’s benchmarking assessment is that “K2.7 is more honest but not more capable” than its K2.6 predecessor on CUDA kernel coding.
Cohere released North Mini Code, an open-source 30B agentic coding model aimed at developers who want AI coding agents that can be run and improved outside closed proprietary systems. The model is the company’s first model for developers and is available on HuggingFace under the Apache 2.0 license.
Google introduced new Gemini features tailored for small businesses, including a direct Google Business Profile connection and proactive Business notebooks. The update allows Gemini to integrate with Google Business Profiles to access customer reviews, questions, and performance data. New Business notebooks provide a centralized space to organize workflows and generate content based on specific business context.
Cognition launched FrontierCode, a tougher coding benchmark designed to measure whether an AI-generated pull request is production-quality. Built from 150 original tasks, the benchmark emphasizes evaluation criteria such as scope control, regression safety, and test quality. Fable 5 posted the highest score (46.3%) versus Opus 4.8 (34.3%) and GPT-5.5 (25.5%).
Xiaomi’s MiMo AI team has open-sourced MiMo Code V0.1.0, a terminal-native AI coding harness that Xiaomi claims can outperform Claude Code on long tasks. The assistant utilizes a cross-session memory architecture with a dedicated checkpoint-writer subagent to maintain context during long-horizon, multi-step tasks.
Avataar AI from India launched a new video model called Varya that uses distillation from Alibaba’s Wan 2.2 and generates video ten times faster than the original at a cost of under a penny per second. Varya captures Indian cultural nuances because the model was tuned with curated data for the India market. It can be accessed at the Varya platform and will be released as an open-weight model on India’s AIKosh portal.
Deezer introduced a tool to identify AI-generated tracks in streaming playlists. The free online detector supports 27 languages and scans music from 20 platforms, including Spotify, Apple Music, and YouTube Music. Deezer reports that 44% of all new music uploaded to its platform is AI-generated.
Google recently published “Accelerating scientific discovery with Co-Scientist” in Nature, an account of how the Co-Scientist AI system is designed to help solve complex problems in the life sciences. The tool uses specialized agents to generate, debate, and refine new hypotheses through three distinct phases of idea generation, peer review, and refinement. One application of Co-Scientist was helping to identify new drug-repurposing candidates and synergistic combination therapies for acute myeloid leukemia.
UC Berkeley researchers launched the Agents’ Last Exam (ALE) Benchmark, which evaluates AI agents on long-horizon professional workflows. OpenAI’s GPT-5.5 leads the ALE Leaderboard with a score of 24.0%, beating Anthropic’s Claude Fable 5.
Artificial Analysis developed AA-AgentPerf, a new hardware performance benchmark that measures how many concurrent AI agents a system can sustain. Nvidia’s GB300 sets a new standard for agentic AI workload performance, over 20 times better than the H200.
SpaceX launched its IPO into the stock market stratosphere, rising on its debut to a valuation over $2 trillion by the market close. SpaceX is rising in part due to xAI, the Colossus AI data center it is renting to Anthropic, and its claims of pursuing a $4 trillion AI opportunity.
SpaceX is combining its AI and space opportunities with a proposed satellite designed to host AI supercomputers in orbit. The engineering specs describe about 150 kW peak power per satellite, and Musk claims the challenge is not harder than some other things they are doing. I doubt this idea will come to fruition soon; it seems to have been hyped up lately for IPO buzz.
OpenAI announced that it had confidentially submitted a draft S-1 to the SEC, giving the company the option to go public while emphasizing that timing has not been decided. The announcement said OpenAI expects the filing to leak and still sees tradeoffs between remaining private and preparing for a public offering.
OpenAI announced that it will acquire Ona, a company focused on secure cloud execution and orchestration. Ona’s technology will help Codex expand from a session-bound developer tool into a persistent agent environment that supports long-running software and knowledge-work tasks.
OpenAI and Oracle announced that OCI customers will be able to access OpenAI models and Codex using existing Oracle cloud commitments.
OpenAI announced support for the EU Code of Practice on Transparency of AI-generated content, tying the move to its provenance and content-authenticity work. The code guides AI providers on standards and tools to help users distinguish synthetic media from human-created content.
Apple said that Siri AI will be delayed on iOS 27 and iPadOS 27 in the European Union because of the Digital Markets Act. Apple said EU users will still be able to access Siri AI on macOS 27 and visionOS 27, but that iPhone, iPad, and watchOS access will not arrive on the same timeline because of unresolved regulatory concerns.
OpenAI published a threat report claiming PRC-linked influence operations are targeting AI debates in the United States, including around data center buildout. The report frames the incidents as China-based covert influence operations aimed at shaping political and public opinion. It relates this to a spike in anti-datacenter social media activity and says OpenAI identified accounts using ChatGPT as part of those campaigns.
Anthropic introduced Claude Corps, a national fellowship program for early-career people interested in using AI for public benefit and community impact.
Reuters reports on broad anxiety about AI among the US public, citing a Reuters/Ipsos poll showing high levels of concern about AI use and job displacement. With AI usage skyrocketing and OpenAI and Anthropic moving toward public listings, investor appetite for AI companies is colliding with greater public unease about AI’s economic effects.
If you have AI fears or AI FOMO, the best move is to learn AI. OpenAI introduced three new OpenAI Academy courses: AI Foundations, Applied AI Foundations, and Agents and Workflows. The courses are available to ChatGPT users and will help individuals and organizations move toward repeatable AI workflows and agent-assisted work.
