Language model builds on diffusion tech to boost output performance by up to 4x, claims Chocolate Factory
The boffins on Google’s DeepMind team unveiled an experimental new language model this week that uses techniques originally developed for AI image generators to boost text output performance by as much as 4x when running on resource-constrained consumer hardware. It’s free to download and you can run it with just 18 GB of DRAM or VRAM.
The model, codenamed DiffusionGemma, is the latest addition to Google’s open-weights model family. But unlike Gemma 4, which launched this spring, the 26 billion-parameter mixture-of-experts (MoE) model isn’t a large language model in the conventional sense.
Instead, it’s actually closer to image models like Stable Diffusion or Flux. Rather than generating tokens one after another in an autoregressive fashion, DiffusionGemma generates entire paragraphs’ worth of tokens at the same time.
The process looks a lot like how a diffusion model turns what’s essentially static into an image through a series of denoising steps.
As Google explains it, DiffusionGemma works by laying out a canvas of random tokens, and then refining them until the final output is reached.
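To make the canvas idea concrete, here’s a toy sketch in Python. It is not DiffusionGemma’s actual algorithm. `oracle_denoiser` is a hypothetical stand-in for the network that already knows the answer, and committing the most confident positions first is an assumed schedule, similar in spirit to those used in published diffusion text models. What it does illustrate is the shape of the loop: each step is one forward pass that proposes tokens for every position in parallel, so the number of passes is set by the step count rather than the length of the output.

```python
import random

# Toy vocabulary for the random starting canvas (illustrative only).
VOCAB = ["the", "cat", "sat", "on", "a", "mat", "dog", "ran"]


def oracle_denoiser(canvas, target, rng):
    """Hypothetical stand-in for the network.

    One call represents one forward pass: it proposes a token and a
    confidence score for every position of the canvas at once. A real
    model would predict tokens from context; this oracle simply knows
    the target and assigns random confidences.
    """
    return [(target[i], rng.random()) for i in range(len(canvas))]


def diffusion_generate(target, steps, seed=0):
    """Refine a canvas of random tokens into `target` in `steps` passes."""
    rng = random.Random(seed)
    n = len(target)
    # Start from random tokens, as in the article's description.
    canvas = [rng.choice(VOCAB) for _ in range(n)]
    committed = [False] * n
    forward_passes = 0

    for step in range(steps):
        proposals = oracle_denoiser(canvas, target, rng)
        forward_passes += 1

        # Positions still holding random "noise" tokens.
        pending = [i for i in range(n) if not committed[i]]
        # Ceiling division spreads the remaining work over the remaining
        # steps, so every position is committed by the final step.
        k = -(-len(pending) // (steps - step))
        # Commit the positions the model is most confident about first.
        pending.sort(key=lambda i: proposals[i][1], reverse=True)
        for i in pending[:k]:
            canvas[i] = proposals[i][0]
            committed[i] = True

    return canvas, forward_passes


if __name__ == "__main__":
    words = "the cat sat on the mat".split()
    out, passes = diffusion_generate(words, steps=3)
    print(" ".join(out), "| forward passes:", passes)
```

For these six words, three passes suffice where an autoregressive decoder would need six. In a real model, fewer steps also means less refinement per token, which is where quality can suffer.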
Conventional LLMs are memory-bandwidth bound and require a lot of VRAM, whereas diffusion models are predominantly compute bound, which is why the Chocolate Factory is positioning these models for local deployment.
LLMs are autoregressive. During token generation, the model’s active parameters need to be streamed from memory for every token generated, making memory bandwidth a major bottleneck.
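A back-of-the-envelope way to see the bottleneck: if every generated token requires streaming the active weights from memory once, then memory bandwidth divided by the bytes of active weights caps single-stream decode speed. The sketch below ignores KV-cache traffic, activations, and compute time, and the example figures (4 billion active parameters at 8 bits, 1 TB/s of bandwidth) are hypothetical round numbers, not measurements of any Gemma model.

```python
def max_decode_tokens_per_sec(active_params, bytes_per_param, bandwidth_bytes_per_sec):
    """Rough ceiling on single-stream autoregressive decode speed.

    Assumes each token requires reading every active weight from memory
    exactly once, and ignores KV-cache reads, activations and compute time.
    """
    bytes_per_token = active_params * bytes_per_param
    return bandwidth_bytes_per_sec / bytes_per_token


# Hypothetical example: 4e9 active parameters stored at 1 byte each (8-bit),
# on hardware with 1 TB/s (1e12 bytes/s) of memory bandwidth.
print(max_decode_tokens_per_sec(4e9, 1, 1e12))  # 250.0
```

Doubling the bandwidth doubles the ceiling, while adding compute does nothing, which is what being memory-bandwidth bound means in practice.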
In the cloud, inference providers balance compute and memory bandwidth by processing hundreds or thousands of requests in parallel. As you might have guessed, this isn’t something the average user running a local model on their notebook can do.
However, many consumer products, like high-end graphics cards, have plenty of spare compute, which DiffusionGemma can put to work by refining many tokens in each forward pass, boosting output performance.
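The same arithmetic explains both the cloud trick and DiffusionGemma’s angle. Each byte of weights read from memory gets used once for every token processed in that forward pass, so arithmetic intensity (FLOPs per byte) grows with the number of tokens handled together, whether they come from a batch of users or from a block of tokens being denoised in parallel. Once intensity passes the hardware’s ridge point (peak FLOP/s divided by bandwidth), the work becomes compute bound. This is a simplified roofline model: it assumes roughly 2 FLOPs per parameter per token, ignores attention, and, for an MoE model, assumes the tokens share the same active experts, which real routing doesn’t guarantee. The GPU figures are hypothetical.

```python
def arithmetic_intensity(tokens_per_pass, bytes_per_param):
    """FLOPs performed per byte of weights read, for one forward pass.

    Simplified: ~2 FLOPs (multiply + add) per parameter per token, and the
    weights are read once per pass regardless of how many tokens share it.
    Attention and KV-cache traffic are ignored.
    """
    return 2 * tokens_per_pass / bytes_per_param


def ridge_point(peak_flops, bandwidth_bytes_per_sec):
    """Intensity (FLOPs/byte) above which the hardware is compute bound."""
    return peak_flops / bandwidth_bytes_per_sec


def is_compute_bound(tokens_per_pass, bytes_per_param, peak_flops, bandwidth):
    """True if a pass over this many tokens saturates compute before bandwidth."""
    return arithmetic_intensity(tokens_per_pass, bytes_per_param) >= ridge_point(
        peak_flops, bandwidth
    )


# Hypothetical consumer GPU: 100 TFLOP/s of compute, 1 TB/s of bandwidth,
# weights stored at 1 byte per parameter.
PEAK, BW, BPP = 100e12, 1e12, 1

print(is_compute_bound(1, BPP, PEAK, BW))    # False: one token per pass (single-user autoregressive)
print(is_compute_bound(256, BPP, PEAK, BW))  # True: 256 tokens per pass (big batch or denoised block)
```

With one token per pass, this hypothetical card sits idle waiting on memory; with a few hundred tokens sharing each weight read, the compute finally becomes the limit.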
Google isn’t the first to explore this tech, and diffusion language models aren’t perfect. Earlier models, such as DREAM and Mercury 2, demonstrated major speedups over conventional LLMs but generally trailed similarly sized LLMs in benchmarks.
DiffusionGemma doesn’t appear to be any different. According to Google, the 26 billion-parameter model falls just behind Gemma 4 12B on the GPQA-Diamond benchmark. Its main advantage is output speed, and even that isn’t as impressive as Google has made it out to be.
Google’s chart shows a roughly 2.25x speedup for DiffusionGemma over the 12B-parameter LLM with speculative decoding enabled. Against Gemma 4 26B-A4B, the speedup is nearly 4x on a single Nvidia H100.
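Why isn’t the speedup closer to the raw reduction in forward passes? A simple model: an autoregressive decoder needs one pass per token, while a diffusion decoder needs one pass per denoising step, but each of those steps processes the whole block and so takes longer. The function below captures that trade-off. The token count, step count, and per-step cost ratio in the example are hypothetical and are not meant to reproduce Google’s chart.

```python
def diffusion_speedup(num_tokens, denoise_steps, step_cost_ratio):
    """Estimated wall-clock speedup of diffusion over autoregressive decoding.

    num_tokens:      tokens to generate (= autoregressive passes needed)
    denoise_steps:   forward passes the diffusion model uses for the block
    step_cost_ratio: time of one diffusion step / time of one autoregressive pass
    """
    autoregressive_time = num_tokens * 1.0
    diffusion_time = denoise_steps * step_cost_ratio
    return autoregressive_time / diffusion_time


# Hypothetical: 240 tokens in 40 denoising steps, each step twice as slow
# as a single autoregressive pass. Passes drop 6x, but wall-clock only 3x.
print(diffusion_speedup(240, 40, 2.0))  # 3.0
```

The estimate also ignores anything that speeds up the baseline, such as the speculative decoding enabled for the Gemma 4 12B comparison, which narrows the gap further.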
DiffusionGemma is being released as an experimental model rather than an enterprise-focused one like Gemma 4.
The model is available for download from model repos such as Hugging Face under the highly permissive Apache 2.0 license. Support has already been merged into popular inference engines including vLLM, MLX, and Hugging Face Transformers, with llama.cpp support coming soon.
While local inference has largely been the domain of AI enthusiasts, companies like Google are increasingly leaning on the tech to cut cloud costs associated with their AI services. As you may recall, back in May, Google quietly began shipping a small LLM with its Chrome web browser. ®
Copyright. All rights reserved © 1998-2026.