NVIDIA said in a June 23 technical blog post that DFlash, an open-source lightweight block diffusion model used for speculative decoding, can boost LLM inference throughput by up to 15x on Blackwell GPUs while preserving user interactivity.