Google DeepMind's open-weight model Gemma-4-26B-A4B ran as 16 parallel instances on a single NVIDIA DGX Spark, reaching about 18 tok/s per instance and roughly 300 tok/s in aggregate (16 × 18 ≈ 288 tok/s). In a demo video released by the official Gemma team on June 23, 2026, the model was shown to be capable of scaling up to 32 parallel instances on the same hardware, underscoring the inference efficiency of its architecture.