ainewsblitz.com

Breaking

Gemma 4 26B-A4B Runs 16 Instances on a Single DGX Spark at 300 tokens/s

  • Foundation Models
  • Infra & Chips
  • Open Source

Google DeepMind's open-weight model "Gemma-4-26B-A4B" ran as 16 parallel instances on a single NVIDIA DGX Spark, reaching about 18 tok/s per instance, or roughly 300 tok/s in aggregate (16 × 18 ≈ 288). A demo video released by the official Gemma team on June 23, 2026 also showed the model scaling up to 32 parallel instances on the same hardware, underscoring the inference efficiency of its architecture.
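The headline figure is simple arithmetic on the two numbers quoted above. The sketch below is illustrative only: the function name `aggregate_throughput` is hypothetical, and it assumes every instance sustains the same per-instance rate, which the demo may not guarantee.

```python
def aggregate_throughput(instances: int, tokens_per_s_each: float) -> float:
    """Total tokens/s, assuming each instance sustains the same decode rate."""
    if instances < 0 or tokens_per_s_each < 0:
        raise ValueError("instances and rate must be non-negative")
    return instances * tokens_per_s_each


# Figures quoted in the article: 16 instances at about 18 tok/s each.
reported = aggregate_throughput(16, 18.0)
print(f"{reported:.0f} tok/s")  # 288 tok/s, which the article rounds to roughly 300
```

The article does not give a per-instance rate for the 32-instance case, so the same calculation cannot be applied there without further data.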
