Intel Gaudi 2 Crushes Nvidia GPUs in Stable Diffusion: Up to 3x Faster!

Intel’s Gaudi 2 accelerator has emerged as a serious challenger in the AI hardware landscape, posting notable performance gains over NVIDIA’s H100 and A100 GPUs in benchmarks run by Stability AI.


Competition between Intel and NVIDIA in artificial intelligence (AI) hardware has intensified with the release of Stability AI’s benchmarks comparing Intel’s Gaudi 2 accelerator with NVIDIA’s H100 and A100 GPUs.

The results of these benchmarks suggest that Intel’s high-performance AI solutions not only rival NVIDIA’s offerings but can also offer superior value.

Stability AI, known for its open models designed to efficiently handle diverse tasks, ran the benchmarks using two of its models: Stable Diffusion 3 and Stable Beluga 2.5 70B. The tests compared the performance of Intel’s Gaudi 2 AI accelerator against NVIDIA’s H100 and A100 GPUs across several configurations.


Stable Diffusion 3 Benchmark Results: Intel vs Nvidia

Stable Diffusion is a cutting-edge artificial intelligence model that excels at generating realistic images from text descriptions. Released in 2022, it utilizes a technique called diffusion to achieve this remarkable feat.

Why it Matters: Stable Diffusion holds immense potential for various applications:

  • Creative Industries: It empowers artists, designers, and content creators to generate unique concepts and explore ideas visually.
  • Product Design: It can aid in prototyping and visualizing product ideas before physical creation.
  • Education and Research: It can help visualize complex scientific concepts or historical events.
  • Entertainment: It can be used for creating illustrations, concept art for games and movies, or even personalized avatars.

In the training benchmark for Stable Diffusion 3, an acclaimed text-to-image model, Intel’s Gaudi 2 exhibited exceptional performance. The model comes in sizes ranging from 800M to 8B parameters, and the results reported here are for a 2B-parameter version. Each system under test used two nodes with a total of 16 accelerators (Gaudi 2, H100, or A100), and the runs varied the batch size.


At a batch size of 16, Intel’s Gaudi 2 delivered a 56% speedup over NVIDIA’s H100 80GB GPU and a 2.43x speedup over the A100 80GB GPU. Gaudi 2’s 96 GB of HBM also made room for a larger batch size, which raised throughput by a further 35% over the batch-size-16 Gaudi 2 run, for a speedup of 2.10x over the H100 80GB and 3.26x over the A100 80GB.
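The relationship between these figures can be checked with a little arithmetic. The sketch below is illustrative only: it assumes the quoted speedups are ratios of training throughput and that the larger-batch gain multiplies the batch-size-16 ratios. The function and variable names are made up for this example and are not part of Stability AI’s benchmark code.

```python
def percent_to_multiplier(percent: float) -> float:
    """Convert 'N% faster' into a throughput multiplier (56% -> 1.56)."""
    return 1.0 + percent / 100.0


def compose(*multipliers: float) -> float:
    """Chain speedups that share an intermediate baseline by multiplying them."""
    result = 1.0
    for m in multipliers:
        result *= m
    return result


# Figures quoted in the article for the batch-size-16 Gaudi 2 run.
gaudi2_b16_vs_h100 = percent_to_multiplier(56)  # "56% speedup" over H100 80GB
gaudi2_b16_vs_a100 = 2.43                       # "2.43x" over A100 80GB

# Extra gain quoted for the larger Gaudi 2 batch size.
gaudi2_large_vs_b16 = percent_to_multiplier(35)

# Derived speedups of the larger-batch Gaudi 2 run over each NVIDIA GPU.
derived_vs_h100 = compose(gaudi2_b16_vs_h100, gaudi2_large_vs_b16)
derived_vs_a100 = compose(gaudi2_b16_vs_a100, gaudi2_large_vs_b16)

print(f"larger batch vs H100: {derived_vs_h100:.2f}x (quoted: 2.10x)")
print(f"larger batch vs A100: {derived_vs_a100:.2f}x (quoted: 3.26x)")
```

The derived values land within a few hundredths of the quoted ones. Gaps of that size are what you would expect if the published percentages were rounded from raw throughput numbers.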

Scaling up to 32 nodes (256 accelerators) for both Gaudi 2 and A100 80GB, Intel’s solution delivered about 3.17x the per-device throughput: 49.4 images/second/device compared with 15.6 on the A100 solution.
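For readers who want to reproduce the ratio, here is a minimal sketch using only the per-device figures quoted above. The eight-accelerators-per-node value is an assumption inferred from "32 nodes (256 accelerators)", and the cluster-wide totals are derived here rather than reported by the benchmark.

```python
NODES = 32
ACCELERATORS_PER_NODE = 8  # assumption: inferred from 256 accelerators / 32 nodes

# Per-device training throughput quoted in the article (images/second/device).
gaudi2_per_device = 49.4
a100_per_device = 15.6

devices = NODES * ACCELERATORS_PER_NODE

# Ratio of per-device throughput; the same ratio holds for the whole cluster
# because both systems use the same number of devices.
speedup = gaudi2_per_device / a100_per_device

# Derived cluster-wide throughput (not reported directly in the article).
gaudi2_total = gaudi2_per_device * devices
a100_total = a100_per_device * devices

print(f"devices per system: {devices}")
print(f"Gaudi 2 vs A100:    {speedup:.2f}x")
print(f"Gaudi 2 total:      {gaudi2_total:.1f} images/second")
print(f"A100 total:         {a100_total:.1f} images/second")
```

Because both clusters have the same device count, the per-device ratio and the aggregate ratio are identical, so the comparison does not depend on the assumed node layout.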

While Intel Gaudi 2 excelled in training performance, NVIDIA retained the lead in inference, thanks to its TensorRT optimizations. Even so, Intel’s Gaudi 2 chips showed inference speeds comparable to NVIDIA’s A100 chips when both ran base PyTorch. With further optimization, Gaudi 2 is expected to outperform A100s on specific models, reflecting Intel’s ongoing work on its AI acceleration capabilities.


Stable Beluga 2.5 70B Benchmark Results: Intel vs Nvidia

In the benchmarking of Stable Beluga 2.5 70B, a fine-tuned version of LLaMA 2 70B, Intel Gaudi 2 demonstrated impressive performance even without additional optimizations. Running under PyTorch, the 256 Gaudi 2 AI accelerators achieved an average throughput of 116,777 tokens/second, surpassing the A100 80GB solution by approximately 28%.
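To put the aggregate figure in per-device terms, a rough sketch follows. It assumes the 116,777 tokens/second is the total across all 256 accelerators and that the 28% advantage is a ratio of total throughput on equally sized systems. Neither the per-accelerator average nor the implied A100 figure is reported in the article; both are derived here for illustration.

```python
# Figures quoted in the article for Stable Beluga 2.5 70B on Gaudi 2.
total_tokens_per_second = 116_777
accelerators = 256
quoted_advantage_percent = 28  # "approximately 28%" over the A100 80GB solution

# Average throughput per Gaudi 2 accelerator (derived, not reported).
per_accelerator = total_tokens_per_second / accelerators

# A100 total implied by the quoted advantage, assuming "28% faster" means
# gaudi2_total = 1.28 * a100_total on the same number of devices.
implied_a100_total = total_tokens_per_second / (1 + quoted_advantage_percent / 100)

print(f"Gaudi 2 per accelerator: {per_accelerator:.1f} tokens/second")
print(f"Implied A100 total:      {implied_a100_total:,.0f} tokens/second")
```

Since the article describes the 28% as approximate, the implied A100 number should be read as a ballpark estimate rather than a measured result.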


Implications and Future Prospects

These benchmark results underscore the intensifying competition within the AI hardware market. While NVIDIA has long dominated with its CUDA software ecosystem, Tensor Core hardware, and robust software optimizations, Intel’s Gaudi 2 emerges as a formidable contender, offering not only competitive performance but also compelling value.

Moreover, Intel’s roadmap includes the upcoming release of the Gaudi 3 AI accelerator, promising further advancements in deep learning and large-scale generative AI models. With continued innovation and optimization efforts, Intel aims to solidify its position as a leading provider of AI solutions, offering customers a diverse array of options beyond NVIDIA’s offerings.



In conclusion, the AI landscape is shifting toward hardware whose strengths are complemented by software optimizations tailored to specific accelerators. The benchmark comparison between Intel’s Gaudi 2 and NVIDIA’s H100 and A100 GPUs represents a significant milestone in this journey, signaling a future where customers can choose from a variety of robust AI solutions, ultimately driving innovation and competition in the industry.

Editorial Team