Introduction
NVIDIA GPU architectures have witnessed a remarkable evolution, shaping the landscape of parallel processing and accelerating advances across computing domains. From the foundational Tesla architecture in 2008, which introduced CUDA cores, to the groundbreaking innovations of Volta, Turing, Ampere, and Hopper, each iteration has pushed the boundaries of performance and functionality.
In this blog we reflect on the journey that has transformed GPUs from graphics processors to powerful accelerators driving breakthroughs in artificial intelligence, high-performance computing, and more.
Why did NVIDIA venture into datacentre GPUs?
Before venturing into datacentre GPUs, NVIDIA was primarily focused on rendering ultra-realistic video game graphics.
Their seminal step towards AI and deep learning technology began in 2006 with the release of the Compute Unified Device Architecture (CUDA) platform.
This major innovation in GPU-based parallel computing technology set the stage for NVIDIA’s foray into manufacturing datacentre GPUs.
GPUs can handle lots of data very quickly, making them an integral part of datacentres.
NVIDIA GPUs have become the gold standard for accelerating workloads like analytics, artificial intelligence, and scientific computing.
Now, let’s delve into the chronological journey of NVIDIA’s datacentre GPUs.
Tesla (2008)
The Tesla architecture laid the foundation for subsequent NVIDIA GPU architectures. Unveiled in 2008, it marked a significant shift towards parallel processing on GPUs.
This architecture emphasized the use of parallelism to accelerate diverse computational workloads, ranging from scientific simulations to data processing.
Fermi (2010)
NVIDIA’s Fermi architecture, introduced in 2010, featured a redesigned CUDA core and an enhanced memory subsystem, and introduced ECC support.
ECC, or Error-Correcting Code, is a technology designed to detect and correct errors in computer memory. It ensures the accuracy of data stored in memory by adding extra bits to each piece of information.
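The idea behind those extra bits can be sketched with a classic Hamming(7,4) code. This is an illustrative example of the principle, not the specific scheme NVIDIA's ECC memory uses: four data bits are stored with three parity bits, enough to detect and correct any single flipped bit.

```python
def hamming_encode(d):
    """Encode 4 data bits [d1, d2, d3, d4] into a 7-bit codeword."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4            # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4            # parity over positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4            # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming_correct(c):
    """Locate and fix a single flipped bit, then return the data bits."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3   # 1-based position of the bad bit
    if syndrome:
        c[syndrome - 1] ^= 1          # correct the flipped bit
    return [c[2], c[4], c[5], c[6]]

data = [1, 0, 1, 1]
codeword = hamming_encode(data)
codeword[4] ^= 1                      # simulate a memory bit flip
assert hamming_correct(codeword) == data
```

Real ECC memory applies the same principle at word granularity in hardware, transparently to software.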
Fermi also introduced advanced Double Precision (FP64) performance which means the feature utilizes 64 bits to represent numbers for increased precision. This is essential for complex simulations and scientific calculations.
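The precision gap is easy to see in plain Python, whose floats are already 64-bit: FP32 keeps roughly 7 significant decimal digits, FP64 roughly 15–16. Rounding a double-precision value through a 32-bit representation shows what is lost:

```python
import struct

def round_to_fp32(x):
    """Round a Python float (FP64) to the nearest FP32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]

pi64 = 3.141592653589793          # pi in double precision
pi32 = round_to_fp32(pi64)        # the same value squeezed into 32 bits

print(f"FP64: {pi64:.16f}")
print(f"FP32: {pi32:.16f}")
print(f"error introduced by FP32: {abs(pi64 - pi32):.2e}")
```

In long iterative simulations these tiny per-operation errors accumulate, which is why scientific workloads demand fast FP64 hardware.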
Kepler (2012)
On March 21, 2012, NVIDIA launched Kepler, its first GPU architecture built on a 28-nanometre (nm) process, succeeding the 40 nm Fermi architecture. Notable GPUs in this series included the Tesla K10 and K20.
The later Tesla K80 is really two GPUs in one: each of its two GK210 GPUs has its own dedicated memory, and the pair communicates with the PCIe bus at x16 speeds through an onboard PCIe switch.
Key Specifications: Enhanced CUDA cores, improved energy efficiency.
Maxwell (2014)
NVIDIA introduced Maxwell in February 2014 as the successor to Kepler. Maxwell, built on an array of new technologies, underwent manufacturing using TSMC’s 28 nm process. It introduced an improved Streaming Multiprocessor (SM) design that increased power efficiency.
The Streaming Multiprocessor (SM) is a key component in NVIDIA’s CUDA. Each GPU consists of several SMs, which are general-purpose processors that execute several thread blocks in parallel. These SMs are responsible for the execution of instructions, processing data.
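The way thread blocks map onto work can be sketched in plain Python. This hypothetical emulation unrolls into loops what the SMs execute in parallel: each thread derives a global index from its block ID, the block size, and its thread ID within the block.

```python
def vector_add(a, b, block_dim):
    """Emulate a 1-D CUDA-style grid computing out[i] = a[i] + b[i]."""
    n = len(a)
    grid_dim = (n + block_dim - 1) // block_dim   # number of thread blocks
    out = [0.0] * n
    for block_id in range(grid_dim):              # blocks are scheduled onto SMs
        for thread_id in range(block_dim):        # threads run within a block
            i = block_id * block_dim + thread_id  # each thread's global index
            if i < n:                             # guard the partial last block
                out[i] = a[i] + b[i]
    return out

print(vector_add([1, 2, 3, 4, 5], [10, 20, 30, 40, 50], block_dim=2))
# [11, 22, 33, 44, 55]
```

On a real GPU the two loops disappear: every (block, thread) pair runs concurrently, which is where the speedup comes from.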
Pascal (2016)
In April 2016, NVIDIA introduced Pascal, named after the French mathematician and physicist Blaise Pascal. The architecture found primary application in the GeForce 10 series, with manufacturing initially utilizing TSMC’s 16 nm FinFET process and later transitioning to Samsung’s 14 nm FinFET process.
Key Specifications: HBM2, increased CUDA cores, improved NVLink connectivity.
Volta (2017)
NVIDIA introduced Tensor Cores with the Volta architecture. Tensor Cores are specialized execution units that handle the tensor/matrix operations at the heart of deep learning, substantially improving the performance of systems running deep neural network workloads.
The Tesla V100 GPU, based on the Volta architecture, was one of the first GPUs to incorporate Tensor Cores.
NVIDIA announced Volta on May 14, 2017, positioning it as the successor to both the Pascal and Maxwell architectures. The design aimed to democratize AI across industries, propelling breakthroughs in diverse fields.
Turing (2018)
Turing, announced at SIGGRAPH (an annual computer-graphics conference) in August 2018, was the first GPU architecture capable of real-time ray tracing. This allows the GPU to render visually realistic 3D games and complex professional models with physically accurate shadows, reflections, and refractions.
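At the heart of ray tracing is a simple geometric test: does a ray, fired from a point in some direction, hit an object? A toy Python sketch of the ray–sphere case shows the math; Turing's RT cores run billions of such tests per second against much more complex geometry.

```python
import math

def ray_hits_sphere(origin, direction, center, radius):
    """Solve |origin + t*direction - center|^2 = radius^2 for t >= 0."""
    ox, oy, oz = (origin[i] - center[i] for i in range(3))
    dx, dy, dz = direction
    a = dx * dx + dy * dy + dz * dz
    b = 2 * (ox * dx + oy * dy + oz * dz)
    c = ox * ox + oy * oy + oz * oz - radius * radius
    disc = b * b - 4 * a * c          # discriminant of the quadratic in t
    if disc < 0:
        return False                  # the ray misses the sphere entirely
    t = (-b - math.sqrt(disc)) / (2 * a)
    return t >= 0                     # a hit only counts if it is in front of us

# A ray shot down the z-axis hits a unit sphere centred at z = 5:
print(ray_hits_sphere((0, 0, 0), (0, 0, 1), (0, 0, 5), 1))   # True
```

A renderer repeats this per pixel, then recursively for shadow and reflection rays, which is why dedicated hardware was needed to make it real-time.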
Turing GPUs utilize GDDR6 memory, which offers higher bandwidth and improved power efficiency compared to previous GDDR5X memory used in Pascal GPUs. The memory interface circuits in Turing GPUs were redesigned to achieve data rates of up to 14 Gbps. Apart from this, the individual SMs of Turing offered a 50% performance improvement over the Pascal SM design.
Ampere (2020)
Since its release, Ampere has been at the core of AI and HPC in the modern datacentre.
Ampere introduced six key innovations, including third-generation Tensor Cores and Multi-Instance GPU (MIG).
A Multi-Instance GPU (MIG) is a GPU partitioned into multiple smaller GPUs. Each partition, or instance, operates independently with its own memory, cache, and compute cores.
Ampere’s market penetration has been substantial, contributing to NVIDIA’s record revenue of $8.29 billion in the first quarter of fiscal 2023. This represents a 46% increase over the previous year, highlighting Ampere’s significant contribution to NVIDIA’s profits.
Hopper (2022)
Hopper, named after computer scientist Grace Hopper, was officially revealed in March 2022. Built with over 80 billion transistors using a cutting-edge TSMC 4N process, Hopper introduced five groundbreaking innovations, including the Transformer Engine designed to accelerate AI model training. It also tripled the floating-point operations per second (FLOPS) for various precisions over the prior generation.
A Transformer model is a type of artificial intelligence model that helps computers understand and generate human-like text. The Transformer Engine in NVIDIA’s Hopper architecture is dedicated hardware that accelerates these models, helping the GPU perform generative AI tasks more quickly and efficiently.
Blackwell (2024)
Blackwell, the rumoured successor to Hopper, is expected to launch in late 2024. Reports suggest that Blackwell might utilize a 3 nm process, allowing higher transistor density and boosting overall performance.
Conclusion
NVIDIA’s GPU architectures have evolved significantly, from the groundbreaking Tesla in 2008 to the upcoming Blackwell. Some iterations introduced groundbreaking innovations, while others refined previous generations.
To sum up, Tesla (which introduced CUDA), Volta (Tensor Cores), Turing (ray tracing), and Ampere (Multi-Instance GPU) were the key releases in the evolution of NVIDIA’s GPU line-up. Looking ahead to 2024, we expect Blackwell to bring a multi-chiplet design alongside the expected scaling-up of past features.