The Center for Data Innovation spoke with Steve Oberlin, chief technology officer of accelerated computing at Nvidia. Oberlin discussed the impact of supercomputing on astronomy and the role supercomputers can play in national competitiveness.
This interview has been edited.
Joshua New: Why have GPUs, rather than CPUs, seemingly become the standard hardware for AI applications?
Steve Oberlin: It comes down to fundamental architectural efficiency on parallel work. AI applications lean really heavily on performing lots and lots of floating point operations on matrices of data. There’s an incredible amount of parallelism in the work that needs to be done, and similar operations are performed on enormous training datasets many times over. Training a neural network takes many iterations of going through really large datasets with a big computational load before it is able to do things like recognize objects in images, translate speech, or drive a car.
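To make the point about floating point operations on matrices concrete, here is a minimal Python sketch, our own illustration rather than anything from Nvidia, of one fully connected neural-network layer written as a matrix multiply. The function names `dense_layer` and `matmul_flops` and the 1024 x 4096 x 4096 layer size are hypothetical, and the 2·m·k·n figure is the usual convention of counting one multiply and one add per inner-product term.

```python
# Minimal sketch: one fully connected neural-network layer as a matrix multiply.
# Each output element is an independent dot product, so all of them could be
# computed at the same time on parallel hardware.
from typing import List

Matrix = List[List[float]]


def dense_layer(inputs: Matrix, weights: Matrix) -> Matrix:
    """Return inputs (m x k) times weights (k x n), using plain loops."""
    m, k = len(inputs), len(inputs[0])
    if len(weights) != k:
        raise ValueError("inner dimensions must match")
    n = len(weights[0])
    out = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            # out[i][j] depends only on row i of inputs and column j of
            # weights, so no (i, j) pair has to wait for any other.
            acc = 0.0
            for p in range(k):
                acc += inputs[i][p] * weights[p][j]
            out[i][j] = acc
    return out


def matmul_flops(m: int, k: int, n: int) -> int:
    """Conventional FLOP count for (m x k) @ (k x n): one multiply and one
    add per inner-product term, for each of the m * n outputs."""
    return 2 * m * k * n


print(dense_layer([[1.0, 2.0], [3.0, 4.0]], [[0.5, -1.0], [0.25, 2.0]]))
# [[1.0, 3.0], [2.5, 5.0]]
print(matmul_flops(1024, 4096, 4096))  # 34359738368, about 3.4e10 for one layer
```

A training run repeats products like this for every layer, every batch, and every pass over the dataset, which is where the enormous operation counts come from.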
Conventional CPUs come from sequential processing roots. Early CPUs literally did just one operation at a time, and over time we’ve made a lot of optimizations to speed up these operations, such as improving clock speed and adding data caches. But you can only speed up sequential processing so much before you run into fundamental physical limits, and after that, you have to go parallel if you want to make things go faster. To really exploit parallelism you need a different architecture that’s optimized for parallel processing. GPUs have very different roots from CPUs: the “G” stands for graphics, and their job has traditionally been very parallel. Computer games or 3D design software might render millions of polygons every frame, performing thousands of operations for each polygon to determine what its color should be, whether it casts a shadow, whether it is reflective, and so on. So compared to CPUs, GPUs have always been designed to be massively parallel.
For training enormous artificial neural networks, which might have thousands of layers and millions of parameters and need massive training datasets, GPUs turned out to be dozens and even hundreds of times more efficient than CPUs. This is the difference between training an AI in a day rather than in a month, or in an hour instead of a day, which makes a huge difference for researchers.
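The month-to-day and day-to-hour comparison is simple division. A minimal sketch, assuming a 30-day month and a speedup that applies uniformly to the whole run (real jobs also carry fixed costs such as data loading); the helper `training_hours` is hypothetical:

```python
# Back-of-the-envelope sketch: how a throughput speedup changes wall-clock
# training time, assuming the whole run speeds up uniformly.

HOURS_PER_DAY = 24
DAYS_PER_MONTH = 30  # assumption: a "month" of training is 30 days


def training_hours(baseline_hours: float, speedup: float) -> float:
    """Wall-clock hours after a uniform speedup (ideal, no fixed overheads)."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return baseline_hours / speedup


month = DAYS_PER_MONTH * HOURS_PER_DAY    # 720 hours
print(training_hours(month, 30))          # 24.0 hours: a month becomes a day
print(training_hours(HOURS_PER_DAY, 24))  # 1.0 hour: a day becomes an hour
print(training_hours(month, 100))         # 7.2 hours at a 100x speedup
```

Speedups in the "dozens" range (30x, 24x) are already enough to produce the two jumps described.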
New: Nvidia is one of the leading companies involved in high-performance computing, also known as supercomputing, and is working to help develop exascale computing. Can you describe what this is, and why this benchmark is so important?
Oberlin: Many scientific applications have the same kind of workload characteristics as the ones I described for AI, which is why GPU-accelerated computing is now so important. Exascale is shorthand for a supercomputer that can perform an exaflop, which is 10¹⁸ double precision floating point operations per second. People are just not evolved to be able to comprehend a number that large. For decades, and I’m not really sure why, scientists have used three orders of magnitude as major milestones in supercomputing performance. In the 1980s, the challenge was to perform a gigaflop (10⁹), then in the 90s it was a teraflop (10¹²), and in the 2000s it was a petaflop (10¹⁵). I have a picture that I use to bring this into perspective, which compares crawling to jogging, to driving on the freeway, to flying in a jet at 600 miles per hour. Each of those steps is one order of magnitude, so the difference between crawling and flying in a jet is three orders of magnitude, or the difference between a gigaflop and a teraflop. The fastest computer in the world is the Summit supercomputer at Oak Ridge National Laboratory, which uses Nvidia GPUs. It’s the first supercomputer to break 100 petaflops, and for AI work it’s actually capable of over 3 exaflops.
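One way to feel the size of each three-orders-of-magnitude step is to ask how long a fixed amount of work, here `10**18` operations, takes at each milestone rate. A minimal Python sketch of that arithmetic; the `human` formatting helper is hypothetical and assumes 365-day years:

```python
# Worked arithmetic: time to perform 10**18 operations (one exaflop's worth of
# work) at each milestone rate named in the interview.

MILESTONES = {
    "gigaflop (1980s)": 10**9,
    "teraflop (1990s)": 10**12,
    "petaflop (2000s)": 10**15,
    "exaflop": 10**18,
}


def human(seconds: float) -> str:
    """Render a duration in the largest convenient unit (365-day years)."""
    for unit, size in (("years", 365 * 86_400), ("days", 86_400), ("minutes", 60)):
        if seconds >= size:
            return f"{seconds / size:.1f} {unit}"
    return f"{seconds:.1f} seconds"


WORK = 10**18  # total floating point operations
for name, rate in MILESTONES.items():
    print(f"{name:>16}: {human(WORK / rate)}")
# gigaflop (1980s): 31.7 years
# teraflop (1990s): 11.6 days
# petaflop (2000s): 16.7 minutes
#          exaflop: 1.0 seconds

# Consecutive milestones differ by three orders of magnitude.
rates = list(MILESTONES.values())
print([b // a for a, b in zip(rates, rates[1:])])  # [1000, 1000, 1000]
```

The same one-second exascale job would have occupied a gigaflop-class machine for about three decades.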
So why do scientists need so much horsepower? It comes down to understanding how dependent human progress is on technology and science. We’re facing enormous challenges that so many scientific fields are trying to solve. Science used to have three different pursuits: theory, experimentation, and discovery. Supercomputers give us a fourth: simulation. This lets you study things that simply can’t be studied in the real world and helps you discover new drugs or study the universe. Building better supercomputers is like building a bigger telescope. A lot of how we will navigate the future depends on the continued advancement of high-performance computing and the science that relies on it.
New: How is it that the peak performance of the world’s leading supercomputers can increase so substantially after just several years? Is this just Moore’s law at work, or are there other factors contributing to these advancements?
Oberlin: Moore’s law has played a major part for the past few decades, but unfortunately it has been slowing down since the early 2000s. That has affected CPU performance increases, but since GPUs have an inherently parallel architecture, their advancement depends on different factors. New generations of technology don’t increase clock speed much, but they do multiply the number of computational units. GPUs are physically the largest processors that can be built. Our latest processor, called Volta, has over 20 billion transistors and is what is called a max reticle chip, which is the largest chip the manufacturing process can produce. We can also connect multiple GPUs with a high-speed channel to boost performance. Our recent server uses 16 parallel GPUs, while Summit uses over 27,000. Because GPUs are so efficient both computationally and in their energy use, you can use them at really large scales.
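The jump from a 16-GPU server to a machine with over 27,000 GPUs can be sketched with an idealized scaling model: aggregate peak equals GPU count times per-GPU peak, discounted by an efficiency factor for time spent communicating. This is a simplification we are assuming for illustration; `PER_GPU_PEAK`, the 80% efficiency, and `aggregate_peak` are hypothetical placeholders, not Volta specifications.

```python
# Idealized scaling sketch: aggregate peak throughput of N identical GPUs.
# PER_GPU_PEAK and the efficiency values are hypothetical placeholders.

PER_GPU_PEAK = 1.0  # hypothetical: one GPU's peak, normalized to 1 unit


def aggregate_peak(num_gpus: int, per_gpu: float = PER_GPU_PEAK,
                   efficiency: float = 1.0) -> float:
    """Count * per-GPU peak * efficiency. An efficiency below 1 stands in for
    time spent moving data between GPUs instead of computing."""
    if num_gpus < 1:
        raise ValueError("need at least one GPU")
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    return num_gpus * per_gpu * efficiency


server = aggregate_peak(16)       # a 16-GPU server
summit = aggregate_peak(27_000)   # Summit's "over 27,000", rounded down
print(summit / server)            # 1687.5 servers' worth under perfect scaling
print(aggregate_peak(27_000, efficiency=0.8) / server)  # about 1350 at 80%
```

Real systems never scale perfectly, which is why the fast interconnect between GPUs matters as much as the GPUs themselves.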
New: Can you describe how high-performance computing is influencing the field of astronomy? What has it enabled scientists to discover that would otherwise be impossible on less advanced systems?
Oberlin: People are using supercomputers in really every area of astronomy. For example, people are using them to study the very first moments of the Big Bang to understand how matter coalesced from energy, to study the movement of galaxies, to analyze the distribution of dark matter, and other fundamental mysteries of the universe. These are things that could only be done with a supercomputer. I know researchers studying how stars work and gravitational lensing, which require really complex simulation. People are also using supercomputers for AI to analyze massive astronomy datasets.
Most recently, we’ve finally been able to detect gravitational waves, thanks to supercomputing in multiple ways. Gravitational-wave detectors are very long laser interferometers that are essentially the most sensitive motion detectors that have ever been built. Because they’re so sensitive and because gravitational waves are such a small signal, almost everything else adds noise: every vibration, from earthquakes to vehicles, has to be filtered out using extremely sophisticated algorithms that are enormously computationally intensive.
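The interview does not name the filtering algorithms involved. As a generic illustration only, one classic technique for pulling a weak signal of known shape out of noise is matched filtering: sliding a template of the expected waveform across the data and looking for the offset where the correlation peaks. In the Python sketch below, the rising-frequency "chirp" template, the noise level, the offsets, and the function names `chirp` and `matched_filter_peak` are all hypothetical; real detector pipelines are far more elaborate.

```python
# Toy matched filter: slide a known template over noisy data and report the
# offset where the correlation (sliding dot product) is largest.
import math
import random
from typing import List


def chirp(length: int, f0: float = 0.05, f1: float = 0.25) -> List[float]:
    """Sinusoid whose frequency rises linearly from f0 to f1 cycles/sample."""
    return [
        math.sin(2 * math.pi * (f0 * t + (f1 - f0) * t * t / (2 * length)))
        for t in range(length)
    ]


def matched_filter_peak(data: List[float], template: List[float]) -> int:
    """Return the offset at which data best matches the template."""
    n = len(template)
    best_offset, best_score = 0, -math.inf
    for offset in range(len(data) - n + 1):
        score = sum(d * h for d, h in zip(data[offset:offset + n], template))
        if score > best_score:
            best_offset, best_score = offset, score
    return best_offset


rng = random.Random(42)                            # fixed seed, reproducible
template = chirp(512)
data = [rng.gauss(0.0, 1.5) for _ in range(3000)]  # noise larger than signal
true_offset = 1800
for i, h in enumerate(template):                   # bury the signal in noise
    data[true_offset + i] += h
print(matched_filter_peak(data, template))         # at or very near 1800
```

Each sample of the buried signal is smaller than the noise, but summing over all 512 template samples lets the match stand out. Even this toy does about 1.3 million multiply-adds for three thousand samples; searching continuous detector data against many possible waveforms is what makes the real problem so computationally intense.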
For a long time, studying space has been a lot like listening to the universe. But now with supercomputing, we can interpret the data we record so quickly that we can actually swing telescopes around to observe some of these things happening as soon as we hear them. This is a whole new paradigm for astronomy.
New: Countries such as the United States and China are actively competing to have the world’s most powerful supercomputer. What makes investing in high-performance computing a national priority for so many countries?
Oberlin: As I mentioned, supercomputers and this critically important fourth way of doing science have really become the foundation for our economy now. If your scientists don’t have access to powerful supercomputers, they’re disadvantaged compared to their peers. Our economies thrive on technological advancement, which enables productivity advancement. All those technologies are built on science, and if the science doesn’t move forward, the technology doesn’t either. It’s easy to prove that investment in science drives national economies.
Supercomputers have also been very important for national defense because they can lead to improved defense technologies. More importantly, they have also been critical to international peace. Stopping nuclear testing in the atmosphere and underground was only made possible because supercomputers became powerful enough that we could simulate everything we needed to know about nuclear weapons and ensure the safety and operational readiness of nuclear stockpiles. Whatever your political views, this is universally a good thing.
Faster supercomputers are telescopes into the future for understanding things like what climate change is doing to our planet. Modeling the entire planet’s weather and climate at high enough resolution, and then being able to see what that would look like in 100 years, is the only way we have to figure out what the future of the environment looks like and gauge the impact of different interventions. This is true in many different fields. In medicine, for example, precision medicine and sequencing the human genome are completely enabled by high-performance computing. If we ever hope to understand and cure cancer, it’s going to be because of knowledge directly enabled by high-performance computing.
Now that we see the incredible importance of AI for every aspect of society, the only way to understand its impact and drive this technology forward is with supercomputers. The Chinese get this. Most people in the U.S. government get this. I think that we would probably benefit from investing a lot more than we are, given the fundamental dependence of our national economy on high-performance computing. This is not an area for us to neglect. There are a lot of difficult questions and incredibly complex relationships between forces, and I can’t overemphasize the importance of supercomputing for understanding these.