Computers consist of a processing component and a memory component. In the most basic sense, processors perform computations and memory stores data.
For simple computations, a single processor may do the job. For more complex operations, however, multiple processors are often the only way to solve a problem. Many applications in the public and private sectors require massive computational resources, such as real-time weather forecasting, aerospace and biomedical engineering, nuclear fusion research, and nuclear stockpile management. Since these applications exceed the capacity of a single server, computer engineers have devised high-performance computing platforms that can deliver substantially more processing power. The most powerful computer systems in use today link thousands of processors, sharing the workload among them to perform computations quickly.
There are two general models for managing and coordinating large numbers of processors. One is typified by supercomputers. These are large, expensive systems—usually housed in a single room—in which multiple processors are connected by a fast local network. The other is distributed computing. These are systems in which processors are not necessarily located in close proximity to one another—and can even be housed on different continents—but which are connected via the Internet or other networks.
Advantages and Disadvantages of Each Model
The advantage of supercomputers is that since data can move between processors rapidly, all of the processors can work together on the same tasks. Supercomputers are suited for highly complex, real-time applications and simulations. However, supercomputers are very expensive to build and maintain, as they consist of a large array of top-of-the-line processors, fast memory, custom hardware, and expensive cooling systems. They also do not scale well, since their complexity makes it difficult to add more processors to such a precisely designed and finely tuned system.
By contrast, the advantage of distributed systems is that relative to supercomputers they are much less expensive. Many distributed systems make use of cheap, off-the-shelf computers for processors and memory, which incur only minimal cooling costs. In addition, they are simpler to scale, as adding an additional processor to the system often consists of little more than connecting it to the network. However, unlike supercomputers, which send data short distances via sophisticated and highly optimized connections, distributed systems must move data from processor to processor over slower networks, making them unsuitable for many real-time applications.
Weather forecasting is a prototypical supercomputing problem, in part because of how much data it takes to produce a weather forecast that is accurate by contemporary standards. Weather simulations take in massive quantities of data on temperature, wind, humidity, pressure, solar radiation, terrain, and numerous other environmental factors, and must account for global as well as local changes in these variables. Processing this data on a distributed system would mean repeatedly transferring data over relatively slow networks, thereby seriously limiting forecasting speeds. Since changes in weather occur continuously, having to wait for data to move around the system makes for forecasts that are already out of date as soon as they are produced. Other examples of supercomputing applications include nuclear stockpile management and large-scale physics simulations such as those involved in aerospace engineering.
In contrast, distributed systems are most useful for problems that are not as sensitive to latency. For example, when NASA’s Jet Propulsion Laboratory (JPL) needed to process high volumes of image data collected by its Mars rovers, a computer cluster hosted on the Amazon Cloud was a natural fit. Such tasks are not substantially hindered by small delays in individual computations, so distributed systems offered the most pragmatic solution. Other distributed computing applications include large-scale records management and text mining.
The Road Ahead
Since the emergence of supercomputers in the 1960s, their performance has often been measured in floating point operations per second (FLOPS). The CDC 6600, a popular early supercomputer, reached a peak processing speed of 500 kilo-FLOPS in the mid-1960s. To put this in perspective, the processor in an iPhone 5S is nearly 250,000 times faster than the CDC 6600. Since the 1960s, the capabilities of supercomputers have grown tremendously. In 2013, the world’s fastest supercomputer, China’s Tianhe-2, could operate at a peak speed of nearly 34 peta-FLOPS, a nearly 70-billionfold speed increase.
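The figures above are easy to sanity-check with a quick back-of-the-envelope calculation. This sketch uses the peak speeds as cited in the text (the implied iPhone 5S throughput is derived from the 250,000× comparison, not an independently sourced benchmark):

```python
# Back-of-the-envelope FLOPS comparisons using the figures cited above.
cdc_6600 = 500e3    # CDC 6600 peak, mid-1960s: 500 kilo-FLOPS
tianhe_2 = 34e15    # Tianhe-2 peak, 2013: ~34 peta-FLOPS

# Speedup of Tianhe-2 over the CDC 6600
speedup = tianhe_2 / cdc_6600
print(f"Tianhe-2 vs CDC 6600: {speedup:.1e}x")  # ~6.8e10, i.e. nearly 70 billion

# Throughput implied by the claim that an iPhone 5S is ~250,000x the CDC 6600
iphone_5s = 250_000 * cdc_6600
print(f"Implied iPhone 5S speed: {iphone_5s / 1e9:.0f} giga-FLOPS")  # ~125 GFLOPS
```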
Meanwhile, the Amazon cloud, one of the world’s fastest distributed systems, achieved a speed of 1.2 peta-FLOPS for the first time in 2013. While this cannot compete with supercomputers like the Tianhe-2, distributed systems can typically be built much more cheaply than supercomputers. A 2013 HP study found that the hourly cost of renting a processor on a dedicated supercomputer was approximately 2-3 times as great as on a comparable distributed cloud-based system.
Does the relative low cost of distributed computing mean the government should stop investing in supercomputers? Absolutely not. Supercomputers provide a distinct and irreplaceable set of capabilities and will continue to be of critical importance to national priorities for years to come to address problems such as cancer research, macroeconomic modeling, and natural disaster forecasting.
The federal government should continue to fund research for both supercomputing and distributed computing. So far, we are moving in the right direction. The 2014 National Defense Authorization Act directs the Department of Energy to develop supercomputers capable of exa-FLOPS speeds, also known as “exascale” supercomputers, within 10 years, and the Obama administration has made distributed computing a key part of its “big data strategy.”
But there is more that could be done. If the federal government wants to maximize the value of its investments in high-performance computing, it will need to reduce barriers to using these technologies. This means it should continue to ensure that high-speed networking infrastructure is available to scientists at a broad range of locations and build tools that allow researchers who lack expertise in supercomputing to leverage high-performance systems. In addition, the world of high-performance computing is evolving quickly, and federally funded research should continue to support investments in next-generation computing technology such as quantum computing and molecular computing.
Photo: Flickr user Sam Churchill