Capture the full potential of IT

Perspective: Looking Back on the Rise of Supercomputing


We’ve come a long way in the development of high-performance computing. Back in 2004, I attended an event held in the gym at the University of San Francisco. It was the first “Flash Mob Computing” cluster computing event, and the goal was to crowd-source computing power by connecting the PCs of volunteers. Several hundred PCs were networked together in the hope that they would form one of the world’s largest supercomputers, albeit for only a few hours.

I brought two laptops for the cause. The participation rules stated that the data on our hard drives would remain intact. Each computer booted from a specially crafted CD that ran Linux and a benchmark based on Linpack, a software library for numerical linear algebra. The benchmark was used to measure the mob’s collective computing power.
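To make the idea concrete, here is a minimal Python sketch of what a Linpack-style measurement does: solve a dense linear system, time the solve, and convert the run time to gigaflops using the conventional operation count of (2/3)n^3 + 2n^2. This is not the code from the event's boot CD. The function names, the problem size and the pure-Python solver are assumptions chosen for illustration, and a real benchmark run would rely on a tuned linear algebra library.

```python
import random
import time


def solve_dense(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (modifies a and b)."""
    n = len(b)
    for col in range(n):
        # Partial pivoting: bring the row with the largest entry in this column up.
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution on the resulting upper-triangular system.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(a[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (b[row] - tail) / a[row][row]
    return x


def linpack_style_gflops(n=120, seed=0):
    """Time one dense solve; return (gigaflops, worst residual). Illustrative only."""
    rng = random.Random(seed)
    a = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
    b = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    a_orig = [row[:] for row in a]  # untouched copies for the correctness check
    b_orig = b[:]

    start = time.perf_counter()
    x = solve_dense(a, b)
    elapsed = time.perf_counter() - start

    # A benchmark result only counts if the answer is actually right.
    residual = max(
        abs(sum(a_orig[i][j] * x[j] for j in range(n)) - b_orig[i])
        for i in range(n)
    )
    flops = (2.0 / 3.0) * n ** 3 + 2.0 * n ** 2  # conventional operation count
    return flops / elapsed / 1e9, residual


if __name__ == "__main__":
    gflops, residual = linpack_style_gflops()
    print(f"{gflops:.6f} gigaflops, worst residual {residual:.2e}")
```

Pure Python will report a tiny fraction of a gigaflop, which is the point of the exercise: the score depends as much on the software and the hardware underneath it as on the arithmetic being timed.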

The event attracted people with water-cooled overclocked PCs, naked PCs (no cases, just the boards and other components) and custom-made rigs with fancy cases. After a few hours, we had roughly 650 PCs on the floor of the gym. Each PC was connected to one of several Foundry BigIron super-switches located around the room.

The 2004 experiment brought out several industry luminaries, such as Gordon Bell, the father of the Digital Equipment Corporation VAX minicomputer, and Jim Gray, one of the original designers behind the TPC benchmark while he was at Tandem. Both men were Microsoft fellows at the time. Bell was carrying his own laptop but had forgotten to bring his CD drive, so he couldn’t join the mob.

Network shortcomings

What was most interesting to me, and what led to the mob’s eventual undoing, were the networking issues involved in assembling and running such a huge collection of gear. The mob used ordinary 100BaseT Ethernet, which was a double-edged sword: it was easy to set up but difficult to debug when network problems arose. The Linpack benchmark requires all the component machines to be running concurrently during the test, and the organizers had trouble keeping all 600-plus PCs online and working flawlessly at the same time. The best result was a peak rate of 180 gigaflops using 256 computers, but that wasn’t an official score because one node failed during the test.
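A quick back-of-the-envelope sketch shows why an all-or-nothing benchmark gets harder as the machine count grows. It assumes, purely for illustration, that every node independently has the same chance of staying up for the whole run. The 99.9 percent per-node figure is hypothetical and was not measured at the event.

```python
def chance_of_clean_run(nodes, per_node_reliability):
    """Probability that every node survives the run, assuming independent failures."""
    return per_node_reliability ** nodes


if __name__ == "__main__":
    # 0.999 is a made-up per-node reliability used only to show the trend.
    for nodes in (1, 256, 650):
        chance = chance_of_clean_run(nodes, 0.999)
        print(f"{nodes:4d} nodes: {chance:.1%} chance of a run with no failures")
```

Under that assumption, even very reliable individual PCs leave a large cluster with a real chance of losing a node mid-run, and a single dropout is enough to spoil the score.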

To give you an idea of where this stood in terms of overall supercomputing prowess, it outperformed the Cray supercomputers of the early 1990s, which delivered around 16 gigaflops.

At the website top500.org (which tracks the fastest supercomputers around the globe), you can see that all the current top 500 machines are measured in petaflops (1 petaflop equals 1 million gigaflops). The Oak Ridge National Laboratory’s Frontier machine, which has occupied the number one spot this year, weighs in at more than 1,000 petaflops and uses 8 million cores. To make the top 500 list back in 2004, the mob would have had to achieve a benchmark of over 600 gigaflops. Because of the networking problems, we’ll never know for sure whether it could have.

Still, it was an impressive achievement, given the motley mix of machines. All of the world’s top 500 supercomputers are custom built and carefully curated and assembled to attain that level of computing performance.
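The figures quoted above span six orders of magnitude, so a short Python sketch of the arithmetic helps put them side by side. Every number comes from the text; the function and key names are assumptions made up for this example.

```python
GIGAFLOPS_PER_PETAFLOP = 1_000_000  # 10**15 / 10**9


def compare_to_flash_mob(mob_gflops=180.0, mob_nodes=256):
    """Relate the flash mob's best (unofficial) run to the other figures in the article."""
    cray_gflops = 16.0                  # early-1990s Cray, per the article
    cutoff_2004_gflops = 600.0          # needed to make the 2004 list
    frontier_gflops = 1_000 * GIGAFLOPS_PER_PETAFLOP  # "more than 1,000 petaflops"
    return {
        "gflops_per_pc": mob_gflops / mob_nodes,
        "times_faster_than_cray": mob_gflops / cray_gflops,
        "times_short_of_2004_cutoff": cutoff_2004_gflops / mob_gflops,
        "frontier_vs_mob": frontier_gflops / mob_gflops,
    }


if __name__ == "__main__":
    for name, value in compare_to_flash_mob().items():
        print(f"{name}: {value:,.3f}")
```

Read this way, the 256-PC run averaged roughly 0.7 gigaflops per machine, beat the early Cray figure by about 11 times, and would have needed to be a little over three times faster to reach the 2004 cutoff.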

Another historical note: back in 2004, one of the more interesting entries came in third on the top500.org list: a collection of several thousand Apple Macintoshes running at Virginia Tech (Virginia Polytechnic Institute and State University). Back in the present, as you might imagine, almost all the fastest 500 supercomputers are based on a combination of CPU and GPU chip architectures.

Today, you can buy your own supercomputer on the retail market, such as the Supermicro SuperBlade® models. And of course, you can routinely run networks that are much faster than 100-megabit Ethernet.


Supermicro SuperBlades®: Designed to Power Through Distributed AI/ML Training Models

Running heavy AI/ML workloads can be a challenge for any server, but the SuperBlade offers extremely fast networking options, upgradability, support for two AMD EPYC™ 7000-series 64-core processors, and the Horovod open-source framework for scaling deep-learning training across multiple GPUs.

Running the largest artificial intelligence (AI) and machine learning (ML) workloads is a job for high-performing systems, and such loads are often tough even for the more capable machines. Supermicro’s SuperBlade combines blades using AMD EPYC™ CPUs with competing GPUs into a single rack-mounted enclosure (such as the Supermicro SBE-820H-822). This design leverages an extremely fast networking architecture for demanding applications that need to communicate with other servers to complete a task.

The Supermicro SuperBlade fits everything into an 8U chassis that can host up to 20 individual servers. This means a single chassis can be divided into separate training and model processing jobs. The components are key: servers can take advantage of the 200G HDR InfiniBand network switch without losing any performance. Think of this as delivering a cloud-in-a-box that provides both easier management of the cluster and higher performance with lower latencies.

The Supermicro SuperBlade is also designed as a disaggregated server, meaning that components can be upgraded with newer and more efficient CPUs or memory as technology progresses. This feature significantly reduces e-waste.

The SuperBlade line supports a wide selection of configurations, including both CPU-only and mixed CPU/GPU models, such as the SBA-4119SG, which comes with up to two AMD EPYC™ 7000-series 64-core CPUs. These components are delivered on blades that slide in easily and slide out just as easily when you need to replace the blades or the enclosure. The SuperBlade servers support a wide network selection as well, ranging from 10G to 200G Ethernet connections.

The SuperBlade employs Horovod, a distributed model-training framework built on message-passing interface (MPI) concepts, to let multiple ML sessions run in parallel, maximizing performance. In a sample test of two SuperBlade nodes, the solution was able to process 3,622 GoogLeNet images/second, and eight nodes were able to scale up to 13,475 GoogLeNet images/second.
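One way to read those two throughput figures is as a scaling efficiency: how close the eight-node result comes to a perfect fourfold increase over the two-node result. The Python sketch below works that out. The function name and the choice of the two-node run as the baseline are assumptions made for this example, not part of the published test.

```python
def scaling_efficiency(base_nodes, base_rate, scaled_nodes, scaled_rate):
    """Fraction of ideal linear scaling achieved when going from base to scaled."""
    ideal_rate = base_rate * (scaled_nodes / base_nodes)
    return scaled_rate / ideal_rate


if __name__ == "__main__":
    two_node_rate = 3_622      # images/second on two nodes, from the article
    eight_node_rate = 13_475   # images/second on eight nodes, from the article

    speedup = eight_node_rate / two_node_rate
    efficiency = scaling_efficiency(2, two_node_rate, 8, eight_node_rate)
    print(f"speedup from 2 to 8 nodes: {speedup:.2f}x (ideal is 4.00x)")
    print(f"scaling efficiency: {efficiency:.1%}")
    print(f"per-node rate: {two_node_rate / 2:.0f} vs {eight_node_rate / 8:.0f} images/second")
```

On these numbers, quadrupling the node count delivers about 3.7 times the throughput, or roughly 93 percent of ideal linear scaling.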

As you can see, Supermicro’s SuperBlade improves performance-intensive computing and boosts AI and ML use cases, enabling larger models and data workloads. The combined solution raises operational efficiency: it can automatically streamline processes, monitor for potential breakdowns, apply fixes, facilitate the flow of accurate and actionable data more efficiently, and scale up training across multiple nodes.
