Capture the full potential of IT
One of the challenges of building machine learning (ML) models is managing data. Your infrastructure must be able to process very large data sets rapidly as well as ingest both structured and unstructured data from a wide variety of sources.
That kind of data is typically generated in performance-intensive computing areas such as GPU-accelerated applications, structural biology and digital simulations. These applications face three recurring problems: how to keep a data pipeline filled efficiently, how to integrate data across systems easily and how to manage rapid changes in data storage requirements. That’s where Weka.io comes into play, providing high-speed data ingestion and avoiding unnecessary copies of your data while making it available across the entire ML modeling pipeline.
Weka’s file system, WekaFS, was developed for exactly this purpose. It unifies your entire data lake into a shared global namespace where you can more easily access and manage trillions of files stored in multiple locations from one directory. It works across both on-premises and cloud storage repositories and is optimized for performance-intensive workloads, delivering low network latency and high throughput.
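Because WekaFS presents a POSIX-compliant namespace, an application simply mounts it and walks it like any local directory tree. As a minimal sketch, with /mnt/weka/projects standing in for whatever mount point your deployment uses:

```cpp
#include <cstdint>
#include <filesystem>
#include <iostream>

// Walk a mounted WekaFS namespace the way you would any POSIX tree.
// "/mnt/weka/projects" is a hypothetical mount point; substitute your own.
int main() {
    const std::filesystem::path root{"/mnt/weka/projects"};
    std::uintmax_t files = 0, bytes = 0;
    for (const auto& entry : std::filesystem::recursive_directory_iterator(root)) {
        if (entry.is_regular_file()) {
            ++files;                      // files may live on premises or in the cloud;
            bytes += entry.file_size();   // the application neither knows nor cares
        }
    }
    std::cout << files << " files, " << bytes << " bytes under " << root << '\n';
}
```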
This next-generation file system has several other advantages: it is easy to deploy and entirely software-based, and it delivers all-flash performance, NAS-style simplicity and manageability, cloud scalability and breakthrough economics. It was designed to run on standard x86 server hardware and commodity SSDs, or natively in a public cloud such as AWS.
Weka’s file system is designed to scale to hundreds of petabytes, thousands of compute instances and billions of files. Read and write latency for file operations against active data can be as low as 200 microseconds.
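Latency figures like that can be sanity-checked against your own mount by timing small reads. The sketch below uses a hypothetical file path; note the comment about the page cache, which will otherwise flatter the numbers:

```cpp
#include <chrono>
#include <cstdio>
#include <fcntl.h>
#include <unistd.h>

// Estimate small-read latency with pread(). Without O_DIRECT the OS page
// cache serves repeat reads, so vary offsets or open with O_DIRECT (and
// aligned buffers) to measure the filesystem rather than local memory.
int main() {
    int fd = open("/mnt/weka/projects/sample.dat", O_RDONLY); // hypothetical path
    if (fd < 0) return 1;
    char buf[4096];
    const long reads = 1000;
    auto start = std::chrono::steady_clock::now();
    for (long i = 0; i < reads; ++i)
        pread(fd, buf, sizeof buf, static_cast<off_t>(i) * sizeof buf);
    auto us = std::chrono::duration_cast<std::chrono::microseconds>(
                  std::chrono::steady_clock::now() - start).count();
    std::printf("average read latency: %lld us\n", static_cast<long long>(us / reads));
    close(fd);
}
```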
Supermicro has produced its own NVMe Reference Architecture that supports WekaFS on several of its servers, including the Supermicro A+ AS-1114S-WN10RT and AS-2114S-WN24RT with AMD EPYC™ 7402P processors and at least 2TB of memory (expandable to 4TB). Both servers support hot-swappable NVMe storage modules for maximum performance. Also check out the Supermicro WekaFS AI and HPC Solution Bundle.
In the high-stakes world of Formula One racing, finding that slight edge to build a better-performing car often means using the most powerful computers to model aerodynamics. The Mercedes-AMG Petronas F1 racing team found that using AMD EPYC™ processors helps gain that edge. Since 2010, the team has brought home 124 race wins and nine driver’s championships across the F1 racing circuit.
Thanks to the increased performance of these AMD EPYC™ CPUs, the team can run twice the number of daily simulations. The key is having the best computational fluid dynamics models available. And time is of the essence: the racing association’s IT authorities have added rules that dictate how much computing and wind tunnel time each team can use, along with a dollar limit on computing resources to level the playing field despite differences in team resources.
Teams that traditionally finish at the top are allowed a third less computing time, and since Mercedes was the top 2021 finisher, it has the smallest computing allocation. The 2022 season limited computing expenditures to $140M; for 2023, the cap will be cut further, to $135M. The result is that teams are focused on finding the highest-performing computers at the lowest cost. In F1, fast cars and fast computers go hand in hand.
“Performance was the key driver of the decision making,” said Simon Williams, Head of Aero Development Software for the team. “We looked at AMD and the competitors. We needed to get this right, because we’re going to be using this hardware for the next three years.” Mercedes replaced its three-year-old computers with AMD EPYC™-based systems and gained a 20% performance improvement, letting it run many more simulations in parallel. “I can’t stress enough how important the fast turnaround is,” Williams said. “It’s been great having AMD help us achieve that.”
Servers such as the Supermicro A+ series can bring home big wins as well.
Content creators, designers, video animators and digital FX experts make much higher demands of their digital workstations than typical PC users. These disciplines often rely on heavily threaded applications such as Adobe After Effects, Unreal Engine or CAD applications from Autodesk. What these users need is a corresponding increase in computing power.
That’s where one solution comes in handy for this type of power user: the AMD Ryzen Threadripper™ CPU, now updated with the PRO 5000 series. One advantage of these newer chips is that they fit the same WRX80 motherboards that supported the earlier Threadripper series. Systems are available in a range of configurations, including the ProMagix HD150 workstation from Velocity Micro; that solution provider is testing overclocking on the MSI and ASRock motherboards it will include in its HD150 workstations. That’s right, a chip designed from the get-go to be overclocked. In benchmarks using the applications mentioned above, it ran about twice as fast as competitors’ less-capable hardware. (Supermicro offers the M12SWA-TF motherboard for the Threadripper chipset.)
Desktop Was Never Like This
The AMD Threadripper™ CPU may be a desktop processor, but desktop computing was never like this. The new chips come in a variety of multi-core versions, topping out at 64 cores running up to 128 threads, with 256MB of L3 cache and support for 2TB of 8-channel DDR4 memory. The newest Threadrippers are built on AMD’s latest 7-nanometer dies.
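To put figures like 64 cores and 128 threads in context, a heavily threaded application typically queries the hardware and spreads work across every available thread. The sketch below uses only standard C++, with no Threadripper-specific APIs, and the per-thread workload is a stand-in for real rendering or simulation work:

```cpp
#include <cstdint>
#include <iostream>
#include <numeric>
#include <thread>
#include <vector>

// Fan a trivially parallel workload out across every hardware thread,
// e.g. all 128 of them on a 64-core/128-thread Threadripper.
int main() {
    unsigned n = std::thread::hardware_concurrency();
    if (n == 0) n = 1;                       // the call may report 0 if unknown
    std::vector<std::uint64_t> partial(n, 0);
    std::vector<std::thread> workers;
    const std::uint64_t chunk = 50'000'000;  // stand-in for real per-thread work
    for (unsigned t = 0; t < n; ++t)
        workers.emplace_back([&partial, t, chunk] {
            for (std::uint64_t i = 0; i < chunk; ++i)
                partial[t] += i % 7;
        });
    for (auto& w : workers) w.join();
    std::cout << "threads: " << n << ", result: "
              << std::accumulate(partial.begin(), partial.end(), 0ULL) << '\n';
}
```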
The Threadripper CPUs are not just fast; they are built on AMD’s Zen 3 architecture and include several security features, such as Shadow Stack. Zen 3 is the overall name for a series of improvements to AMD’s higher-end CPU line that have shown a 19% improvement in instructions per clock, along with lower latency and twice the directly accessible L3 cache compared with the earlier Zen 2 architecture.
These processors also support Microsoft’s Hardware-enforced Stack Protection, which helps detect and thwart control-flow attacks by checking the normal program stack against a secured, hardware-stored copy. This helps the system boot securely, protects the computer from firmware vulnerabilities, shields the operating system from attacks, and prevents unauthorized access to devices and data through advanced access controls and authentication.
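On Linux, you can check whether your processor and kernel expose shadow-stack support by scanning the CPU feature flags. A quick sketch follows; note that the flag name, user_shstk, is what recent kernels report, and older kernels may not list it at all:

```cpp
#include <fstream>
#include <iostream>
#include <string>

// Scan /proc/cpuinfo for the shadow-stack feature flag. Recent Linux
// kernels report it as "user_shstk"; older kernels may not list it.
int main() {
    std::ifstream cpuinfo("/proc/cpuinfo");
    std::string line;
    while (std::getline(cpuinfo, line)) {
        if (line.rfind("flags", 0) == 0) {   // first "flags" line is enough
            std::cout << (line.find("user_shstk") != std::string::npos
                              ? "shadow stack supported\n"
                              : "shadow stack flag not reported\n");
            return 0;
        }
    }
    return 1;
}
```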
Lodestar doesn’t call it indexing, but the company has a product that annotates video and, using artificial intelligence (AI), creates searchable, structured data. Lodestar offers a complete management suite for developing AI-based computer vision models from video data. The company’s technology includes continuous training of its AI models along with real-time active learning and labeling.
The challenge for computer vision efforts before Lodestar's technology came into the picture was the sheer amount of data contained in any video stream: an hour of video contains trillions of pixels. The result was a very heavy computational load to manipulate and analyze. That meant video had to be pre-processed before anyone could analyze the stream. But thanks to performance-intensive computing, there are new ways to host more capable and responsive tools.
That's where Lodestar comes into play, handling the navigation and curation of a native video stream without any preparation, using the video as a single source of truth. Metadata is extracted on the fly so that each video frame can be accessed by an analyst. This is a highly CPU-intensive process, and Lodestar runs on Supermicro A+ servers hosting Jupyter-based data science applications across a variety of containers. These servers combine AMD CPUs and GPUs with the appropriate amount of memory to make these applications perform quickly.
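Lodestar has not published its internals, but the general pattern of extracting frame-level metadata on the fly can be sketched with OpenCV. The file name below is purely illustrative, and the loop shows the generic approach rather than Lodestar's actual pipeline:

```cpp
#include <cstdio>
#include <opencv2/videoio.hpp>

// Walk a video stream and record per-frame metadata (index, timestamp),
// the kind of information that makes individual frames addressable.
int main() {
    cv::VideoCapture cap("store_camera.mp4");   // illustrative file name
    if (!cap.isOpened()) return 1;
    cv::Mat frame;
    while (cap.read(frame)) {
        int idx   = static_cast<int>(cap.get(cv::CAP_PROP_POS_FRAMES));
        double ms = cap.get(cv::CAP_PROP_POS_MSEC);
        // In a real pipeline, a model would annotate the frame here.
        std::printf("frame %d at %.1f ms: %dx%d\n", idx, ms, frame.cols, frame.rows);
    }
}
```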
By harnessing this power, data scientists can now collaborate in real time to validate the dataset, run experiments, train models and guide annotation. With Lodestar, data scientists and domain experts can develop a production AI in weeks instead of months.
That’s what a leading European optical and hearing aid retailer did to help automate its in-store inventory management processes and keep track of its eyewear collection. Before the advent of Lodestar, each store’s staff spent 10 hours a month manually counting inventory. That doesn’t sound like much until you multiply the effort by 300 stores. With Lodestar, store inventory is completed in minutes. Given that the stores frequently update their product offerings, this has brought significant savings in labor, and more accurate inventory numbers have provided a better customer experience.
AMD and Supermicro have made it easier to exploit the most advanced combination of GPU and CPU technologies.
Derek Bouius, a senior product manager at AMD, said “Using six nanometer processes and the CDNA2 graphics dies, we created the third generation of GPU chipsets that have more than twice the performance of previous GPU processors. They deliver 181 teraflops of mixed precision peak computing power.” Called the AMD Instinct MI210™ and AMD Instinct MI250™, the new accelerators double the memory of their predecessors: the MI210 offers 64 GB of HBM2e memory that delivers data at 1.6 TB/sec, while the larger MI250 doubles that capacity again to 128 GB. The MI210 is packaged as a fourth-generation PCIe expansion card, and both accelerators provide direct Infinity Fabric links for faster I/O between GPUs without sending that traffic through the standard PCIe bus.
The Instinct accelerators bring immediate performance benefits to the most complex computational applications, such as molecular dynamics, computer-aided engineering, weather modeling, and oil and gas modeling.
"We provided optimized containerized applications that are pre-built to support the accelerator and run them out of the box," Bouius said. “It is a very easy lift to go from existing solutions to the AMD accelerator,” he added. It’s accomplished by bringing together AMD’s ROCm™ support libraries and tools with its HIP programming language and device drivers – all of which are open source. They can unlock the GPU performance enhancements to make it easier for software developers to take advantage of its latest processors. AMD offers a catalog of dozens of currently available applications.
Supermicro’s SuperBlade product line combines the new AMD Instinct™ GPU accelerators and AMD EPYC™ processors to deliver higher performance with lower latency for its enterprise customers.
One packaging option is to combine six chassis with 20 blades each, delivering 120 servers that provide a total of more than 3,000 teraflops of combined processing power. This equipment delivers more power efficiency in less space with fewer cables, providing a lower cost of ownership. The blade servers are all hot-pluggable and come with two onboard front-mounted 25 gigabit and two 10 gigabit Ethernet connectors.
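In round numbers, that is about 25 teraflops per blade: 3,000 teraflops spread across 6 × 20 = 120 servers.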
“Everything is faster now for running enterprise workloads,” says Shanthi Adloori, senior director of product management for Supermicro. “This is why our Supermicro servers have won the world record in performance from the Standard Performance Evaluation Corp. three years in a row.” Another popular SuperBlade design provides an entire “private cloud in a box” that combines administration and worker nodes and handles deploying a Red Hat OpenShift platform to run Kubernetes-based deployments with minimal provisioning.
Building the next generation of technical computing equipment has become easier, thanks to the combination of hardware from International Computer Concepts (ICC) and software and firmware from Define Tech Ltd. You'll find the combined technology delivering solutions like computer-aided engineering, finite element analysis, computational fluid dynamics and geologic data analysis.
Such applications depend on huge datasets and complex computational requirements. They typically rely on clusters of multi-core computers, distributed storage and high-speed networking components.
The combination is called a turnkey cluster, and the name is apt because it marks a new direction for this market segment. In the past, clustered computers required a great deal of custom assembly, matching components for throughput and performance, plus special firmware and software development to take advantage of them. The ICC and Define Tech solution offers a more flexible and useful approach because it comes with software and specialized applications that are optimized for running complex engineering simulations, such as Ansys and OpenFOAM.
The applications run across a collection of CPUs from AMD, including the latest version of AMD’s EPYC™ 7003 series of processors that feature high core counts, high memory bandwidth and support for high-speed input/output channels in a single chip. These processors feature AMD 3D V-Cache™ technology, which uses true 3D die stacking to provide a larger L3 cache, a real help for memory-bound simulation workloads.
“With this latest addition to our HPC cluster suite, we aim to provide our customers an easy-to-use, cost-effective, AI-optimized solution made specifically for simulation-driven engineering workloads,” said ICC’s Director of Development, Alexey Stolyar.
For more on this, see ICC's project document as well as Define Tech’s explanatory page.