Nvidia completed its takeover of Mellanox, the world leader in end-to-end Ethernet and InfiniBand solutions, in April 2020. Now, in autumn 2020, we’re beginning to see this merger bear fruit. Together with its other recent acquisitions, Nvidia is working to become a leader in artificial intelligence and machine learning in the high-performance computing market.
By buying Mellanox, now known as Nvidia Networking, Nvidia is extending its capabilities in the data centre sector. Many previously knew Nvidia only as a manufacturer of gaming graphics cards, but its GPUs (Graphics Processing Units) are becoming increasingly important in the High-Performance Computing (HPC) sector. Whereas CPUs are designed to handle many different tasks, GPUs are ideal for processing many similar tasks in parallel. This makes GPUs well suited to deep learning and machine learning. Because these workloads also move large volumes of data between computing units, it is clear why Nvidia wants to expand into high-performance interconnects.
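The idea of "many similar tasks" can be illustrated with a minimal sketch. It runs on an ordinary CPU with Python's standard library and is not GPU code; it only shows the shape of the workload, in which one small operation is applied independently to every element and the data can therefore be split into chunks and processed side by side. The function names and the operation itself are hypothetical and chosen purely for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def scale_and_shift(x: float) -> float:
    """The single small operation applied to every element (illustrative only)."""
    return 2.0 * x + 1.0


def process_chunk(chunk: list[float]) -> list[float]:
    """Apply the same operation to each element of one chunk."""
    return [scale_and_shift(x) for x in chunk]


def split(data: list[float], parts: int) -> list[list[float]]:
    """Cut the data into at most `parts` contiguous chunks of similar size."""
    size = (len(data) + parts - 1) // parts
    return [data[i:i + size] for i in range(0, len(data), size)]


def process_in_parallel(data: list[float], workers: int = 4) -> list[float]:
    """Process independent chunks concurrently and rejoin them in order."""
    if not data:
        return []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in the same order as the input chunks.
        results = pool.map(process_chunk, split(data, workers))
        return [y for chunk in results for y in chunk]
```

No element depends on any other, so the result is the same however the work is divided. A GPU exploits the same property with thousands of hardware threads rather than a handful of workers.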
The plan is to offer customers an end-to-end solution, from AI computing through to networks for data centres, says Jensen Huang, Nvidia’s CEO. Nvidia’s other completed and planned acquisitions also fit this strategy: the takeover of the network software manufacturer Cumulus Networks is already complete, and the takeover of the chip designer Arm has been announced. These acquisitions are also strategically aligned with the HPC market.
New Nvidia networking products
A central feature of artificial intelligence, and deep learning in particular, is the processing of large quantities of data. As mentioned above, GPUs are highly suited to this task, and Nvidia, with its Ampere architecture, is one of the market leaders here. However, supplying the computing units with data has become a bottleneck in the data centre.
Mellanox SmartNICs have long supported GPUDirect, which enables direct GPU-to-GPU communication. What’s new is support for GPUDirect Storage, which improves the connection between network storage and the computing units.
The new Nvidia DPUs, presented in May 2020, aim to further improve the supply of data to the computing units. According to Nvidia, they should become just as important a part of the data centre architecture as CPUs and GPUs already are.
DPU stands for Data Processing Unit. A DPU takes on the dispatching and processing of data in the data centre network. The PCIe cards consist of three components: a high-performance multi-core CPU with Arm architecture, a high-performance network interface, and multiple engines which take on tasks such as encryption, communication and storage services.
The BlueField-2X DPUs are a first step in this direction. They contain not only the Nvidia Mellanox ConnectX-6 Dx and BlueField-2 chips, but also the AI functionality of the Nvidia Ampere GPUs. This gives the DPUs the ability to analyse data traffic in real time. They can therefore also perform security analyses, detecting deviations in the data flow and providing rapid warnings about attacks and malware.
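To make "detecting deviations in the data flow" more concrete, here is a minimal sketch of one simple approach. It is not how the BlueField-2X works, since the article describes GPU-based AI analysis; as a stand-in it uses a basic rolling statistical check that flags a traffic sample when it lies far from the average of the preceding samples. The function name, the window size and the threshold are assumptions made for illustration.

```python
from collections import deque
from statistics import mean, pstdev


def find_deviations(samples: list[float], window: int = 5,
                    threshold: float = 3.0) -> list[int]:
    """Return the indices of samples that deviate strongly from recent history.

    `samples` could be, for example, bytes seen per measurement interval.
    A sample is flagged when it lies more than `threshold` standard
    deviations from the mean of the previous `window` samples.
    """
    history: deque[float] = deque(maxlen=window)
    flagged: list[int] = []
    for index, value in enumerate(samples):
        if len(history) == window:
            average = mean(history)
            spread = pstdev(history)
            # A spread of zero means perfectly flat history; skip the check.
            if spread > 0 and abs(value - average) > threshold * spread:
                flagged.append(index)
        history.append(value)
    return flagged


# Steady traffic around 100 units with one sudden burst at index 6.
traffic = [100, 102, 98, 101, 99, 100, 500, 101]
print(find_deviations(traffic))
```

A real system would have to cope with gradual drift, periodic patterns and attacks that deliberately stay within the normal range, which is where learned models have an advantage over a fixed threshold.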
However, the true strengths of this combination are demonstrated by the Nvidia EGX project. With EGX, Nvidia is offering converged accelerators that bring data centre functionality to the edge. This suits applications that need a lot of computing power and rely on low latency. The new EGX A100 cards contain an Ampere GPU for AI calculations with far-reaching security and encryption features, as well as a Mellanox ConnectX-6 Dx chip offering dual 100 Gb/s Ethernet or InfiniBand interfaces. This enables them to process data from millions of IoT sensors in real time, offering important services and information processing in hospitals, agriculture or manufacturing environments.
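A rough calculation puts the figure of "millions of IoT sensors" into perspective. The sketch below divides the combined line rate of the two 100 Gb/s interfaces evenly across a number of sensors. The sensor count of one million is an assumption, and protocol overhead and uneven load are ignored, so the result is an upper bound rather than a measured value.

```python
def per_sensor_budget_bits(ports: int, port_rate_gbps: float,
                           sensors: int) -> float:
    """Average bits per second available to each sensor at full line rate."""
    total_bits_per_second = ports * port_rate_gbps * 1e9
    return total_bits_per_second / sensors


# Two 100 Gb/s interfaces shared by an assumed one million sensors.
budget = per_sensor_budget_bits(ports=2, port_rate_gbps=100, sensors=1_000_000)
print(f"{budget / 1e3:.0f} kbit/s per sensor")
```

Under these assumptions each sensor could send about 200 kbit/s on average, which is ample for typical telemetry readings but not for a continuous video stream from every device.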
Cooperation with Lenovo
For several years now, Mellanox and Nvidia have been cooperating successfully with Lenovo in the data centre sector. This cooperation is now being extended, and Cumulus Networks, now also part of Nvidia, plays a role in it.
From autumn, Lenovo will supply its customers with Nvidia Mellanox Spectrum Ethernet switches as part of the integrated Lenovo ThinkAgile solutions for Microsoft Azure Stack and the Lenovo solutions for SAP HANA. Lenovo has been selling the Quantum InfiniBand switches for a long time, and these are also available for HPC solutions.
The switches from Nvidia Mellanox allow Lenovo customers to build an Ethernet fabric that combines powerful hardware with Cumulus Linux, the open networking operating system. This allows the switches to be customised extensively, so that a bespoke network can be implemented.