Customizable and scalable GPU cluster for complex, CI-driven computational tasks

Published:

Topics: Open hardware, Open cloud systems

Antmicro’s projects, both internal and customer-facing, often require significant computational resources as well as flexibility in resource management. For these reasons we have been developing Scalerunner, an open source compute cluster that gives us the control and scalability needed for the increasing complexity of the projects we work on. To further improve the capabilities of our clusters, we decided to employ high-end GPUs for the most demanding workloads such as AI or 3D rendering. To enable that, we initially designed a proof-of-concept Thunderbolt to PCIe adapter, as described in a previous blog note.

Now, going one step further, we introduce our open source GPU cluster setup. It consists of a new Thunderbolt to GPU adapter, a custom GPU cluster backplane supporting up to 8 GPUs and a custom enclosure, which together provide modular, scalable and expandable GPU compute, something especially important for both ourselves and our customers in the LLM era. In this note we’ll describe each component of the cluster and show how this setup has already been put to use in projects involving 3D rendering flows for hardware assets and AI-related computation.

Picture of the GPU cluster

Introducing the GPU cluster

To keep our solution modular and flexible, we decided to connect our GPUs over Thunderbolt to the CI runners already deployed in the server infrastructure. To do that, we developed a Thunderbolt to GPU adapter that allows us to seamlessly connect any PCIe GPU to our Scalerunner cluster using Thunderbolt 3 and a PCIe x4 link. Like the previous Thunderbolt to PCIe board, the new adapter is based on a JHL Thunderbolt 3 controller.
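To get a feel for what a PCIe x4 link behind Thunderbolt 3 means for data transfers, here is a rough back-of-the-envelope sketch. It assumes the x4 link runs at PCIe 3.0 speed (8 GT/s per lane with 128b/130b encoding), which the text above does not state explicitly, and it ignores protocol overhead, so real-world throughput will be lower. The function names are illustrative only.

```python
# Back-of-the-envelope link budget for a GPU behind a Thunderbolt 3 adapter.
# Assumption: the x4 link runs at PCIe 3.0 speed (8 GT/s per lane, 128b/130b).

PCIE3_GT_PER_S = 8.0          # giga-transfers per second per lane (PCIe 3.0)
PCIE3_ENCODING = 128 / 130    # 128b/130b line-coding efficiency
THUNDERBOLT3_GBPS = 40.0      # nominal Thunderbolt 3 link rate


def pcie3_payload_gbps(lanes: int) -> float:
    """Data rate of a PCIe 3.0 link after line coding, in Gbit/s."""
    return lanes * PCIE3_GT_PER_S * PCIE3_ENCODING


def transfer_seconds(size_gib: float, link_gbps: float) -> float:
    """Ideal time to move size_gib GiB over a link, ignoring protocol overhead."""
    bits = size_gib * 1024**3 * 8
    return bits / (link_gbps * 1e9)


x4 = pcie3_payload_gbps(4)
x16 = pcie3_payload_gbps(16)
print(f"PCIe 3.0 x4:  {x4:.1f} Gbit/s ({x4 / 8:.2f} GB/s)")
print(f"PCIe 3.0 x16: {x16:.1f} Gbit/s ({x16 / 8:.2f} GB/s)")
print(f"Fits within nominal Thunderbolt 3 rate: {x4 <= THUNDERBOLT3_GBPS}")
print(f"Ideal 4 GiB upload over x4: {transfer_seconds(4, x4):.2f} s")
```

Under these assumptions an x4 link offers a quarter of the bandwidth of a full x16 slot, which matters mostly for workloads that stream large amounts of data to and from the GPU, and much less for jobs that load a model or scene once and then compute on the card.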

For our initial deployment, we designed the GPU cluster backplane to support up to 8 Thunderbolt to GPU adapters. It is optimized for NVIDIA Tesla T4 and P4, as well as L4 and A2 Tensor Core GPUs intended for server applications. These are compact GPUs with a heatsink instead of fans, which allows them to operate in constrained spaces. Thanks to that, we could place multiple cards close to each other and add external fans behind them, so that two cards share a single fan. The backplane features GPU power delivery, power management, fan control and data aggregation designed with the GPUs mentioned above in mind, as well as a connector for a 1600 W server power supply. Thanks to the backplane, we can control and manage all GPUs at once, saving the space that would otherwise be needed to connect each GPU separately.
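The shared-fan arrangement can be illustrated with a minimal sketch of a fan control policy. This is not the backplane's actual firmware: the fan curve, the thresholds and the function names below are all hypothetical. The only idea taken from the description above is that each fan serves two neighbouring cards, so a reasonable policy is to drive it from the hotter of the two.

```python
from typing import List

# Hypothetical fan curve: 30% duty up to 40 °C, ramping linearly to 100% at 80 °C.
MIN_DUTY, MAX_DUTY = 30, 100
T_LOW, T_HIGH = 40.0, 80.0


def duty_for_temp(temp_c: float) -> int:
    """Map a temperature in °C to a fan duty cycle in percent."""
    if temp_c <= T_LOW:
        return MIN_DUTY
    if temp_c >= T_HIGH:
        return MAX_DUTY
    frac = (temp_c - T_LOW) / (T_HIGH - T_LOW)
    return round(MIN_DUTY + frac * (MAX_DUTY - MIN_DUTY))


def fan_duties(gpu_temps_c: List[float]) -> List[int]:
    """Return one duty cycle per fan.

    Fan i cools GPU slots 2*i and 2*i + 1 and follows the hotter of the two,
    so a busy card is never under-cooled because its neighbour is idle.
    """
    if len(gpu_temps_c) % 2 != 0:
        raise ValueError("expected an even number of GPU slots")
    return [
        duty_for_temp(max(gpu_temps_c[i], gpu_temps_c[i + 1]))
        for i in range(0, len(gpu_temps_c), 2)
    ]


# Eight GPU slots map to four fans.
print(fan_duties([35, 38, 60, 45, 80, 20, 40, 60]))
```

Driving each fan from the maximum rather than the average of its pair trades some extra noise and power for a guarantee that the hotter card sets the airflow.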

This modular design is enclosed in a CNC-milled aluminum chassis that mechanically integrates all components to fit into a 2U rack cabinet. The chassis has its own integrated cooling, managed by the GPU cluster backplane. Additionally, an embedded environment sensor helps control environmental parameters such as temperature and humidity inside the cluster.

Speeding up 3D render flows for hardware projects

With the GPU cluster we are able to generate visual assets for the Antmicro Open Hardware Portal, such as photorealistic images and animations, as described in a dedicated blog note. Our Open Hardware Portal database contains around 1800 components and over 30 boards and is growing rapidly, so a powerful and scalable infrastructure lets us easily expand our computing resources as the rendering workload grows.

Using our KiCad-Blender flow and the computational power of advanced GPUs we can easily generate assembly animations such as the one below, which presents the assembly of the GPU cluster itself:

With the CI-driven rendering flow we are also able to visualize device concepts for our customers even at the very beginning of a project and keep visualizations up to date as the development progresses.

Optimizing DNNs with GPU-enabled clusters

The GPU cluster is also actively used in our projects involving deep neural networks. With the CI-based infrastructure, we can easily jump between recent and older training procedures, check training artifacts, compare them to each other, and track how changes to the training pipeline in the code affect the final model.
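Comparing training artifacts between CI runs can be as simple as diffing the metrics each run produces. The sketch below is only an illustration of that idea: the metric names, the dictionary layout and both helper functions are hypothetical and do not correspond to any specific tool's API.

```python
from typing import Dict, List


def compare_runs(baseline: Dict[str, float],
                 candidate: Dict[str, float]) -> Dict[str, float]:
    """Return candidate - baseline for every metric present in both runs."""
    shared = sorted(baseline.keys() & candidate.keys())
    return {name: candidate[name] - baseline[name] for name in shared}


def regressions(baseline: Dict[str, float],
                candidate: Dict[str, float],
                higher_is_better: Dict[str, bool],
                tolerance: float = 0.01) -> List[str]:
    """List metrics that got worse by more than `tolerance`.

    Metrics missing from `higher_is_better` are treated as higher-is-better.
    """
    worse = []
    for name, delta in compare_runs(baseline, candidate).items():
        signed = delta if higher_is_better.get(name, True) else -delta
        if signed < -tolerance:
            worse.append(name)
    return worse


# Hypothetical metrics collected as artifacts from two CI training jobs.
old_run = {"accuracy": 0.90, "loss": 0.40}
new_run = {"accuracy": 0.85, "loss": 0.35}
direction = {"accuracy": True, "loss": False}

print(compare_runs(old_run, new_run))
print(regressions(old_run, new_run, direction))
```

A check like this could run as a final CI step and flag a pipeline change that silently degrades the model.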

Another common use case for the GPU cluster is optimization of deep neural networks with our open source Kenning framework. Optimization algorithms such as quantization, pruning or knowledge distillation are usually not as time-consuming as training, but can still be sped up significantly on GPUs with enough RAM. The cluster also leaves more time for several iterations of model fine-tuning. We usually pair optimization with model evaluation to check how the optimized model compares quality-wise to the original one and to other optimization variants. With the GPU-enabled clusters, running such pipelines is usually a matter of minutes rather than hours, especially for large models such as the recently popular LLMs.
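To make the "optimize, then evaluate" pairing concrete, here is a toy, pure-Python sketch of symmetric int8 quantization followed by a simple error check. It is not Kenning's API and not how a real framework quantizes a model; the function names and the example weights are made up for illustration, and a real pipeline would evaluate task metrics on a dataset rather than raw weight error.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization to the int8 range [-127, 127]."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized: List[int], scale: float) -> List[float]:
    """Map int8 values back to approximate floating-point weights."""
    return [q * scale for q in quantized]


def max_abs_error(original: List[float], restored: List[float]) -> float:
    """Evaluation step: worst-case deviation introduced by quantization."""
    return max(abs(a - b) for a, b in zip(original, restored))


# Made-up weights standing in for one layer of a model.
weights = [0.02, -1.27, 0.5, 0.731, -0.3]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

print("quantized:", q)
print("scale:", scale)
print("max abs error:", max_abs_error(weights, restored))
```

With rounding to the nearest level, the per-weight error is bounded by half the scale, which is the kind of property an evaluation step can verify before comparing the optimized model against the original on real data.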

Customizable and scalable server solutions with Antmicro

Although this note describes our internal Scalerunner setup, both the software and the hardware of our GPU cluster are modular and can be easily integrated with other server and CI solutions, which is a service we provide to our customers. Additionally, with this flow we can integrate not only GPUs but also other PCIe-based AI accelerators, whether ASIC or FPGA.

If you need an open and scalable server solution for demanding workloads such as 3D rendering or DNN training or inference, or would like to discuss customization options for the devices described in this note, don’t hesitate to contact us at contact@antmicro.com.
