
The Ultimate Deep Learning server with up to 8 PCIe Gen 3 GPU accelerators on a Single Root Complex

A lot has changed since Alexey Grigorevich Ivakhnenko and V. G. Lapa published the first functional Deep Learning algorithm in 1965. Today’s technology allows unsupervised learning of multiple levels of features or representations of data, with each layer taking the output of the previous layer as its input, at speeds that were unimaginable 50 years ago.

BOXX GX8-D Rackmount Server Overview


The BOXX GX8-D Rackmount Series supports dual Intel® Xeon® Scalable Processors with up to 28 cores each and up to 10 PCIe Gen 3.0 x16-compatible devices, enabling multi-device peering on a single PCIe root complex. Supported devices include NVIDIA® Tesla® P40 GPU cards, powered by the NVIDIA Pascal™ architecture that is driving the AI revolution and enabling HPC breakthroughs. Multi-device peering on a single PCIe root complex makes the GX8-D well suited to GPU-accelerated applications and libraries used for deep learning, data analytics, and molecular dynamics, including deep learning frameworks such as Torch 7, Theano, Caffe, and TensorFlow.

The GX8-D Series rackmount servers differ from other GPU-supporting hardware implementations. Most hardware configurations available today provide maximum performance only between specific pairs of GPUs, and because the GPUs are paired up, jobs that require communication between arbitrary GPUs take a performance hit. There can also be a significant performance impact when scaling beyond four GPUs on multi-socket systems. These have been persistent problems for customers who push the limits of GPUs with large, complex datasets and calculations, or whose data must be streamed between GPUs. The BOXX design overcomes these issues and achieves near-linear performance scaling.
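To make the pairing problem concrete, here is a minimal Python sketch that counts how many GPU pairs share a PCIe root complex in two layouts. The layouts are illustrative assumptions, not BOXX schematics: a hypothetical dual-socket system with four GPUs per root complex, and a system with all eight GPUs under one root complex. The function name `count_paths` is likewise made up for this example.

```python
from itertools import combinations

def count_paths(root_of_gpu):
    """Count GPU pairs that share a root complex versus pairs that do not.

    root_of_gpu is a list where entry i is the root complex ID of GPU i.
    """
    counts = {"same root complex": 0, "different root complex": 0}
    for a, b in combinations(root_of_gpu, 2):
        key = "same root complex" if a == b else "different root complex"
        counts[key] += 1
    return counts

# Assumed layout: 8 GPUs split 4 + 4 across two CPU sockets (two root complexes).
dual_root_counts = count_paths([0, 0, 0, 0, 1, 1, 1, 1])

# Assumed layout: all 8 GPUs behind switches on a single root complex.
single_root_counts = count_paths([0] * 8)

print("dual root:  ", dual_root_counts)
print("single root:", single_root_counts)
```

Under these assumptions, 16 of the 28 possible GPU pairs in the split layout sit on different root complexes, while every pair in the single-root layout shares one. The sketch only counts pairs; it does not estimate how much slower a cross-root transfer would be.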

By utilizing the SR3615 PCIe 96-lane switch expander, the BOXX GX8-D supports up to eight NVIDIA Tesla P40 GPU cards and provides room for additional InfiniBand® adapters or NVMe storage devices, while enabling higher bandwidth and lower latency between PCIe Gen3 devices than is possible in traditional systems. Because up to eight discrete GPU accelerators can communicate directly with each other on the PCIe bus, without host CPU intervention, they form a "micro-cluster" that shares a single memory address space.


Maximize PCIe Bandwidth

BOXX is a strong believer in using a technology to its fullest potential, and GPUs and GPU accelerators are no exception. If a GPU has a PCIe Gen3 x16 link, it should use that link when communicating with other GPUs, any other GPUs. Our switch expander technology lets us scale and peer multiple PCIe Gen3 x16 cards on a single root complex, ensuring that the maximum PCIe bandwidth is available and utilized for inter-card communication.
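For reference, the theoretical ceiling of a PCIe Gen3 link can be worked out from the PCIe 3.0 signaling rate (8 GT/s per lane) and its 128b/130b line encoding. The short Python sketch below does that arithmetic. The function name is hypothetical, and the result is a per-direction upper bound that ignores packet and protocol overhead, so measured throughput will be lower.

```python
GEN3_TRANSFERS_PER_SECOND = 8e9   # PCIe 3.0: 8 GT/s per lane
GEN3_ENCODING = 128 / 130         # 128b/130b line encoding efficiency

def pcie_gen3_bandwidth_gbs(lanes):
    """Theoretical one-direction bandwidth in GB/s (10^9 bytes per second)."""
    bits_per_second = GEN3_TRANSFERS_PER_SECOND * GEN3_ENCODING * lanes
    return bits_per_second / 8 / 1e9

x16 = pcie_gen3_bandwidth_gbs(16)
x8 = pcie_gen3_bandwidth_gbs(8)
print(f"Gen3 x16: {x16:.2f} GB/s per direction")
print(f"Gen3 x8:  {x8:.2f} GB/s per direction")
```

An x16 link therefore tops out at roughly 15.75 GB/s in each direction, and an x8 link at about half that.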

Minimize Intercard Latency and Obtain Consistent Performance Between GPUs

Our switch expander allows GPUs to communicate as if they are all on the same bus... because they are. Gone are the days of needing a bounce-buffer in host memory, or leaving GPU DMA engines unused because they couldn't address other devices in the system. This reduces intercard latency while helping to maintain a consistent performance level between GPUs.
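The cost of a bounce buffer can be sketched with a deliberately simple model: a staged copy crosses PCIe twice (GPU to host memory, then host memory to GPU), while a peer-to-peer copy crosses it once. The Python below assumes the theoretical Gen3 x16 rate of 15.75 GB/s per direction and copies that run one after another with no pipelining, latency, or protocol overhead. All names are hypothetical, and the numbers are model outputs, not measurements.

```python
LINK_GBS = 15.75  # assumed theoretical PCIe Gen3 x16 rate per direction, in GB/s

def transfer_seconds(size_gb, hops, link_gbs=LINK_GBS):
    """Time for `hops` back-to-back copies of size_gb over a link of link_gbs."""
    return hops * size_gb / link_gbs

size_gb = 1.0
staged = transfer_seconds(size_gb, hops=2)  # GPU -> host bounce buffer -> GPU
direct = transfer_seconds(size_gb, hops=1)  # GPU -> GPU, peer-to-peer

print(f"staged via host: {staged * 1000:.1f} ms")
print(f"peer-to-peer:    {direct * 1000:.1f} ms")
```

In this model the staged path takes twice as long and also consumes host memory bandwidth. Real software can overlap the two staged copies, so the practical gap depends on the workload.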

Enable GPU-Centric Development and Usage

Since nearly all GPU traffic passes directly between the GPUs via the SR3615 switch expander, only a negligible amount of host resources is needed to perform GPU work. Additionally, with a single address space and simultaneous inter-card communication at full PCIe x16 Gen3 speeds, software can spend more time doing work than thinking about when to schedule data copies.

Supports the Largest Number of GPU Offerings

We work closely with our technology partners to ensure you're given the broadest offerings for your application. The BOXX GX8-D Series supports both professional and consumer cards from the leading manufacturers including ground-breaking GPU accelerators, such as the NVIDIA® Tesla® P100 or P40 Accelerators designed specifically for deep learning deployments.




By utilizing PCIe 96-lane switch risers, the BOXX GX8 Rackmount supports up to eight NVIDIA® GPU cards and provides room for additional NVIDIA® Sync II cards, networking, storage adapters or NVMe storage devices.

Maximum Configuration
Up to 3.8 GHz
Up to 56 cores
Basic Configuration

  • Configurations will vary greatly based on specific needs. Please contact us for a quote.


  • Dual Intel® Xeon® Scalable Processors with up to 28 cores each
  • Up to 1TB DDR4-2400MHz Memory
  • Up to eight NVIDIA® Quadro™ or NVIDIA® Tesla™ professional graphics cards by utilizing PCIe Gen 3 Switch Expanders
  • Up to 4 x 1620W Power Supplies
  • 8 x PCIe x16 (Gen3 x16 bus) slots
    2 x PCIe x16 (Gen3 x8 bus) slots
  • IPMI 2.0-compliant ASMB8-iKVM module and ASWM Enterprise
    WfM 2.0, DMI 2.0, WOL by PME, PXE








What’s the Difference Between Artificial Intelligence, Machine Learning, and Deep Learning?

Artificial intelligence is the future. Artificial intelligence is science fiction. Artificial intelligence is already part of our everyday lives. All those statements are true; it just depends on what flavor of AI you are referring to. In this multi-part series, long-time tech journalist Michael Copeland explains the fundamentals of deep learning.



BOXX supports multiple configurations of its products and prefers to work closely with its customers and partners to determine the best fit for their company's needs. We work hard to listen and understand, and we can tailor any of our products to your specific requirements. If you're looking to accelerate training and inference of deep neural networks using applications like TensorFlow, Caffe, Torch 7, Theano, Neon, and AMBER, one of our performance specialists can guide you to the appropriate solution and configuration. Click below to connect with us.

in the USA

At BOXX, we’re engineers and creative professionals too. In fact, we rely on SolidWorks, 3ds Max, and other applications every day. Our chassis are designed by BOXX engineers and proudly manufactured in the USA, but they aren’t built for sending emails or gaming. They’re crafted from aircraft-quality aluminum with steel strengthening components. That means maximum airflow and cool, quiet operation, even with the most demanding hardware configurations.

Tech Support

At BOXX, we understand that you need to be back at work as soon as possible when something goes wrong. That's why YOUR productivity is always our top priority. Our in-house technical support operatives will attempt to reproduce any issue you report, even the most obscure. We'll even overnight parts when necessary during your premium warranty period.


The BOXX Workflow

Keep working while you render! BOXX offers unique hardware packages specifically designed to reduce the bottlenecks that plague professional software applications. By offloading your rendering, simulation, or other multi-threaded tasks, creativity never has to be put on hold by your hardware. That's the philosophy behind The BOXX Workflow.



We understand that it's important to know where your money goes when purchasing a premium workstation. BOXX offers services and solutions that go far beyond what you'll find at Dell, HP, or Apple.