
Unleash the Power: CUDA and GPGPU in the Industrial Environment

Hardly any application demands more resources than a modern computer game, and graphics cards have always been giants in terms of computing power. The desire to put this performance, which is now in the teraflops range, to other uses arose early on. Today, 10 years after SETI@home from the University of California, Berkeley, the first such application, the applications are countless, and for industrial use, CompactPCI systems provide the right hardware platform.

TFLOPS

The tasks of a video card require breathtakingly fast access to the image memory combined with relatively simple operations, which run on as many pixels as possible in parallel. Accordingly, the graphics processor is equipped with many similar units, the so-called “streaming multiprocessors” (SMs). Each SM in turn contains a number of streaming processor cores (8 in the first CUDA-capable GPUs, considerably more in current generations) plus registers and support logic. Today, high-end graphics cards achieve 5 to 15 TFLOPS, which is several times what the strongest Intel CPUs can do.

To make this computing power available for arbitrary calculations, various programming tools exist, such as OpenCL or CUDA (Compute Unified Device Architecture). This approach is known as general-purpose computing on graphics processing units, GPGPU for short. Today, there is hardly a field of application in which the acceleration from parallel processing on the graphics card is not used. NVIDIA’s GPU application page covers everything from bioinformatics (sequencing, molecular dynamics) to financial math (credit risk), fluid dynamics, medical imaging, artificial intelligence, pattern recognition, and weather and climate forecasting, all of which are well known to be computationally intensive.

PATTERNS

But there are also applications that seem harmless at first glance and yet are increasingly in the limelight: radar and lidar applications. The basic principle is familiar to every reader of this journal: a high-frequency pulse (e.g., 24 GHz) is emitted by a transmitting antenna, reflected and scattered by a metallic object, and partially echoed onto a receiving antenna. From the round-trip time of the pulse and the speed of light, the distance to the object can be calculated.
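The range calculation described above can be sketched in a few lines of Python. This is a minimal illustration, assuming an ideal point target and propagation at the vacuum speed of light; the function names are hypothetical and do not come from any radar library.

```python
# Speed of light in vacuum in m/s (exact by SI definition).
SPEED_OF_LIGHT = 299_792_458.0


def range_from_round_trip(round_trip_s: float) -> float:
    """Distance to the target in metres for a measured round-trip time.

    The pulse travels to the object and back, so the one-way distance
    is half of the total path length.
    """
    return SPEED_OF_LIGHT * round_trip_s / 2.0


def round_trip_from_range(distance_m: float) -> float:
    """Inverse relation: expected round-trip time in seconds for a given range."""
    return 2.0 * distance_m / SPEED_OF_LIGHT


if __name__ == "__main__":
    # An echo arriving one microsecond after transmission.
    print(f"{range_from_round_trip(1e-6):.2f} m")
```

An echo delay of one microsecond corresponds to a range of roughly 150 m, which shows why the receiver has to resolve very short time intervals.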

By taking advantage of the Doppler effect (the frequency shift of the transmitted signal), conclusions can be drawn about the relative movement of transmitter and object. The underlying mathematics essentially requires frequency analysis of the received signal, which is the domain of the discrete Fourier transform. The fast Fourier transform (FFT), for example in its butterfly formulation, is a prime example of a parallel algorithm that can be implemented outstandingly well in the C for CUDA programming language. The computing power available for this algorithm then determines the resolution parameters of the radar images.
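To make the butterfly structure concrete, here is a minimal sketch of an iterative radix-2 FFT in plain Python, together with the standard monostatic Doppler relation v = f_d · c / (2 · f_0). It is a sequential CPU illustration, not CUDA code and not the implementation used in any product mentioned here; the function names and the sample rate in the demo are assumptions chosen for illustration. The point to notice is that all butterflies within one stage are independent of each other, which is what makes the algorithm map so well onto many GPU cores.

```python
import cmath
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s


def bit_reverse(index: int, bits: int) -> int:
    """Reverse the lowest `bits` bits of `index`."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result


def fft_radix2(samples):
    """Iterative radix-2 decimation-in-time FFT (length must be a power of two)."""
    n = len(samples)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    bits = n.bit_length() - 1
    # Reorder the input so that each stage can work in place.
    data = [0j] * n
    for i, x in enumerate(samples):
        data[bit_reverse(i, bits)] = complex(x)
    size = 2
    while size <= n:  # log2(n) stages
        half = size // 2
        w_step = cmath.exp(-2j * math.pi / size)
        for start in range(0, n, size):
            w = 1 + 0j
            for k in range(half):
                # One butterfly: independent of all others in this stage.
                a = data[start + k]
                b = data[start + k + half] * w
                data[start + k] = a + b
                data[start + k + half] = a - b
                w *= w_step
        size *= 2
    return data


def doppler_velocity(shift_hz: float, carrier_hz: float) -> float:
    """Radial velocity in m/s for a Doppler shift seen by a monostatic radar."""
    return shift_hz * SPEED_OF_LIGHT / (2.0 * carrier_hz)


if __name__ == "__main__":
    sample_rate = 64_000.0  # Hz, assumed for this demo
    n = 64
    # Synthetic baseband echo containing a single 5 kHz Doppler tone.
    echo = [math.cos(2 * math.pi * 5_000.0 * i / sample_rate) for i in range(n)]
    spectrum = fft_radix2(echo)
    peak_bin = max(range(n // 2), key=lambda k: abs(spectrum[k]))
    shift = peak_bin * sample_rate / n
    print(peak_bin, shift, doppler_velocity(shift, 24e9))
```

The frequency resolution is the sample rate divided by the FFT length, so a finer velocity resolution requires longer transforms. This is the link between the available computing power and the resolution of the radar image.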

Radar, or its optical variant lidar, is indispensable in safety technology for analyzing and observing moving objects. For example, the monitoring of ground movements, especially on parallel runways in combination with the associated airspace, is one of the main safety functions in airport operations. The associated electronics sit close to the antenna and are therefore often housed in outdoor containers.

With autonomous vehicles on the horizon in private transport, analysis of the vehicle’s surroundings has become one of the most important AI research areas. It must evaluate and combine the data from a variety of electromagnetic and optical sensors.

Another application of parallel algorithms is the field of pattern recognition, which in turn is indispensable for all AI applications, whether it is the analysis of the purchase of suspect chemical substances, the assessment of hazardous situations or automated waste separation. Neural networks can be superbly implemented on a parallel computer architecture. Here too, the application sites often require robust industrial computers, special temperature ranges, particularly stable power supplies or simply the integration of further measured values from the field.

COMPACTPCI SERIAL

It was a natural step for the industrial computer specialist EKF, based in Hamm, Germany, to take up the subject of GPGPU at an early stage and to integrate CUDA technology into its CompactPCI Serial platform. This platform can be used to build genuine industrial computers that work reliably under ambient conditions that normal PCs will not survive. The special features begin with the proven mechanical integration into the 19″ subrack standard and an indirect, gas-tight connection of the modules to the backplane. An extended temperature range, robustness against humidity and dust thanks to module coating, and special power supplies that withstand voltage fluctuations and transient voltage peaks in mobile environments all serve to protect the industrial computer against external influences.

In addition, CompactPCI Serial modules offer everything required to build high-performance computer systems, thanks to their PCI Express bandwidth and Xeon processors. CompactPCI Serial systems with CUDA technology are meanwhile in use by several EKF customers in various applications.

The most important component for integrating CUDA is the SV2-MOVIE, a peripheral module that integrates an MXM 3.0/3.1 graphics module into the CompactPCI Serial system. The module has four display ports, although these need not be used for GPGPU applications. For example, NVIDIA GTX 1070 (Pascal) or NVIDIA Quadro P3000/P5000 modules can be operated on this carrier board.

For use as a CUDA system, a comprehensive BIOS adaptation was necessary to ensure that the subsystem fits seamlessly into the CompactPCI system. In the process, CUDA support for up to four SV2-MOVIE modules was implemented, providing a computing power of more than 25 TFLOPS per CompactPCI CPU.
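The aggregate figure can be checked with a back-of-the-envelope estimate. The sketch below uses the common rule of thumb that each GPU core completes one fused multiply-add, counted as two floating-point operations, per clock cycle. The core count and clock rate are illustrative assumptions for a Pascal-class MXM module, not datasheet values for any specific product, and real workloads reach only a fraction of this theoretical peak.

```python
def peak_tflops(cores: int, clock_ghz: float, flops_per_core_per_cycle: int = 2) -> float:
    """Theoretical single-precision peak in TFLOPS.

    cores * clock_ghz * flops_per_core_per_cycle yields GFLOPS;
    dividing by 1000 converts to TFLOPS.
    """
    return cores * clock_ghz * flops_per_core_per_cycle / 1000.0


if __name__ == "__main__":
    # Assumed figures for one graphics module (hypothetical, for illustration).
    per_module = peak_tflops(cores=2048, clock_ghz=1.6)
    modules = 4  # up to four carrier boards per CPU, as stated in the text
    print(f"per module: {per_module:.2f} TFLOPS")
    print(f"aggregate:  {modules * per_module:.2f} TFLOPS")
```

Under these assumptions, each module contributes about 6.5 TFLOPS, so four of them land slightly above the 25 TFLOPS quoted in the text.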

CONCLUSION

Anyone who needs a lot of computing power in the smallest of spaces, in a harsh environment, or integrated into a mobile system or a control cabinet, and who also values support from a German manufacturer with many years of experience in the industrial field, will find the right contact at Embedded World 2019 in Nuremberg.
