
reVISION Stack: Accelerating your Embedded Vision development

Embedded Vision is ubiquitous across a range of industries and applications, from ADAS and guided robotics to medical imaging and augmented reality. The breadth of embedded vision penetration across market segments is staggering. In most of these applications the downstream image processing pipeline is very similar: it contains functions such as image sensor / camera interfacing and reconstruction of the image into a format suitable for further processing. Commonly used algorithms within this downstream processing are colour reconstruction (de-Bayering), colour space conversion and noise reduction. It is in the application-specific algorithms that the differences between applications become apparent, and implementing these is where the Embedded Vision developer expends significant time and effort.
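As a point of reference, these common downstream stages map directly onto standard OpenCV calls. The minimal sketch below (file names and filter parameters are illustrative) reconstructs colour from a raw Bayer frame, performs a colour space conversion and applies noise reduction:

    #include <opencv2/opencv.hpp>

    int main() {
        // Load a raw Bayer-pattern frame (file name is illustrative).
        cv::Mat raw = cv::imread("bayer_frame.png", cv::IMREAD_GRAYSCALE);
        if (raw.empty()) return 1;

        // Colour reconstruction: demosaic the Bayer pattern to BGR.
        cv::Mat bgr;
        cv::cvtColor(raw, bgr, cv::COLOR_BayerBG2BGR);

        // Colour space conversion, e.g. to YUV for later processing.
        cv::Mat yuv;
        cv::cvtColor(bgr, yuv, cv::COLOR_BGR2YUV);

        // Noise reduction with a Gaussian filter.
        cv::Mat denoised;
        cv::GaussianBlur(yuv, denoised, cv::Size(5, 5), 1.5);

        cv::imwrite("processed.png", denoised);
        return 0;
    }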

These application algorithms are often complex to implement, using techniques such as object detection and classification, filtering and computational operations. Increasingly they are developed using open source frameworks such as OpenCV and Caffe. These frameworks let the Embedded Vision developer focus on implementing the algorithm: using the pre-defined functions and IP they contain removes the need to start from scratch, which significantly reduces development time.


Figure 1 – reVISION Stack

The challenge faced by the designer is not only how to implement the desired algorithms. Depending upon the application, the Embedded Vision developer must also address the challenges posed by the application and its environment while considering future market trends.

These challenges and trends include processing and decision making at the edge, as Embedded Vision applications are increasingly autonomous and cannot depend upon a connection to the cloud. One example is a vision-guided robot, which must process and act on information gleaned from its sensors to navigate within its environment. Many applications also implement sensor fusion, combining several different sensor modalities to provide an enhanced understanding of the environment and further aid decision making, which brings increased processing demands. Because both sensors and image processing algorithms evolve rapidly, the system must also be upgradeable to support the latest requirements of the product roadmap. The rise of autonomous and remote applications also brings the challenges of efficient power dissipation and of security to prevent unauthorized modification.

To address these challenges, developers use Xilinx® All Programmable System on Chip (SoC) and Multi-Processor System on Chip (MPSoC) devices from the Zynq®-7000 and Zynq® UltraScale+™ MPSoC families to implement their solution. These devices provide high-performance processors closely coupled with programmable logic, allowing the Embedded Vision developer to optimize their solution.


Figure 2 – Accelerated OpenCV Harris Corner Detection

The use of Zynq SoC or Zynq UltraScale+ MPSoC devices enables the developer to benefit from the any-to-any connectivity that comes with programmable logic. The programmable logic can also implement the image processing pipeline(s), providing a performance increase due to its parallel nature. Using the programmable logic in this way increases the system's performance and connectivity, and improves its performance per watt of power dissipated, providing a more efficient solution overall.

The processing cores can then be used for higher-level application functionality, such as decision making based on the extracted information, and for communication between systems and with the cloud.

To address the security concerns that come with autonomous and remote applications, both device families provide a secure environment with support for encrypted secure boot and Arm® TrustZone® technology within the processor, along with the ability to implement anti-tamper functionality.

Zynq-7000 and Zynq UltraScale+ MPSoC devices therefore provide significant capability to Embedded Vision developers, allowing these challenges and trends to be addressed. Leveraging that capability requires a development ecosystem that lets the developer exploit the benefits of the devices while still using the common open source frameworks within their solution. This is where the reVISION™ Stack comes in.

reVISION Stack

The reVISION Stack was developed to enable Embedded Vision developers to address the four key challenges identified above, which can be summarized as responsiveness, reconfigurability, connectivity and software-defined development.

To address these four driving trends, the reVISION Stack combines a wide range of resources enabling platform, application and algorithm development. As such, the stack is aligned into three distinct levels:

  1. Platform layer. This is the lowest level of the stack, on which the remaining layers are built. It provides both a hardware definition of the configuration of the Zynq-7000 / Zynq UltraScale+ MPSoC and a software definition via a customized operating system that supports the hardware definition. The hardware definition can describe either a development board or a production-ready board such as a System on Module, and it is here that the sensor and system interfaces are defined. The hardware platform is captured using Vivado® HLx, and may leverage IP blocks from both Xilinx and third-party suppliers along with high-level synthesis to create specialist IP. This layer also provides software drivers for IP modules and, if required, an updated PetaLinux configuration to support the software-defined environment at the higher levels.
  2. The middle level of the stack is the algorithm layer. Development at this level takes place within the Eclipse-based SDSoC™ environment. SDSoC is a system-optimizing compiler which allows development using a software-defined flow. Crucially, as the software algorithms are developed, bottlenecks in performance can be identified and removed by accelerating functions into the programmable logic. To the user this process is seamless, using a combination of high-level synthesis and a connectivity framework to move a function from executing in software to an implementation in the programmable logic (see the sketch after this list). It is at this level that OpenCV is used to implement the image processing algorithms for the application at hand, and to reduce identified bottlenecks reVISION provides a wide range of acceleration-ready OpenCV functions. Support is also provided at this level for the most common neural networks, including AlexNet, GoogLeNet, SqueezeNet, SSD and FCN.
  3. The final layer is the application development level, where high-level frameworks such as Caffe and OpenVX are used to complete the application, implementing the decision-making functionality for example. Applications at this level are developed using an Eclipse-based environment targeting the processor cores within the Zynq-7000 / Zynq UltraScale+ MPSoC.
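To make the software-to-hardware move concrete, the sketch below shows the general shape of a function a developer might mark for acceleration in SDSoC. The data-mover and pipelining pragmas are standard SDSoC / Vivado HLS directives, but the function itself, its name and the frame dimensions are illustrative assumptions rather than code taken from the stack:

    #include <stdint.h>

    #define HEIGHT 1080
    #define WIDTH  1920

    // Candidate for hardware acceleration in SDSoC: a simple per-pixel
    // threshold. Declaring the arrays as sequentially accessed lets the
    // compiler infer streaming, DMA-friendly data movers.
    #pragma SDS data access_pattern(src:SEQUENTIAL, dst:SEQUENTIAL)
    void threshold_accel(const uint8_t src[HEIGHT * WIDTH],
                         uint8_t dst[HEIGHT * WIDTH],
                         uint8_t level) {
        for (int i = 0; i < HEIGHT * WIDTH; i++) {
            // Pipeline the loop so one pixel is processed per clock cycle.
            #pragma HLS PIPELINE II=1
            dst[i] = (src[i] > level) ? 255 : 0;
        }
    }

Selecting such a function for hardware in the SDSoC flow causes the tool to synthesize the loop into the programmable logic and to generate the data movers and drivers that connect it to the software running on the processor cores.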

The reVISION stack thus provides all the necessary elements to create high-performance imaging applications across a wide range of fields, from the industrial Internet of Things to vision-guided robotics and beyond.

Accelerating OpenCV

One of the most exciting aspects of the reVISION stack is the ability to accelerate a wide range of OpenCV functions within the algorithm development layer. Within this layer, the OpenCV functions capable of being accelerated can be grouped into one of four high-level categories:

  1. Computation – Includes functions such as absolute difference between two frames, pixel-wise operations (addition, subtraction and multiplication), and gradient and integral operations
  2. Input Processing – Provides support for bit depth conversions, channel operations, histogram equalisation, remapping and resizing
  3. Filtering – Provides support for a wide range of filters including Sobel, custom convolution and Gaussian filters
  4. Other – Provides a wide range of functions including Canny edge detection, FAST and Harris corner detection, thresholding, and SVM and HOG classifiers

Figure 3 – Traditional OpenCV Implementation
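The short program below exercises one representative function from each of the four categories using the standard software OpenCV API, i.e. the traditional implementation that Figure 3 depicts; the input file names and parameter values are illustrative:

    #include <opencv2/opencv.hpp>

    int main() {
        cv::Mat frame = cv::imread("input.png", cv::IMREAD_GRAYSCALE);
        cv::Mat prev  = cv::imread("previous.png", cv::IMREAD_GRAYSCALE);
        if (frame.empty() || prev.empty()) return 1;

        // Computation: absolute difference between two frames.
        cv::Mat diff;
        cv::absdiff(frame, prev, diff);

        // Input processing: histogram equalisation.
        cv::Mat equalised;
        cv::equalizeHist(diff, equalised);

        // Filtering: Sobel gradient in x.
        cv::Mat grad;
        cv::Sobel(equalised, grad, CV_8U, 1, 0);

        // Other: binary threshold.
        cv::Mat mask;
        cv::threshold(grad, mask, 50, 255, cv::THRESH_BINARY);

        cv::imwrite("mask.png", mask);
        return 0;
    }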

Developers can use these functions to create an algorithmic pipeline within the programmable logic of the chosen device, which significantly increases the performance of the algorithm implementation.

Of course, because these acceleration-capable OpenCV libraries are software defined and support high-level synthesis, they can also be used within the Vivado HLS tool. This enables the creation of IP modules for use within the platform layer when the hardware definition is established.

One commonly used OpenCV algorithm is Harris corner detection, used to detect corners within an image, and the reVISION Stack provides a predefined function for it. Comparing the reVISION accelerated Harris corner detection against a standard OpenCV implementation, as demonstrated in Figures 2 and 3, both produce identical results. However, with the Harris corner function accelerated into the programmable logic, the user gains an increase in overall system performance, enabling a more responsive and power-efficient solution.
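For reference, a standard software implementation of Harris corner detection is essentially a single OpenCV call, as in the minimal sketch below (image name and threshold values are illustrative); the reVISION accelerated function is designed to produce the same corner results while executing in the programmable logic:

    #include <opencv2/opencv.hpp>

    int main() {
        cv::Mat gray = cv::imread("scene.png", cv::IMREAD_GRAYSCALE);
        if (gray.empty()) return 1;

        // Harris corner response: 2x2 neighbourhood, 3x3 Sobel aperture
        // and k = 0.04 are typical parameter choices.
        cv::Mat response;
        cv::cornerHarris(gray, response, 2, 3, 0.04);

        // Normalise and threshold the response to mark strong corners.
        cv::Mat norm;
        cv::normalize(response, norm, 0, 255, cv::NORM_MINMAX, CV_8U);
        cv::Mat corners;
        cv::threshold(norm, corners, 150, 255, cv::THRESH_BINARY);

        cv::imwrite("corners.png", corners);
        return 0;
    }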

If developers choose to accelerate OpenCV functions within the reVISION stack, they can optimize the design for resource usage and performance within the programmable logic. The main lever is the number of pixels processed on each clock cycle: for most accelerated functions, either one pixel or eight pixels per clock can be selected. Processing more pixels per clock cycle reduces the processing time at the cost of increased resource utilization, while processing one pixel per clock reduces the resource requirement at the cost of increased latency. The number of pixels per clock is configured via the function call, providing a very simple method to optimize the design as required.
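The listing below illustrates how this selection appears in code. It is modeled on the xfOpenCV-style template interface used by the reVISION libraries, where a template parameter selects one (XF_NPPC1) or eight (XF_NPPC8) pixels per clock; treat the exact names and signature as assumptions to be checked against the library headers of the release in use:

    // Modeled on the xfOpenCV-style interface; names and signatures are
    // assumptions - check the release headers before use.
    #include "common/xf_common.h"
    #include "imgproc/xf_gaussian_filter.hpp"

    #define HEIGHT 1080
    #define WIDTH  1920

    void gaussian_accel(xf::Mat<XF_8UC1, HEIGHT, WIDTH, XF_NPPC8> &src,
                        xf::Mat<XF_8UC1, HEIGHT, WIDTH, XF_NPPC8> &dst,
                        float sigma) {
        // The final template parameter selects the throughput:
        //   XF_NPPC1 - one pixel per clock (fewer resources, higher latency)
        //   XF_NPPC8 - eight pixels per clock (more resources, lower latency)
        xf::GaussianBlur<XF_FILTER_3X3, XF_BORDER_CONSTANT,
                         XF_8UC1, HEIGHT, WIDTH, XF_NPPC8>(src, dst, sigma);
    }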

With the design's performance optimized using the acceleration-capable OpenCV libraries, the embedded vision developer can then build the higher levels of the application using the capabilities provided by the algorithm and application layers of the stack.

Conclusion

The use of All Programmable Zynq-7000 and Zynq UltraScale+ MPSoC devices within embedded vision applications brings several advantages in flexibility, performance, security/safety and power-efficient processing. Developing the application within the reVISION stack allows several commonly used, industry-standard frameworks to be utilized, reducing development time and speeding time to market.
