TECHNOLOGY IN CONTEXT
Optimizing Machine Vision Systems
FPGAs – Taking Vision to the Next Level
As today’s machine vision applications become ever more demanding, the unique capabilities of FPGAs, such as parallelism and low power consumption, can greatly enhance performance. But their advantages depend on a good understanding of the use case. Often, in fact, they can be used in tandem with CPUs for the best overall advantage.
CARLTON HEARD, NATIONAL INSTRUMENTS
Today, manufacturing companies are striving to lower costs and increase quality and throughput, robots are becoming smarter and more flexible, and automation is a hot topic with a large amount of resources backing it. Vision is one of the key enabling technologies behind these trends, and it has been growing rapidly over the past couple of decades. But the performance of image processing applications has been largely tied to advances in CPU speed. Vision has been riding the CPU frequency wave to run more complex algorithms at higher camera frame rates and resolutions, but lately the nearly exponential growth in CPU performance has been tapering off compared to the explosive growth of the past decade.
Vision applications must therefore rely on alternatives to increase speed rather than simply depending on a faster processor. One option is to divide the image processing algorithm and do more in parallel, since many of the algorithms used in vision applications are well suited to this. Technologies like SSE, hyperthreading and multiple cores can be used to parallelize and do more without increasing the raw clock rate. However, this option has its own issues. Unless the software package being used abstracts the complexity, programming software to use multiple threads or cores is difficult. Data must be sent between threads, which can result in memory copies and synchronization jitter. Additionally, taking an existing single-threaded image processing algorithm and making it multicore compatible is generally a manual process. Even then, cost often limits how far an application can be parallelized, because most system designers do not have the option to purchase a 16-core server-class computer for each test cell they create.
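As a rough illustration of what this manual parallelization involves, the following Python sketch splits an image into horizontal strips, thresholds each strip on its own worker and stitches the results back together. The function names and the strip-based split are assumptions made for this example and do not come from any particular vision package. Threads are used only to keep the sketch short; in standard Python they will not speed up pure-Python pixel loops, so a real speedup would need a process pool or a native library.

```python
from concurrent.futures import ThreadPoolExecutor

def threshold_strip(strip, level):
    """Binarize one horizontal strip of the image (a list of pixel rows)."""
    return [[255 if px >= level else 0 for px in row] for row in strip]

def split_rows(image, parts):
    """Cut the image into up to `parts` contiguous strips of rows."""
    size = max(1, -(-len(image) // parts))  # ceiling division, at least 1 row
    return [image[i:i + size] for i in range(0, len(image), size)]

def parallel_threshold(image, level, workers=4):
    """Threshold each strip on its own worker, then stitch the strips together."""
    strips = split_rows(image, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each strip is handed to a worker; map() returns results in order.
        results = pool.map(threshold_strip, strips, [level] * len(strips))
        return [row for strip in results for row in strip]

# Hypothetical 8 x 8 test image: the parallel result must match the serial one.
image = [[(x * y) % 256 for x in range(8)] for y in range(8)]
assert parallel_threshold(image, 128) == threshold_strip(image, 128)
```

Even in this small case, the splitting and stitching steps are the memory copies and synchronization points described above, and an algorithm whose output pixels depend on neighboring strips would need additional overlap handling.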
An FPGA offers one solution to this issue. An FPGA is fundamentally a semiconductor device that contains a large quantity of logic gates that are not initially interconnected; their function is determined by a wiring list that is downloaded to the FPGA. The wiring list determines how the gates are interconnected, and this interconnection is performed dynamically by turning semiconductor switches on or off to enable different connections. The benefit of using an FPGA is that it is essentially software-defined hardware. System designers can program the chip in software, and once that software is downloaded to the FPGA, the code becomes actual hardware that can be reprogrammed as needed. Using an FPGA for image processing is especially beneficial because it is inherently parallel. Algorithms can be split into thousands of pieces that run simultaneously and remain completely independent of one another. While FPGAs are inherently well suited for many vision applications, certain aspects of a system may be less suited to run on the FPGA. There are a number of factors to consider when evaluating whether to use an FPGA for image processing.
Considerations for Using an FPGA
FPGAs have incredibly low latency (on the order of microseconds) when they are already in the image path. This is critical because latency accounts for the time it takes until a decision is made based on the image data. When using FPGAs with high-speed camera buses such as Camera Link that do not buffer image data, the FPGA can begin processing the image as soon as the first pixel is sent from the camera rather than waiting until the entire image readout has completed. This reduces the time between exposure and image processing by nearly an entire frame period, making it possible to achieve extremely tight control loops for applications like laser tracking and in-flight defect rejection systems.
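To put a number on the "nearly an entire frame period" saved by processing pixels as they arrive, here is a small back-of-the-envelope calculation in Python. The sensor size and pixel clock are hypothetical values chosen for illustration, and blanking intervals and multi-tap readout are ignored.

```python
def readout_time_s(rows, cols, pixel_clock_hz):
    """Time for the camera to send one full frame, ignoring blanking intervals."""
    return rows * cols / pixel_clock_hz

def processing_start_delay_s(rows, cols, pixel_clock_hz, streaming):
    """Delay from the start of readout until processing can begin."""
    if streaming:
        # In-line processing: work starts when the first pixel arrives.
        return 1.0 / pixel_clock_hz
    # Buffered processing: work starts only after the last pixel arrives.
    return readout_time_s(rows, cols, pixel_clock_hz)

# Hypothetical camera: 1024 x 1024 pixels, one pixel per clock at 80 MHz.
ROWS, COLS, PIXEL_CLOCK_HZ = 1024, 1024, 80e6

buffered = processing_start_delay_s(ROWS, COLS, PIXEL_CLOCK_HZ, streaming=False)
streamed = processing_start_delay_s(ROWS, COLS, PIXEL_CLOCK_HZ, streaming=True)

print(f"buffered start delay:  {buffered * 1e3:.2f} ms")
print(f"streaming start delay: {streamed * 1e9:.1f} ns")
print(f"head start gained:     {(buffered - streamed) * 1e3:.2f} ms")
```

Under these assumptions the buffered path waits roughly 13 ms before it can touch the image, while the in-line path starts within a single pixel clock.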
FPGAs can help avoid jitter. Because they do not have the overhead of other threads, an operating system or interrupts, FPGAs are extremely deterministic. For many image processing algorithms, it is possible to determine the exact execution time down to nanoseconds.
For massively parallel computation or heavily pipelined math, the raw computation power of an FPGA can be an advantage over a CPU-based system. An important consideration, however, is to understand what image processing algorithms are needed for the application. If the algorithm is iterative and cannot take advantage of the parallel nature of an FPGA, it is most likely best suited for a CPU-based system.
If a loop has multiple operations running within it and those operations run sequentially, the time it takes for the loop iteration to complete is the sum of the time each operation takes to run (Figure 1). One way to increase the processing loop rate is to parallelize the operations through pipelining. By doing this, the processing loop rate is limited only by the slowest operation rather than the sum of them all (Figure 2). This approach increases throughput at the cost of added latency, because a result is not valid until it has passed through every stage, which takes multiple loop iterations. For pixel-by-pixel operations, including kernel operations, dilate, erode or edge finding, algorithms can be stacked back-to-back while adding only marginal latency.
Figure 1: When operations are programmed sequentially, the loop rate is limited by the sum of all times for each operation.
Figure 2: Pipelining speeds up loop rates as each operation can run in parallel. In this case, the loop rate is only limited by the operation that takes the longest.
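The difference between the sequential and pipelined loops can be worked through with a few lines of Python. The three stage times are hypothetical, and the latency figure assumes a simple pipeline in which each stage hands its result to the next once per loop iteration.

```python
def sequential_timing(stage_times_us):
    """Return (loop period, latency) when the stages run one after another."""
    period = sum(stage_times_us)
    return period, period  # the result is ready at the end of the same iteration

def pipelined_timing(stage_times_us):
    """Return (loop period, latency) when every stage runs in parallel."""
    period = max(stage_times_us)             # the slowest stage sets the loop rate
    latency = period * len(stage_times_us)   # one iteration per stage to pass through
    return period, latency

# Hypothetical stage times in microseconds, e.g. filter, threshold, measure.
stages_us = [4, 10, 6]

seq_period, seq_latency = sequential_timing(stages_us)
pipe_period, pipe_latency = pipelined_timing(stages_us)

print(f"sequential: period {seq_period} us, latency {seq_latency} us")
print(f"pipelined:  period {pipe_period} us, latency {pipe_latency} us")
```

With these numbers the loop period drops from 20 µs to 10 µs, doubling the loop rate, while the time for any one result to emerge grows from 20 µs to 30 µs.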
Security can also be a consideration. Because the image processing occurs in hardware on the FPGA, the image and code stay within the FPGA. This is beneficial if applications require the image or IP to remain secure and hidden from the user.
And don’t forget the factors of power and heat. An FPGA may consume 1-10 watts of power, while a CPU of the same performance can easily consume 50-200 watts. With that much power, there is also a lot of heat that must be dissipated. For fanless embedded applications this may result in a more complex and larger mechanical design. The lower power consumption of an FPGA is particularly useful for extreme conditions such as space, airborne and underwater applications.
Considerations for Using a CPU
As with most applications, there are tradeoffs to consider along with potential benefits. While FPGAs offer many advantageous features, there are still instances where a CPU may be more beneficial. Consider the following tradeoffs when determining whether an FPGA, a CPU, or a combination is most appropriate for a particular vision application.
Using an FPGA can often add complexity to the design process. Hardware programming is a significant departure from traditional software programming, and the learning curve is non-trivial. However, high-level synthesis tools such as LabVIEW FPGA are available to abstract much of this complexity, enabling the designer to take advantage of FPGA technology without a deep knowledge of VHDL programming.
There are also great differences in clock rates between FPGAs and CPUs. FPGA clock rates are on the order of 100 MHz to 200 MHz, significantly lower than those of a CPU, which can easily run at 3.0+ GHz. Therefore, if an application requires an image processing algorithm that must run iteratively and cannot take advantage of the parallelism of an FPGA, a CPU delivers faster processing. This serves as another reminder to evaluate the system requirements and algorithms before choosing between an FPGA and a CPU.
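A deliberately simplified model shows how the clock-rate gap plays out. The cycle counts below are hypothetical, and the model ignores memory bandwidth, SIMD instructions and multiple cores. It only contrasts an algorithm that an FPGA can pipeline to one pixel per clock with one that must execute its steps serially on either device.

```python
def frame_time_s(pixels, clock_hz, cycles_per_pixel):
    """Time to process one frame at a fixed cost in clock cycles per pixel."""
    return pixels * cycles_per_pixel / clock_hz

PIXELS = 1_000_000        # hypothetical 1-megapixel image
FPGA_CLOCK_HZ = 150e6     # within the 100 MHz to 200 MHz range cited above
CPU_CLOCK_HZ = 3.0e9
SERIAL_CYCLES = 40        # hypothetical cost per pixel when steps run one by one

# Pipelined on the FPGA: all steps overlap, so one pixel completes per clock.
fpga_pipelined = frame_time_s(PIXELS, FPGA_CLOCK_HZ, 1)
# Serial on a single CPU core: every pixel pays the full cycle cost.
cpu_serial = frame_time_s(PIXELS, CPU_CLOCK_HZ, SERIAL_CYCLES)
# Serial on the FPGA: an iterative algorithm that cannot be pipelined.
fpga_serial = frame_time_s(PIXELS, FPGA_CLOCK_HZ, SERIAL_CYCLES)

for name, t in [("FPGA, pipelined", fpga_pipelined),
                ("CPU, serial", cpu_serial),
                ("FPGA, serial", fpga_serial)]:
    print(f"{name:16s} {t * 1e3:8.2f} ms per frame")
```

Under these assumptions the pipelined FPGA finishes a frame ahead of the CPU despite its much slower clock, but the same FPGA falls far behind once the algorithm can no longer be pipelined.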
Is there a big need for floating-point support? Floating-point arithmetic is difficult to implement on an FPGA. This is somewhat mitigated by using fixed-point arithmetic or high-level synthesis tools, but it is a factor that must be kept in mind when using FPGAs that may not even need to be considered when working with a CPU.
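For readers unfamiliar with fixed point, the idea is to store each value as an integer scaled by a power of two, so that multiplication needs only integer hardware and a bit shift. The Python sketch below uses 8 fractional bits; that format and the helper names are assumptions for illustration, not a description of how any particular FPGA tool represents numbers.

```python
FRAC_BITS = 8              # assumed format: 8 bits after the binary point
SCALE = 1 << FRAC_BITS     # 256

def to_fixed(x):
    """Convert a real number to its scaled-integer representation."""
    return int(round(x * SCALE))

def fixed_mul(a, b):
    """Multiply two fixed-point values; the shift removes the doubled scale."""
    return (a * b) >> FRAC_BITS

def to_float(a):
    """Convert a fixed-point value back to a real number for inspection."""
    return a / SCALE

# Apply a gain of 1.5 to a pixel value of 200 using integer arithmetic only.
gain = to_fixed(1.5)       # stored as 384
pixel = to_fixed(200)      # stored as 51200
print(to_float(fixed_mul(gain, pixel)))   # 300.0

# Values that are not multiples of 1/256 pick up a small rounding error.
print(to_float(to_fixed(0.1)))            # 0.1015625
```

The tradeoff is that the designer has to choose the word length and the position of the binary point for each signal, balancing range against precision, which a floating-point unit on a CPU handles automatically.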
In many applications, the combination of an FPGA and a CPU to handle various aspects of the design can be very useful. DMA can help pass data back and forth between the devices and each device can be used to take care of the processing that is most appropriate for each chip. This is not to say that an FPGA or a CPU is incapable of performing all tasks, but some are better suited for one chip versus the other and using both can simplify the design while making it possible to gain high performance. Many applications can benefit from this architecture.
Matching the Needs of Application Categories
Vision applications that can benefit from an FPGA fall into four main categories: visualization, high-speed control, image preprocessing and co-processing. Visualization takes an image from a camera and enhances it for display to human eyes. In this case, the FPGA reads the image from the camera and performs some type of in-line processing, such as highlighting edges and features of interest or masking features. Then the FPGA outputs the image directly to a monitor or sends it to the host CPU for display. In most instances, the FPGA outputs the image directly because low latency and low jitter are important in the system. For example, a medical device may acquire an image of cells, process it and display it on a monitor for a doctor to review. The FPGA can be used to measure the size and color of each cell and highlight specific cells for the doctor to focus on.
In high-speed control applications, the output is not an image for display but some other type of I/O, such as a digital signal controlling an actuator. In these applications, the time between when an image is acquired and an action is taken must be fast and consistent, so an FPGA is preferred due to the low latency and low jitter it offers. This very tight integration of vision and I/O enables advanced applications like visual servoing, in which visual data is used as direct feedback for positioning and control with servo motors. Often all the inspection and decision-making can be accomplished on the FPGA with little or no CPU intervention, but a CPU can still be used for supervisory control or operator interaction. Applications best suited for high-speed control include high-speed alignment, where one object needs to stay within a given position relative to another, as in laser alignment, and high-speed sorting (Figure 3).
Figure 3: FPGAs can be used for advanced control applications such as high-speed laser tracking. Low latency and jitter are requirements for adaptive optics that are possible with FPGA image processing.
From food products and rocks to manufactured goods and recycled garbage, sorting items quickly and efficiently by color, shape, size, texture and other properties is a major bottleneck. The ability to acquire an image, process it and output a result within the FPGA can speed up this process and make sorting more accurate, so that fewer good parts are rejected and fewer bad parts are accepted. A more specific example where FPGAs can be especially beneficial is air sorting, which involves imaging, inspecting and sorting a product while it is falling. Low jitter is critical for this type of application because the time between the decision-making and I/O must be known.
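To see why timing uncertainty matters for a falling product, consider how far the item moves during the jitter window. The Python sketch below uses simple free-fall physics with no air resistance; the drop height and jitter values are hypothetical and are not taken from any real sorting system.

```python
import math

G_M_S2 = 9.81  # gravitational acceleration

def fall_speed_m_s(drop_m):
    """Speed of an item after falling `drop_m` meters from rest, ignoring drag."""
    return math.sqrt(2 * G_M_S2 * drop_m)

def position_uncertainty_mm(drop_m, jitter_s):
    """Distance the item travels during the timing uncertainty, in millimeters."""
    return fall_speed_m_s(drop_m) * jitter_s * 1000

DROP_M = 0.5  # hypothetical distance fallen when the ejector fires

# Compare a millisecond-scale jitter with a microsecond-scale jitter.
for label, jitter_s in [("1 ms jitter", 1e-3), ("1 us jitter", 1e-6)]:
    print(f"{label}: +/- {position_uncertainty_mm(DROP_M, jitter_s):.4f} mm")
```

At this drop height the item is moving at about 3.1 m/s, so a millisecond of jitter shifts the point of action by roughly 3 mm, while a microsecond of jitter shifts it by only a few micrometers.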
Image preprocessing and co-processing are nearly the same, the difference being which device initially acquires the image. In both situations the FPGA works in conjunction with a CPU to process images. In preprocessing, the image data travels through the FPGA, which modifies or enhances the data before sending it to the host for further processing and analysis. In co-processing, the image data is sent to the FPGA from the CPU instead of from a camera. This scenario is most common for post-processing large batches of images once they are acquired.
One of the most exciting examples is using FPGAs to boost the speed and efficiency of Optical Coherence Tomography (OCT). This is a technique for obtaining sub-surface images of translucent or opaque materials at a resolution equivalent to a low-power microscope. It is effectively an “optical ultrasound” that images reflections from within tissue to provide cross-sectional images. OCT is attracting interest among the medical community, as it provides tissue morphology imagery at a much higher resolution (better than 10 µm) than other imaging modalities such as ultrasound or MRI (Figure 4).
Figure 4: Kitasato University used FPGAs to create the world’s first real-time 3D OCT medical imaging system.
A typical OCT system uses a line-scan camera and a special light source that sweeps across a tissue and images the surface beneath, one line at a time. Once each line is acquired, the data is scaled and converted to the frequency domain, where the data is further manipulated and combined with other lines to reveal a high-resolution, 3D picture of a tissue.
In industrial inspection, many applications today use brute-force methods to check for defects over large and continuous areas, as seen in web inspection. FPGAs can be used to preprocess the large amounts of data associated with web inspection by performing flat field correction, thresholding and particle analysis.
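The first two of those preprocessing steps can be sketched for a single line of a line-scan image. The Python example below uses the common flat field formula, which subtracts a dark reference and divides by the per-pixel response measured from a uniformly lit reference. The pixel values, the threshold level and the function names are hypothetical, and particle analysis is omitted.

```python
def flat_field_correct(raw, dark, flat):
    """Remove per-pixel offset and gain variation from one image line."""
    response = [f - d for f, d in zip(flat, dark)]   # per-pixel sensitivity
    mean_response = sum(response) / len(response)
    return [(r - d) * mean_response / g if g else 0.0
            for r, d, g in zip(raw, dark, response)]

def threshold_dark(line, level):
    """Mark pixels darker than `level` as defect candidates (1) or background (0)."""
    return [1 if value < level else 0 for value in line]

# Hypothetical references: a constant dark offset and uneven pixel sensitivity.
dark = [10, 10, 10, 10]
flat = [110, 210, 110, 210]

# A uniform material looks striped in the raw data because of the uneven response.
raw_clean = [60, 110, 60, 110]
print(flat_field_correct(raw_clean, dark, flat))    # [75.0, 75.0, 75.0, 75.0]

# The same material with a dark defect at the third pixel.
raw_defect = [60, 110, 20, 110]
corrected = flat_field_correct(raw_defect, dark, flat)
print(threshold_dark(corrected, 40))                # [0, 0, 1, 0]
```

Each output pixel depends only on the matching input pixel and fixed reference values, which is the kind of per-pixel work that can run on an FPGA as the data streams through, before the host CPU sees it.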
The advantages of an FPGA for image processing are dependent upon each use case, including the specific algorithms used, latency or jitter requirements, I/O synchronization, power and programming complexity. In many cases, using an architecture featuring both an FPGA and a CPU presents the best of both worlds and offers a competitive advantage in terms of performance, cost and reliability. With a multitude of inherent benefits, FPGAs are poised to take many vision applications including medical imaging and vision motion integration to the next level.