Performance, performance, performance!

As data volumes keep growing, a clear trend is emerging: more and more applications need more processing capacity.

An interesting development has become visible in recent years. While the geometry of silicon structures keeps pace with Moore’s law, processing speed does not! Instead of raising clock rates, solutions are sought in more concurrency, using multi-core implementations as well as heterogeneous processing platforms such as Systems-on-Chip (SoCs). These can be, for instance, combinations of CPUs with FPGA devices or GPUs. Simply increasing the clock frequency of a processor to gain performance leads to a disproportionate increase in power consumption. The scalability of general-purpose processor architectures is also problematic: two cores deliver less than twice the performance of a single core. What you need is selective acceleration with performance gains of orders of magnitude. This requires different acceleration architectures that scale better and reduce power consumption. At the same time, you want to maintain programming portability and compatibility between processors and accelerators.
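The less-than-double scaling of two cores can be illustrated with Amdahl's law, a standard model that is brought in here for illustration and is not named in the text above. The sketch below assumes a hypothetical workload in which 90% of the run time can be parallelized; the function name and the 90% figure are illustrative assumptions, not measurements.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Ideal speedup under Amdahl's law for a given number of workers."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    # The serial part is never sped up; only the parallel part is divided.
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    # Assumed workload: 90% parallelizable (illustrative, not measured).
    for cores in (1, 2, 4, 8):
        print(f"{cores} cores: {amdahl_speedup(0.9, cores):.2f}x")
```

Under this assumption two cores give roughly 1.82x rather than 2x, and eight cores stay below 5x, which is why adding general-purpose cores alone does not deliver gains of orders of magnitude.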

Let us zoom in on a typical configuration in which FPGAs are combined with a processor, such as an Intel i7 in a PC or an Arm Cortex-A9 in a Xilinx Zynq device. The architecture of an FPGA provides large-scale concurrent data processing resources thanks to its many small logic, memory and DSP blocks connected via programmable interconnect. As an example: the floating-point performance of a typical quad-core i7 PC running at 3 GHz is comparable to that of a Zynq 7030, yet the Zynq 7030 consumes 10x less power. To be effective as an acceleration platform for processors, however, the FPGA needs a programming model similar to that of the processor. This dilemma is solved with the introduction of Dyplo.

Years of study, research and technology investigation resulted in the creation of Dyplo: seamless software and FPGA integration that enables fully threaded software tasks to run dynamically on the FPGA and to be changed at run time.

How does it work? In a nutshell:

Topic's Dyplo is a DYnamic Process LOader that supports dynamic task distribution and control over a heterogeneous embedded platform. Dyplo connects individual processes via a programmable transport medium and data queues. The transport can be, for example, on-chip AXI4 bus interfaces or off-chip connections such as PCI Express or Ethernet.

Dyplo enables processes and tasks to be moved on the fly between different processing units via these interfaces, and handles all the queueing and process synchronization. This way the programmer can decide at compile time and at run time which tasks or processes are executed where: on the processor or on the FPGA fabric. On the FPGA, many execution workspaces can be created using the unique integrated support for partial reconfiguration. This enables software-like threading on an FPGA. These workspaces behave as small FPGA devices and can be reconfigured individually at any time with a very short context switch time. In short, Dyplo provides a distributed heterogeneous processing framework with process distribution and task synchronization.
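The idea of queue-connected processes whose placement can change at run time can be sketched in a few lines. This is not Dyplo's API: it is a minimal illustration in plain Python, where threads and `queue.Queue` stand in for the transport, and the names `BACKENDS`, `stage`, `run_pipeline` and the "fpga" backend (ordinary Python code here) are all hypothetical.

```python
import queue
import threading
from typing import Callable, Dict, List

STOP = object()  # sentinel that tells a stage to shut down


def scale_on_cpu(x: int) -> int:
    """Software implementation of the task."""
    return x * 2


def scale_on_accelerator(x: int) -> int:
    """Stand-in for an accelerated implementation; ordinary Python here."""
    return x << 1


# Hypothetical registry: two placements of the same task.
BACKENDS: Dict[str, Callable[[int], int]] = {
    "cpu": scale_on_cpu,
    "fpga": scale_on_accelerator,
}


def stage(inbox: queue.Queue, outbox: queue.Queue, placement: Dict[str, str]) -> None:
    """Consume items from inbox and forward the results to outbox."""
    while True:
        item = inbox.get()
        if item is STOP:
            outbox.put(STOP)
            return
        # The placement is looked up per item, so it can change while running.
        outbox.put(BACKENDS[placement["scale"]](item))


def run_pipeline(values: List[int], switch_after: int) -> List[int]:
    """Feed values through one stage, changing its placement part-way."""
    inbox: queue.Queue = queue.Queue()
    outbox: queue.Queue = queue.Queue()
    placement = {"scale": "cpu"}
    worker = threading.Thread(target=stage, args=(inbox, outbox, placement))
    worker.start()
    for index, value in enumerate(values):
        if index == switch_after:
            placement["scale"] = "fpga"  # re-place the task at run time
        inbox.put(value)
    inbox.put(STOP)
    results = []
    while True:
        item = outbox.get()
        if item is STOP:
            break
        results.append(item)
    worker.join()
    return results


if __name__ == "__main__":
    print(run_pipeline([1, 2, 3, 4], switch_after=2))
```

The producer and consumer only ever see the queues, so the results are the same regardless of which backend handled a given item. That separation is what allows a task to be relocated without touching the code around it.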

The Dyplo Development Environment (DDE) guarantees ease of use through a step-by-step flow that results in a complete FPGA image.

The result: a proven reduction in BOM cost and development time, together with major acceleration at limited power usage. For more information, visit our website.

