Reconfigurable computing

Reconfigurable computing is a computer architecture combining some of the flexibility of software with the high performance of hardware by processing with very flexible, high-speed computing fabrics such as field-programmable gate arrays (FPGAs). The principal difference when compared to using ordinary microprocessors is the ability to make substantial changes to the datapath itself, in addition to the control flow. On the other hand, the main difference from custom hardware, i.e. application-specific integrated circuits (ASICs), is the possibility of adapting the hardware during runtime by "loading" a new circuit onto the reconfigurable fabric.

The concept of reconfigurable computing has existed since the 1960s, when Gerald Estrin's paper proposed the concept of a computer made of a standard processor and an array of "reconfigurable" hardware. The main processor would control the behavior of the reconfigurable hardware. The latter would then be tailored to perform a specific task, such as image processing or pattern matching, as quickly as a dedicated piece of hardware. Once the task was done, the hardware could be adjusted to do some other task. This resulted in a hybrid computer structure combining the flexibility of software with the speed of hardware.

The growth of logic resources in FPGAs has enabled larger and more complex algorithms to be programmed into the FPGA. The attachment of such an FPGA to a modern CPU over a high-speed bus, such as PCI Express, has enabled the configurable logic to act more like a coprocessor than a peripheral. This has brought reconfigurable computing into the high-performance computing sphere.
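To illustrate the coprocessor model, the following sketch uses the OpenCL API through the pyopencl bindings to dispatch work to an FPGA accelerator card. The platform name, the bitstream file vector_add.xclbin, and the kernel name vector_add are assumptions made for illustration; vendor flows such as Xilinx XRT or the Intel FPGA SDK for OpenCL differ in how binaries are produced and loaded.

    # Minimal sketch: an FPGA accelerator card used as an OpenCL coprocessor.
    import numpy as np
    import pyopencl as cl

    # Pick an accelerator (FPGA) device rather than a CPU or GPU.
    platform = next(p for p in cl.get_platforms() if "FPGA" in p.name)  # assumed name
    device = platform.get_devices(device_type=cl.device_type.ACCELERATOR)[0]
    ctx = cl.Context([device])
    queue = cl.CommandQueue(ctx)

    # FPGA kernels are precompiled to a bitstream: load a binary, do not compile source.
    with open("vector_add.xclbin", "rb") as f:  # hypothetical bitstream file
        program = cl.Program(ctx, [device], [f.read()]).build()

    a = np.random.rand(1024).astype(np.float32)
    b = np.random.rand(1024).astype(np.float32)
    out = np.empty_like(a)

    mf = cl.mem_flags
    a_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=a)
    b_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=b)
    out_buf = cl.Buffer(ctx, mf.WRITE_ONLY, out.nbytes)

    # The host CPU enqueues work over the bus and reads the result back,
    # treating the configured logic as a coprocessor.
    program.vector_add(queue, a.shape, None, a_buf, b_buf, out_buf)
    cl.enqueue_copy(queue, out, out_buf)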

This heterogeneous systems technique is used in computing research and especially in supercomputing. A 2008 paper reported speed-up factors of more than four orders of magnitude and energy savings of up to almost four orders of magnitude. Some supercomputer firms offer heterogeneous processing blocks including FPGAs as accelerators. One area of research is the productivity of twin-paradigm programming tool flows for such heterogeneous systems.

Electronic hardware, like software, can be designed modularly, by creating subcomponents and then higher-level components to instantiate them. In many cases it is useful to be able to swap out one or several of these subcomponents while the FPGA is still operating.

A common example for when partial reconfiguration would be useful is the case of a communication device. If the device is controlling multiple connections, some of which require encryption, it would be useful to be able to load different encryption cores without bringing the whole controller down.
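On Linux, one common way to perform such a swap is the kernel's FPGA Manager framework, which exposes the device under /sys/class/fpga_manager. The following is a minimal sketch, assuming partial bitstreams for each encryption core have already been installed under /lib/firmware; the manager index, flag value, and bitstream names are hypothetical and vary by board and kernel.

    import pathlib

    # Minimal sketch of partial reconfiguration via the Linux FPGA Manager
    # sysfs interface. The manager index (fpga0) and bitstream names are
    # assumptions; they depend on the board and kernel configuration.
    FPGA_MGR = pathlib.Path("/sys/class/fpga_manager/fpga0")

    def load_partial_bitstream(firmware_name: str) -> None:
        """Swap in a new core while the rest of the design keeps running."""
        # On kernels that expose a writable flags attribute (e.g. Xilinx-
        # derived trees), bit 0 requests partial rather than full
        # reconfiguration.
        (FPGA_MGR / "flags").write_text("1")
        # The kernel looks the file up under /lib/firmware and streams it
        # to the configuration port of the running device.
        (FPGA_MGR / "firmware").write_text(firmware_name)

    # Hypothetical partial bitstreams, one per encryption core:
    load_partial_bitstream("aes_core_partial.bin")
    # ... later, without taking the other connections down:
    load_partial_bitstream("des_core_partial.bin")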

With the advent of affordable FPGA boards, students' and hobbyists' projects seek to recreate vintage computers or implement more novel architectures. Such projects are built with reconfigurable hardware (FPGAs), and some devices, such as the C-One, support emulation of multiple vintage computers using a single piece of reconfigurable hardware.

Mitrionics developed an SDK that enables software written using a single-assignment language to be compiled and executed on FPGA-based computers. The Mitrion-C software language and Mitrion processor enable software developers to write and execute applications on FPGA-based computers in the same manner as with other computing technologies, such as graphics processing units (GPUs), cell-based processors, parallel processing units (PPUs), multi-core CPUs, and traditional single-core CPU clusters. Mitrionics has since gone out of business.
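Mitrion-C syntax is not reproduced here, but the single-assignment idea can be sketched in Python: because every name is bound exactly once, the data dependencies form an explicit graph that a compiler can map onto parallel hardware.

    # Illustrative only: single-assignment style, not actual Mitrion-C.
    def pipeline(x: float, y: float) -> float:
        t1 = x * x      # independent of t2...
        t2 = y * y      # ...so hardware could compute both in parallel
        t3 = t1 + t2    # depends on t1 and t2, scheduled after them
        return t3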

Xilinx has developed two styles of partial reconfiguration of FPGA devices: module-based and difference-based. Module-based partial reconfiguration makes it possible to reconfigure distinct modular parts of the design, while difference-based partial reconfiguration can be used when a small change is made to a design.

The granularity of the reconfigurable logic is defined as the size of the smallest functional unit (configurable logic block, CLB) that is addressed by the mapping tools. High granularity, also known as fine-grained, often implies greater flexibility when implementing algorithms into the hardware. However, there is a penalty associated with this in terms of increased power, area and delay, due to the greater quantity of routing required per computation. Fine-grained architectures work at the level of bit-level manipulation, while coarse-grained processing elements (reconfigurable datapath units, rDPUs) are better optimised for standard datapath applications. One of the drawbacks of coarse-grained architectures is that they tend to lose some of their utilisation and performance if they need to perform computations smaller than their granularity provides: for example, a one-bit add on a four-bit-wide functional unit would waste three bits. This problem can be solved by having a coarse-grained array (reconfigurable datapath array, rDPA) and an FPGA on the same chip.
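The utilisation loss in that example can be quantified with a simple ratio; the following sketch, with illustrative numbers only, shows the one-bit add on a four-bit rDPU leaving three quarters of the unit idle.

    # Illustrative utilisation model for a coarse-grained functional unit.
    def utilisation(operand_bits: int, unit_bits: int) -> float:
        """Fraction of the unit's datapath doing useful work."""
        return operand_bits / unit_bits

    # A one-bit add mapped onto a four-bit-wide rDPU:
    print(utilisation(1, 4))  # 0.25 -> three of the four bits are wasted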

Configuration of these reconfigurable systems can happen at deployment time, between execution phases or during execution. In a typical reconfigurable system, a bit stream is used to program the device at deployment time. Fine-grained systems, by their nature, require greater configuration time than coarse-grained architectures, because more elements need to be addressed and programmed. Coarse-grained architectures therefore benefit from potentially lower energy requirements, as less information is transferred and utilised. Intuitively, the slower the rate of reconfiguration, the smaller the energy consumption, as the associated energy cost of reconfiguration is amortised over a longer period of time. Partial reconfiguration aims to allow part of the device to be reprogrammed while another part is still performing active computation. Partial reconfiguration allows smaller reconfigurable bit streams, thus not wasting energy on transmitting redundant information in the bit stream. Compression of the bit stream is possible, but careful analysis is required to ensure that the energy saved by using smaller bit streams is not outweighed by the computation needed to decompress the data.
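Whether compression pays off can be framed as a simple energy balance. The sketch below uses entirely hypothetical per-bit energy costs to show the break-even condition between transfer energy saved and decompression energy spent.

    # Hypothetical energy model for bit stream compression. All constants
    # are illustrative; real values depend on the device and interface.
    E_TRANSFER_PER_BIT = 5e-12   # joules per bit moved into the device (assumed)
    E_DECOMP_PER_BIT = 1e-12     # joules per output bit of decompression (assumed)

    def compression_pays_off(raw_bits: int, compressed_bits: int) -> bool:
        """True if the energy saved on transfer exceeds the decompression cost."""
        saved = (raw_bits - compressed_bits) * E_TRANSFER_PER_BIT
        cost = raw_bits * E_DECOMP_PER_BIT  # the decompressor emits raw_bits
        return saved > cost

    # A 4 Mbit full bit stream compressed to 3 Mbit:
    print(compression_pays_off(4_000_000, 3_000_000))  # True under these assumptions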