A Floating Point Co-Processor for FPGAs

There was a time when an application with special functions that had to be executed very fast and very often could offload them from the CPU to an FPGA acting as a special-purpose co-processor. The tables now seem to have turned a bit. BittWare has announced a floating point co-processor chip for use with Altera’s high-performance FPGAs. The Anemone chip, featuring the Epiphany architecture from Adapteva, is a scalable, true C-programmable floating point engine that enables novel solutions for complex and evolving signal processing applications.

Because it was designed specifically to be used alongside an FPGA as a co-processor, the Anemone achieves superior power efficiency and processing performance at the same time. Each Anemone features 16 processors, providing 32 GFLOPS of floating point processing while consuming only 2 watts of total chip power. Multiple Anemones can be gluelessly connected, scaling to compute blocks of up to 4096 processors that provide 8 TFLOPS of floating point performance. By delivering a standard processor software development environment that integrates tightly with an FPGA platform from Altera, the Anemone combines the best of both worlds, facilitating increased productivity and optimal solutions for complex signal processing applications.
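The scaling figures above follow from simple arithmetic, sketched below in plain C. The constants are the numbers quoted in the article; the function names are illustrative only, and the calculation assumes peak throughput scales linearly with the number of processors, which real workloads may not achieve.

```c
#include <stdint.h>

/* Figures quoted in the article. */
#define CORES_PER_CHIP 16u  /* processors per Anemone */
#define CHIP_GFLOPS    32u  /* peak GFLOPS per Anemone */
#define CHIP_WATTS      2u  /* total chip power in watts */

/* Peak GFLOPS contributed by one processor: 32 / 16. */
uint32_t gflops_per_core(void)
{
    return CHIP_GFLOPS / CORES_PER_CHIP;
}

/* Peak GFLOPS of an array of `cores` processors (assumes linear scaling). */
uint32_t array_gflops(uint32_t cores)
{
    return cores * gflops_per_core();
}

/* Number of chips needed to provide `cores` processors, rounded up. */
uint32_t chips_needed(uint32_t cores)
{
    return (cores + CORES_PER_CHIP - 1u) / CORES_PER_CHIP;
}

/* Peak power efficiency of one chip in GFLOPS per watt. */
uint32_t gflops_per_watt(void)
{
    return CHIP_GFLOPS / CHIP_WATTS;
}
```

Under these assumptions, array_gflops(4096) gives 8192 GFLOPS, which matches the quoted 8 TFLOPS, and chips_needed(4096) gives 256 chips.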

Anemone was designed specifically for complex signal processing rather than for I/O, protocol processing, memory interfacing, or special functions. The result is an extremely efficient chip compared with traditional floating point DSPs, which may use only 5% of their silicon area for processing, and a scalable 1 GHz multicore processor.

Each eCore processor features a compact, general-purpose instruction set that requires no instruction level parallelism and provides high program efficiency. All floating point computations are performed as single-precision IEEE 754; hardware looping is also supported. Anemone offers distributed and segmented memory, and large uniform register files. On-chip distributed shared memory is 4 Mbit (32 Kbyte per eCore) with 32 Gbyte/s of sustained memory bandwidth within each eCore. The cache-less shared memory architecture is extended off-chip, and between chips, via external I/O links. 
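As a quick check of the memory figures, the sketch below works out the chip total from the per-eCore size and estimates how many single-precision values fit in one eCore's local memory. The function names are illustrative, and the sketch assumes a 4-byte float, as IEEE 754 single precision implies. The float count is an upper bound, since any program code or stack placed in the same memory would reduce it.

```c
#include <stddef.h>

/* Figures quoted in the article. */
#define ECORES_PER_CHIP 16u
#define LOCAL_MEM_BYTES (32u * 1024u)  /* 32 Kbyte per eCore */

/* Total on-chip memory in bytes: 16 eCores x 32 Kbyte. */
size_t chip_mem_bytes(void)
{
    return (size_t)ECORES_PER_CHIP * LOCAL_MEM_BYTES;
}

/* The same total expressed in Mbit (1 Mbit = 1024 * 1024 bits). */
size_t chip_mem_mbit(void)
{
    return chip_mem_bytes() * 8u / (1024u * 1024u);
}

/* Upper bound on single-precision values that fit in one eCore's memory. */
size_t floats_per_ecore(void)
{
    return LOCAL_MEM_BYTES / sizeof(float);
}
```

With a 4-byte float, one eCore can hold at most 8192 samples locally, so larger data sets would have to be streamed in blocks, for example by the DMA engines.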

The Anemone features an internal high-throughput mesh network, with separate data paths for on-chip and off-chip communications. Each eCore processor has a multichannel DMA engine to support background data movement over the ‘eMesh’. Total on-chip, inter-core bandwidth is 128 Gbyte/s full duplex, with an additional 8 Gbyte/s of off-chip bandwidth. Each router node can simultaneously sustain full-duplex transfers on all ports, with automatic routing based on global addressing.

The Anemone provides a flexible low-overhead external interconnect scheme that supports memory-mapped direct connection of multiple Anemones and is compatible with any LVDS-capable FPGA. This is achieved via four Links that are full-duplex 8-bit LVDS data ports at 500 MHz DDR, each simultaneously providing 1 Gbyte/s in each direction for a total off-chip bandwidth of 8 Gbyte/s. Its FPGA co-processor use model provides the ultimate flexibility: since all external I/O goes through an FPGA, system designers can customize the I/O to their application’s specific requirements.
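The link bandwidth quoted above can be reproduced from the port width and clock rate. The sketch below uses only the article's figures; the function names are illustrative, and the result is a raw peak rate that ignores any protocol overhead on the links.

```c
#include <stdint.h>

/* Figures quoted in the article. */
#define LINK_WIDTH_BITS      8u  /* 8-bit LVDS data port */
#define LINK_CLOCK_MHZ     500u  /* link clock */
#define TRANSFERS_PER_CLOCK  2u  /* DDR: data moves on both clock edges */
#define LINKS_PER_CHIP       4u

/* Peak Mbyte/s per link in one direction: 8 bits x 500 MHz x 2 / 8. */
uint32_t link_mbytes_per_s(void)
{
    return LINK_WIDTH_BITS * LINK_CLOCK_MHZ * TRANSFERS_PER_CLOCK / 8u;
}

/* Peak off-chip Mbyte/s: four links, two directions each. */
uint32_t offchip_mbytes_per_s(void)
{
    return LINKS_PER_CHIP * 2u * link_mbytes_per_s();
}
```

This gives 1000 Mbyte/s per link per direction and 8000 Mbyte/s in total, in line with the quoted 1 Gbyte/s and 8 Gbyte/s.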

The Anemone reduces system development cost by enabling out-of-the-box execution of applications written in regular ANSI-C. It does not require any C-subset, language extensions, or SIMD. Standard GNU development tools are supported, including an optimizing C compiler, simulator, GDB debugger and Eclipse multi-core IDE.
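To illustrate what "regular ANSI-C" means for a signal processing kernel, here is a minimal sketch of a single-precision FIR filter that uses no language extensions, intrinsics, or vendor headers. It is a generic example, not code from BittWare or Adapteva, and the function name fir_f32 is hypothetical. It makes no claim about how such a kernel would be distributed across eCores.

```c
#include <stddef.h>

/*
 * Single-precision FIR filter: y[i] = sum over k of h[k] * x[i - k].
 * Samples before x[0] are treated as zero.
 *   x    - input samples, length n
 *   h    - filter coefficients, length taps
 *   y    - output samples, length n (must not overlap x)
 */
void fir_f32(const float *x, size_t n, const float *h, size_t taps, float *y)
{
    size_t i, k;

    for (i = 0; i < n; i++) {
        float acc = 0.0f;

        /* Stop at k == i so that x[i - k] never reads before x[0]. */
        for (k = 0; k < taps && k <= i; k++) {
            acc += h[k] * x[i - k];
        }
        y[i] = acc;
    }
}
```

Calling it with x = {1, 2, 3, 4} and h = {1, 1} yields y = {1, 3, 5, 7}. Each step of the inner loop is a multiply followed by an add, the kind of operation a floating point engine is built to execute quickly.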

The Anemone will be available from BittWare on standard COTS boards, including FMC (VITA 57), AdvancedMC (AMC), VPX (VITA 46/48/65) and PCI Express (PCIe) slot cards starting in Q3. Development boards, software and systems will also be available.

BittWare, Concord, NH. (603) 226-0404.