Ultra-low latency machine learning inference accelerator for the finance industry

Designed for fintech

VOLLO™ is designed to achieve the lowest latency on financial neural network models while maximizing throughput, quality, and energy and space efficiency. Its success has been convincingly demonstrated by its performance in the STAC-ML™ Markets (Inference) benchmarks, which are based on such models.

Unrivalled low latency

Independently audited results show that VOLLO has as much as 20x lower latency than its nearest competitor. VOLLO achieves latencies as low as 5.1 microseconds for the neural network models defined in the STAC-ML benchmarks.

Although not audited by STAC, the compute latency (excluding off-chip communications) is on the order of just 1 microsecond. This means VOLLO is fast enough to open up new applications, such as inference in a NIC subsystem.

Works with your ML model

VOLLO is optimized for time-series inference of financial AI models and supports a wide range of network layer types, including:

  • Fully Connected
  • LSTM
  • 1D Convolution

Additional layer types may be added on request.
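To illustrate, a toy time-series model combining these supported layer types might look like the following PyTorch sketch (the model, its sizes, and its name are illustrative assumptions, not part of the VOLLO product):

```python
import torch
import torch.nn as nn

class TimeSeriesNet(nn.Module):
    """Hypothetical time-series model built only from the layer types
    listed above: 1D convolution, LSTM, and fully connected layers."""
    def __init__(self, n_features=8, hidden=32, n_outputs=1):
        super().__init__()
        self.conv = nn.Conv1d(n_features, hidden, kernel_size=3, padding=1)
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)
        self.fc = nn.Linear(hidden, n_outputs)

    def forward(self, x):
        # x: (batch, time, features); Conv1d expects (batch, channels, time)
        x = self.conv(x.transpose(1, 2)).transpose(1, 2)
        x, _ = self.lstm(x)
        # predict from the final time step
        return self.fc(x[:, -1])

model = TimeSeriesNet()
out = model(torch.randn(4, 100, 8))   # batch of 4, 100 time steps, 8 features
```

A model like this would be trained as usual in the ML framework; only the layer types need to fall within the supported set.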

Simple to install

VOLLO runs on industry-standard PCIe accelerator cards, compatible with standard data center servers and powered by Intel® Agilex™ FPGAs. These include the IA-840f and IA-420f cards from BittWare.

High accuracy

High accuracy is achieved through the use of floating-point arithmetic in all operations. Models can be trained in FP32 or bfloat16 and run on VOLLO in bfloat16 format without retraining and without compromising accuracy.
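To see why running FP32-trained weights in bfloat16 usually costs little accuracy, note that bfloat16 keeps float32's full 8-bit exponent and truncates only the mantissa. The NumPy sketch below (an illustration, not VOLLO's conversion code) emulates bfloat16 rounding and checks the worst-case relative error:

```python
import numpy as np

def to_bfloat16(x: np.ndarray) -> np.ndarray:
    """Emulate bfloat16 by rounding float32 values to the top 16 bits
    of their IEEE-754 bit pattern (round-to-nearest-even)."""
    bits = x.astype(np.float32).view(np.uint32)
    rounded = bits + 0x7FFF + ((bits >> 16) & 1)  # round before truncating
    return (rounded & 0xFFFF0000).view(np.float32)

x = np.array([1.0, 3.14159265, 1e-3, 123456.789], dtype=np.float32)
bf = to_bfloat16(x)
rel_err = np.abs(bf - x) / np.abs(x)
# bfloat16 keeps 8 significand bits, so relative error is at most 2**-8
```

Because the dynamic range matches float32 exactly, weights never overflow or underflow on conversion; only a small, bounded rounding error is introduced.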

High throughput and power-efficiency

Designed to be installed in a server co-located at a stock exchange, VOLLO achieves very high throughput and low energy consumption in a 1U server, significantly reducing the costs of running co-located servers. Up to four PCIe accelerator cards can run in a 1U server at less than 650W.

Simple to program

Models can be trained in PyTorch or TensorFlow before being exported in ONNX format into the VOLLO tool suite, making it simple to program from your existing ML development environment.

Flexible for future-proofing

The flexibility of FPGA technology means that not only can VOLLO be software-configured with users' LSTM model configurations, but significant architectural innovations can also be adopted quickly with optimal use of compute resources.