Explore the latest news and insights from the myrtle.ai team
We explore the use of Block Floating Point 16 (BFP16) for quantizing weights and activations in Llama3, with minimal accuracy loss, achieving up to 8x…
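As a rough illustration of the block floating point idea referenced in this teaser (a minimal sketch, not Myrtle.ai's implementation), the snippet below round-trips an array through a simple BFP format in which each block shares one power-of-two exponent and every value keeps a reduced-width signed mantissa. The block size of 16 and the 8-bit mantissa are illustrative assumptions.

```python
import numpy as np

def bfp_quantize_dequantize(x, block_size=16, mantissa_bits=8):
    """Round-trip an array through a simple block floating point format.

    Each block of `block_size` values shares one power-of-two exponent
    (derived from the block's largest magnitude); every value keeps only a
    signed `mantissa_bits`-bit mantissa. Block size and mantissa width are
    illustrative assumptions, not a published BFP16 specification.
    """
    x = np.asarray(x, dtype=np.float32)
    flat = x.ravel()
    pad = (-flat.size) % block_size
    blocks = np.pad(flat, (0, pad)).reshape(-1, block_size)

    # Shared exponent per block, taken from the largest magnitude in the block.
    max_mag = np.maximum(np.abs(blocks).max(axis=1, keepdims=True),
                         np.finfo(np.float32).tiny)
    exponent = np.floor(np.log2(max_mag))

    # Scale so the largest value in a block lands near the top of the signed
    # mantissa range, then round and clip to that range.
    scale = 2.0 ** (exponent + 2 - mantissa_bits)
    limit = 2 ** (mantissa_bits - 1) - 1
    mantissa = np.clip(np.round(blocks / scale), -limit, limit)

    # Dequantize so the quantization error can be inspected directly.
    return (mantissa * scale).ravel()[: flat.size].reshape(x.shape)

if __name__ == "__main__":
    w = np.random.randn(4096).astype(np.float32)
    w_q = bfp_quantize_dequantize(w)
    print("max abs error:", float(np.max(np.abs(w - w_q))))
```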
ML inference directly on the network in a SmartNIC.
VOLLO® inference accelerator on the AMD Alveo™ V80 compute accelerator card.
11th April 2025. CAIMAN-ASR is the streaming speech recognition solution developed by Myrtle.ai in partnership…
Unrivalled latencies achievable by financial services industry (FSI) companies with no FPGA design expertise
Enables financial firms to make faster and more intelligent ML decisions
We investigate deploying Vision Transformers on low Earth orbit satellites.
We apply quantization and sparsity to the Vision Transformer for optimized inference.
We measure the performance and power efficiency of Vision Transformers on three different hardware platforms.
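As a rough sketch of the quantization and sparsity step mentioned above (illustrative only; the 50% pruning ratio, symmetric per-tensor int8 scheme, and 768x768 layer shape are assumptions, not the configuration used in the article), the example below applies magnitude pruning followed by int8 quantization to a single weight matrix.

```python
import numpy as np

def prune_and_quantize(weights, sparsity=0.5):
    """Magnitude-prune a weight matrix, then quantize the survivors to int8.

    `sparsity` is the fraction of weights zeroed out; both the ratio and the
    symmetric per-tensor int8 scheme are illustrative assumptions.
    """
    w = np.asarray(weights, dtype=np.float32)

    # Unstructured magnitude pruning: zero the smallest-magnitude weights.
    threshold = np.quantile(np.abs(w), sparsity)
    mask = np.abs(w) >= threshold
    w_sparse = w * mask

    # Symmetric per-tensor int8 quantization of the remaining weights.
    max_abs = float(np.max(np.abs(w_sparse)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(w_sparse / scale), -127, 127).astype(np.int8)

    # Return the int8 tensor plus a dequantized view for error measurement.
    return q, scale, mask, q.astype(np.float32) * scale

if __name__ == "__main__":
    # Shape loosely modelled on a ViT projection layer (assumed, 768x768).
    w = np.random.randn(768, 768).astype(np.float32) * 0.02
    q, scale, mask, w_hat = prune_and_quantize(w, sparsity=0.5)
    print("kept weights:", float(mask.mean()),
          "mean abs error:", float(np.mean(np.abs(w - w_hat))))
```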