Explore the latest news and insights from the myrtle.ai team
We explore the use of Block Floating Point 16 (BFP16) for quantizing weights and activations in Llama3, with minimal accuracy loss, achieving up to 8x…
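As a rough illustration of the block floating point idea referenced in this teaser (a minimal sketch, not Myrtle.ai's implementation), the snippet below round-trips an array through a simple BFP format in which each block shares one power-of-two exponent and every value keeps a reduced-width signed mantissa. The block size of 16 and the 8-bit mantissa are illustrative assumptions.

```python
import numpy as np

def bfp_quantize_dequantize(x, block_size=16, mantissa_bits=8):
    """Round-trip an array through a simple block floating point format.

    Each block of `block_size` values shares one power-of-two exponent
    (derived from the block's largest magnitude); every value keeps only a
    signed `mantissa_bits`-bit mantissa. Block size and mantissa width are
    illustrative assumptions, not a published BFP16 specification.
    """
    x = np.asarray(x, dtype=np.float32)
    flat = x.ravel()
    pad = (-flat.size) % block_size
    blocks = np.pad(flat, (0, pad)).reshape(-1, block_size)

    # Shared exponent per block, taken from the largest magnitude in the block.
    max_mag = np.maximum(np.abs(blocks).max(axis=1, keepdims=True),
                         np.finfo(np.float32).tiny)
    exponent = np.floor(np.log2(max_mag))

    # Scale so the largest value in a block lands near the top of the signed
    # mantissa range, then round and clip to that range.
    scale = 2.0 ** (exponent + 2 - mantissa_bits)
    limit = 2 ** (mantissa_bits - 1) - 1
    mantissa = np.clip(np.round(blocks / scale), -limit, limit)

    # Dequantize so the quantization error can be inspected directly.
    return (mantissa * scale).ravel()[: flat.size].reshape(x.shape)

if __name__ == "__main__":
    w = np.random.randn(4096).astype(np.float32)
    w_q = bfp_quantize_dequantize(w)
    print("max abs error:", float(np.max(np.abs(w - w_q))))
```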
ML inference directly on the network in a SmartNIC.
VOLLO® inference accelerator on the AMD Alveo™ V80 compute accelerator card.
11th April 2025. CAIMAN-ASR is the streaming speech recognition solution developed by Myrtle.ai in partnership…
Unrivalled latencies achievable by financial services industry (FSI) companies with no FPGA design expertise
Enables financial firms to make faster and more intelligent ML decisions
We investigate deploying Vision Transformers on low Earth orbit satellites.
We apply quantization and sparsity to the Vision Transformer for optimized inference.
We measure the performance and power efficiency of Vision Transformers on three different hardware platforms.
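As a rough sketch of the quantization and sparsity step mentioned above (illustrative only; the 50% pruning ratio, symmetric per-tensor int8 scheme, and 768x768 layer shape are assumptions, not the configuration used in the article), the example below applies magnitude pruning followed by int8 quantization to a single weight matrix.

```python
import numpy as np

def prune_and_quantize(weights, sparsity=0.5):
    """Magnitude-prune a weight matrix, then quantize the survivors to int8.

    `sparsity` is the fraction of weights zeroed out; both the ratio and the
    symmetric per-tensor int8 scheme are illustrative assumptions.
    """
    w = np.asarray(weights, dtype=np.float32)

    # Unstructured magnitude pruning: zero the smallest-magnitude weights.
    threshold = np.quantile(np.abs(w), sparsity)
    mask = np.abs(w) >= threshold
    w_sparse = w * mask

    # Symmetric per-tensor int8 quantization of the remaining weights.
    max_abs = float(np.max(np.abs(w_sparse)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(w_sparse / scale), -127, 127).astype(np.int8)

    # Return the int8 tensor plus a dequantized view for error measurement.
    return q, scale, mask, q.astype(np.float32) * scale

if __name__ == "__main__":
    # Shape loosely modelled on a ViT projection layer (assumed, 768x768).
    w = np.random.randn(768, 768).astype(np.float32) * 0.02
    q, scale, mask, w_hat = prune_and_quantize(w, sparsity=0.5)
    print("kept weights:", float(mask.mean()),
          "mean abs error:", float(np.mean(np.abs(w - w_hat))))
```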