We explore the use of Block Floating Point 16 (BFP16) for quantizing weights and activations in Llama3, with minimal accuracy loss, achieving up to 8x…
11th April 2025. CAIMAN-ASR is the streaming speech recognition solution developed by Myrtle.ai in partnership…
We investigate deploying Vision Transformers on low Earth orbit satellites.
We apply quantization and sparsity to the Vision Transformer for optimized inference.
We measure the performance and power efficiency of Vision Transformers on three different hardware platforms.
A new family of FPGAs, optimized for machine learning inference, enables us to achieve a…
Are GPUs a good target for speech synthesis? Is Baidu's GPU implementation of WaveNet the…