Large language models (LLMs) based on transformers have made significant...
Running out of GPU memory has become a main bottleneck for large-scale D...
Automatic Speech Recognition (ASR) has seen remarkable advancements with...
Efficient deployment of large language models (LLMs) necessitates low-bi...
Mixture-of-experts (MoE) is becoming popular due to its success in impro...
In trained deep neural networks, unstructured pruning can reduce redunda...