Recently, pre-trained Transformer based language models, such as BERT, h...
Communication cost is one major bottleneck for the scalability for
distr...
Understanding how large neural networks avoid memorizing training data i...
While vector-based language representations from pretrained language mod...
To train large models (like BERT and GPT-3) with hundreds or even thousa...
Scalable training of large models (like BERT and GPT-3) requires careful...
Adam is the important optimization algorithm to guarantee efficiency and...
As modern neural networks have grown to billions of parameters, meeting ...
Deep neural networks (DNNs) have shown much empirical success in solving...
Encouraged by the success of deep neural networks on a variety of visual...
Training with larger number of parameters while keeping fast iterations ...
Video understanding usually requires expensive computation that prohibit...
Machine-learning (ML) hardware and software system demand is burgeoning....
Federated learning has become increasingly important for modern machine
...
Semantic segmentation algorithms that can robustly segment objects acros...
Machine learning is experiencing an explosion of software and hardware
s...
Scene graphs have become an important form of structured knowledge for t...
Communication is a key bottleneck in distributed training. Recently, an
...
Communication is a key bottleneck in distributed training. Recently, an
...
To solve tasks in new environments involving objects unseen during train...
Monitoring the population and movements of endangered species is an impo...
A standard approach in large scale machine learning is distributed stoch...
Structured representations such as scene graphs serve as an efficient an...
Detection and segmentation of objects in overheard imagery is a challeng...
Generating realistic images from scene graphs asks neural networks to be...
Most of today's distributed machine learning systems assume reliable
ne...
While training a machine learning model using multiple workers, each of ...
Optimizing distributed learning systems is an art of balancing between
c...
Making inferences from partial information constitutes a critical aspect...