A Federated Approach to Predicting Emojis in Hindi Tweets

by Deep Gandhi, et al.

Emojis add a visual modality to textual communication that is often private. Predicting emojis, however, is challenging for machine learning because emoji use tends to cluster into frequently used and rarely used emojis. Much of the machine learning research on emoji use has focused on high-resource languages and has conceptualised the task of predicting emojis around traditional server-side machine learning approaches. However, applying traditional machine learning approaches to private communication can introduce privacy concerns, as these approaches require all data to be transmitted to central storage. In this paper, we seek to address the dual concerns of emphasising high-resource languages for emoji prediction and risking the privacy of people's data. We introduce a new dataset of 118k tweets (augmented from 25k unique tweets) for emoji prediction in Hindi, and propose a modification to the federated learning algorithm, CausalFedGSD, which aims to strike a balance between model performance and user privacy. We show that our approach obtains scores comparable to those of more complex centralised models while reducing the amount of data required to optimise the models and minimising risks to user privacy.
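To illustrate the contrast between central storage and federated training drawn in the abstract, here is a minimal sketch of plain federated averaging (FedAvg). It is not the CausalFedGSD modification proposed in the paper, whose details are not given here. The toy one-feature linear model, the synthetic data, and every function name below are assumptions made for illustration only; an emoji classifier would replace the model, but the communication pattern would look similar.

```python
import random

def local_update(weights, data, lr=0.1, epochs=5):
    """Run a few epochs of SGD on one client's private data for y ~ w*x + b."""
    w, b = weights
    for _ in range(epochs):
        for x, y in data:
            err = (w * x + b) - y
            w -= lr * err * x
            b -= lr * err
    return (w, b)

def federated_average(client_weights, client_sizes):
    """Average client models, weighting each by its number of local examples."""
    total = sum(client_sizes)
    w = sum(cw[0] * n for cw, n in zip(client_weights, client_sizes)) / total
    b = sum(cw[1] * n for cw, n in zip(client_weights, client_sizes)) / total
    return (w, b)

def run_fedavg(clients, rounds=50):
    """Server loop: only model weights cross the network, never raw examples."""
    global_weights = (0.0, 0.0)
    for _ in range(rounds):
        updates = [local_update(global_weights, data) for data in clients]
        sizes = [len(data) for data in clients]
        global_weights = federated_average(updates, sizes)
    return global_weights

def make_clients(seed=0):
    """Hypothetical clients with unequal amounts of synthetic data (y = 2x + 1)."""
    rng = random.Random(seed)
    clients = []
    for n in (20, 5, 40):
        xs = [rng.uniform(-1, 1) for _ in range(n)]
        clients.append([(x, 2.0 * x + 1.0) for x in xs])
    return clients

if __name__ == "__main__":
    w, b = run_fedavg(make_clients())
    print(f"learned w={w:.3f}, b={b:.3f}")
```

In this sketch each client's examples never leave `local_update`; the server sees only the returned weights, which is the privacy property the abstract contrasts with centralised training.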




