Privacy-Preserving Reinforcement Learning Beyond Expectation

by Arezoo Rajabi, et al.

Cyber and cyber-physical systems equipped with machine learning algorithms, such as autonomous cars, share environments with humans. In such a setting, it is important to align system (or agent) behaviors with the preferences of one or more human users. We consider the case when an agent has to learn behaviors in an unknown environment. Our goal is to capture two defining characteristics of humans: i) a tendency to assess and quantify risk, and ii) a desire to keep decision making hidden from external parties. For the former, we incorporate cumulative prospect theory (CPT) into the objective of a reinforcement learning (RL) problem. For the latter, we use differential privacy. We design an algorithm that enables an RL agent to learn policies that maximize a CPT-based objective in a privacy-preserving manner, and we establish guarantees on the privacy of the value functions learned by the algorithm when rewards are sufficiently close. This is accomplished by adding calibrated noise using a Gaussian process mechanism at each step. Through empirical evaluations, we highlight a privacy-utility tradeoff and demonstrate that the RL agent is able to learn behaviors that are aligned with those of a human user in the same environment in a privacy-preserving manner.
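To make the idea of calibrated noise concrete, the following is a minimal sketch and not the algorithm from the paper. The paper uses a Gaussian process mechanism on value functions; the sketch below swaps in the simpler classical Gaussian mechanism applied to a finite vector of value estimates. The function names `gaussian_sigma` and `privatize_values`, and the assumption that the L2 sensitivity of the value vector is known, are hypothetical and chosen only for illustration.

```python
import math
import random


def gaussian_sigma(sensitivity, epsilon, delta):
    """Noise scale of the classical Gaussian mechanism.

    Uses sigma = sensitivity * sqrt(2 * ln(1.25 / delta)) / epsilon,
    which is the standard calibration for 0 < epsilon < 1.
    """
    if not (0 < epsilon < 1) or not (0 < delta < 1):
        raise ValueError("requires 0 < epsilon < 1 and 0 < delta < 1")
    if sensitivity <= 0:
        raise ValueError("sensitivity must be positive")
    return sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon


def privatize_values(values, sensitivity, epsilon, delta, rng=None):
    """Return a noisy copy of `values` (hypothetical helper).

    `sensitivity` is assumed to be the L2 sensitivity of the whole value
    vector, i.e. a bound on how far it can move when the rewards change
    between two "sufficiently close" reward functions.
    """
    rng = rng or random.Random()
    sigma = gaussian_sigma(sensitivity, epsilon, delta)
    # Independent zero-mean Gaussian noise is added to every coordinate.
    return [v + rng.gauss(0.0, sigma) for v in values]


if __name__ == "__main__":
    value_estimates = [1.0, 0.5, -0.2]  # illustrative numbers only
    noisy = privatize_values(value_estimates, sensitivity=0.1,
                             epsilon=0.5, delta=1e-5,
                             rng=random.Random(0))
    print(noisy)
```

In this sketch, a smaller `epsilon` (stronger privacy) yields a larger noise scale and therefore less accurate value estimates, which is one simple way to see where a privacy-utility tradeoff comes from.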




Reinforcement Learning Beyond Expectation

The inputs and preferences of human users are important considerations i...

Locally Differentially Private Reinforcement Learning for Linear Mixture Markov Decision Processes

Reinforcement learning (RL) algorithms can be used to provide personaliz...

Private Reinforcement Learning with PAC and Regret Guarantees

Motivated by high-stakes decision-making domains like personalized medic...

Dynamic Shielding for Reinforcement Learning in Black-Box Environments

It is challenging to use reinforcement learning (RL) in cyber-physical s...

Passive and Privacy-preserving Human Localization via mmWave Access Points for Social Distancing

The pandemic outbreak has profoundly changed our life, especially our so...

adaPARL: Adaptive Privacy-Aware Reinforcement Learning for Sequential-Decision Making Human-in-the-Loop Systems

Reinforcement learning (RL) presents numerous benefits compared to rule-...

The Need for Inherently Privacy-Preserving Vision in Trustworthy Autonomous Systems

Vision is a popular and effective sensor for robotics from which we can ...
