Locally Differentially Private Reinforcement Learning for Linear Mixture Markov Decision Processes

10/19/2021
by   Chonghua Liao, et al.

Reinforcement learning (RL) algorithms can be used to provide personalized services, which rely on users' private and sensitive data. To protect users' privacy, privacy-preserving RL algorithms are in demand. In this paper, we study RL with linear function approximation and local differential privacy (LDP) guarantees. We propose a novel (ε, δ)-LDP algorithm for learning a class of Markov decision processes (MDPs) known as linear mixture MDPs, which achieves an Õ(d^{5/4} H^{7/4} T^{3/4} (log(1/δ))^{1/4} √(1/ε)) regret, where d is the dimension of the feature mapping, H is the length of the planning horizon, and T is the number of interactions with the environment. We also prove a lower bound of Ω(dH√(T)/(e^ε(e^ε − 1))) for learning linear mixture MDPs under the ε-LDP constraint. Experiments on synthetic datasets verify the effectiveness of our algorithm. To the best of our knowledge, this is the first provable privacy-preserving RL algorithm with linear function approximation.
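The paper's algorithm and its exact noise calibration are in the full text; as a minimal, illustrative sketch (not the authors' method), the snippet below shows the basic primitive behind an (ε, δ)-LDP guarantee: each user perturbs their local statistic with a standard Gaussian mechanism on-device, before anything is sent to the learner. The function name, sensitivity bound, and parameter values here are assumptions for illustration only.

```python
import numpy as np

def gaussian_mechanism(stat: np.ndarray, l2_sensitivity: float,
                       eps: float, delta: float) -> np.ndarray:
    """Privatize a local statistic with the Gaussian mechanism.

    For eps <= 1, noise scale sigma = sqrt(2 ln(1.25/delta)) * Delta_2 / eps
    yields (eps, delta)-differential privacy (Dwork & Roth, 2014). Applied on
    the user side, before the statistic leaves the device, this gives LDP.
    """
    sigma = np.sqrt(2.0 * np.log(1.25 / delta)) * l2_sensitivity / eps
    return stat + np.random.normal(0.0, sigma, size=stat.shape)

# Hypothetical usage: a user privatizes a d-dimensional feature statistic
# (clipped so its L2 norm is at most 1) before reporting it to the learner,
# which would aggregate such noisy reports across episodes.
d = 4
phi = np.random.rand(d)
phi = phi / max(1.0, np.linalg.norm(phi))  # enforce ||phi||_2 <= 1
noisy_phi = gaussian_mechanism(phi, l2_sensitivity=1.0, eps=0.5, delta=1e-5)
```

Because the noise is injected locally, the learner only ever sees perturbed statistics; the price is the inflated regret dependence on ε and δ reflected in the bound above.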
