GLDQN: Explicitly Parameterized Quantile Reinforcement Learning for Waste Reduction

by Sami Jullien, et al.

We study the problem of restocking a grocery store's inventory of perishable items over time from a distributional point of view. The objective is to maximize sales while minimizing waste, under uncertainty about how much customers will actually consume. This problem is highly relevant today, given the growing demand for food and the impact of food waste on the environment, the economy, and purchasing power. We frame inventory restocking as a new reinforcement learning task whose stochastic behavior is conditioned on the agent's actions, which makes the environment partially observable. We introduce a new reinforcement learning environment based on real grocery store data and expert knowledge. This environment is highly stochastic and presents a unique challenge for reinforcement learning practitioners. We show that classical supply chain algorithms handle uncertainty about the environment's future behavior poorly, and that distributional approaches are a good way to account for this uncertainty. We also present GLDQN, a new distributional reinforcement learning algorithm that learns a generalized lambda distribution over the reward space. We show that GLDQN outperforms other distributional reinforcement learning approaches in our partially observable environments, in terms of both overall reward and generated waste.
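The abstract describes GLDQN only at a high level, so the following is a hypothetical sketch, not the authors' implementation. It assumes the network outputs four generalized lambda distribution (GLD) parameters (lam1, lam2, lam3, lam4) per action, uses the Ramberg-Schmeiser quantile function Q(u) = lam1 + (u^lam3 - (1 - u)^lam4) / lam2 (the paper may use a different GLD parameterization), fits it with the plain pinball (quantile-regression) loss familiar from QR-DQN, and ranks actions by the closed-form mean. All function names are made up for illustration, and the target returns in the usage lines are placeholders, not data from the paper.

```python
# Hypothetical sketch of an "explicitly parameterized quantile" head:
# four GLD parameters per action instead of a fixed grid of quantile values.
# Assumption: Ramberg-Schmeiser parameterization of the GLD.


def gld_quantile(u, lam1, lam2, lam3, lam4):
    """Q(u) = lam1 + (u**lam3 - (1 - u)**lam4) / lam2 for 0 < u < 1.

    With lam2 > 0, lam3 > 0 and lam4 > 0, Q is strictly increasing in u,
    so it is a valid quantile function."""
    if not 0.0 < u < 1.0:
        raise ValueError("u must lie strictly between 0 and 1")
    return lam1 + (u ** lam3 - (1.0 - u) ** lam4) / lam2


def gld_mean(lam1, lam2, lam3, lam4):
    """Expected value, i.e. the integral of Q(u) over (0, 1), in closed form."""
    return lam1 + (1.0 / (lam3 + 1.0) - 1.0 / (lam4 + 1.0)) / lam2


def pinball_loss(tau, q_pred, target):
    """Quantile-regression (pinball) loss for quantile fraction tau."""
    diff = target - q_pred
    return max(tau * diff, (tau - 1.0) * diff)


def gld_quantile_loss(params, taus, targets):
    """Average pinball loss of the GLD quantiles at fractions `taus`
    against sampled target returns (e.g. Bellman targets)."""
    total = 0.0
    for tau in taus:
        q = gld_quantile(tau, *params)
        for z in targets:
            total += pinball_loss(tau, q, z)
    return total / (len(taus) * len(targets))


def greedy_action(params_per_action):
    """Index of the action whose predicted return distribution has the highest mean."""
    means = [gld_mean(*p) for p in params_per_action]
    return max(range(len(means)), key=means.__getitem__)


if __name__ == "__main__":
    uniform = (0.0, 1.0, 1.0, 1.0)  # lam3 = lam4 = 1: uniform on (-1, 1)
    print(gld_quantile(0.75, *uniform))  # 0.5
    taus = [(i + 0.5) / 8 for i in range(8)]  # 8 quantile-fraction midpoints
    targets = [-0.5, 0.0, 0.5]  # placeholder sampled returns, not real data
    print(gld_quantile_loss(uniform, taus, targets))
    print(greedy_action([uniform, (2.0, 1.0, 1.0, 1.0)]))  # 1
```

Because four numbers describe the whole quantile function, any quantile fraction and the mean can be evaluated in closed form, whereas QR-DQN predicts a fixed grid of N quantile values and trains them with a Huber-smoothed version of this loss. In a real agent the parameters would come from a neural network, and one simple way to keep Q increasing is to constrain lam2, lam3 and lam4 to be positive (for example with a softplus), which this sketch does not do.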


Distributional Reinforcement Learning with Quantile Regression

In reinforcement learning an agent interacts with the environment by tak...

A neurally plausible model learns successor representations in partially observable environments

Animals need to devise strategies to maximize returns while interacting ...

Robotic Packaging Optimization with Reinforcement Learning

Intelligent manufacturing is becoming increasingly important due to the ...

Distributional GFlowNets with Quantile Flows

Generative Flow Networks (GFlowNets) are a new family of probabilistic s...

Value Variance Minimization for Learning Approximate Equilibrium in Aggregation Systems

For effective matching of resources (e.g., taxis, food, bikes, shopping ...

Apprenticeship Learning for Model Parameters of Partially Observable Environments

We consider apprenticeship learning, i.e., having an agent learn a task ...

Joint Learning of Reward Machines and Policies in Environments with Partially Known Semantics

We study the problem of reinforcement learning for a task encoded by a r...