Assessment of Reinforcement Learning Algorithms for Nuclear Power Plant Fuel Optimization

by Paul Seurin, et al.

The nuclear fuel loading pattern optimization problem has been studied since the dawn of the commercial nuclear energy industry. It is characterized by multiple objectives and constraints, with a number of candidate patterns so large that the problem cannot be solved explicitly. Stochastic optimization methodologies are used by different nuclear utilities and vendors for fuel cycle reload design, yet hand-designed solutions remain the prevalent method in the industry. To improve on state-of-the-art core reload patterns, we aim to create a method that is as scalable as possible while meeting the designer's performance and safety goals. To this end, we leverage deep Reinforcement Learning (RL), in particular Proximal Policy Optimization (PPO). RL has recently experienced a strong impetus from its successes in games. This paper lays out the foundations of the method and studies the behavior of several hyper-parameters that influence the RL algorithm, via a multi-measure approach supported by statistical tests. The algorithm is highly dependent on multiple factors: the shape of the objective function derived for the core design, which behaves as a fudge factor affecting the stability of learning, and an exploration/exploitation trade-off that manifests through several parameters, namely the number of loading patterns seen by the agents per episode, the number of samples collected before a policy update, and an entropy factor that increases the randomness of the trained policy. Experimental results also demonstrate the effectiveness of the method in finding high-quality solutions from scratch within a reasonable amount of time. Future work should apply the algorithms to a wide range of applications and compare them against state-of-the-art implementations of stochastic optimization methods.
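The entropy factor mentioned above enters PPO through its loss function: the clipped surrogate objective is augmented with an entropy bonus whose coefficient controls how random the trained policy stays. The sketch below (not the paper's implementation; the function name and arguments are illustrative) shows this standard PPO loss in NumPy, with `ent_coef` playing the role of the entropy factor:

```python
import numpy as np

def ppo_loss(ratio, advantage, entropy, clip_eps=0.2, ent_coef=0.01):
    """Standard PPO clipped surrogate loss with an entropy bonus.

    ratio     : pi_new(a|s) / pi_old(a|s) per sample
    advantage : advantage estimates per sample
    entropy   : per-sample policy entropy
    ent_coef  : the "entropy factor" -- larger values reward a more
                random policy, pushing toward exploration.
    """
    # Clip the probability ratio to discourage overly large policy updates.
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    # Pessimistic (min) surrogate objective, averaged over the batch.
    surrogate = np.minimum(ratio * advantage, clipped * advantage)
    # Negate because optimizers minimize; entropy bonus encourages exploration.
    return -(surrogate.mean() + ent_coef * entropy.mean())
```

With `ratio = 1.5`, `advantage = 1.0`, and zero entropy, the ratio is clipped to 1.2 and the loss is -1.2, illustrating how clipping caps the incentive for large policy changes; raising `ent_coef` lowers the loss for high-entropy (more exploratory) policies.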




Stochastic Gradient Descent with Dependent Data for Offline Reinforcement Learning

In reinforcement learning (RL), offline learning decoupled learning from...

ORL: Reinforcement Learning Benchmarks for Online Stochastic Optimization Problems

Reinforcement Learning (RL) has achieved state-of-the-art results in dom...

Policy Optimization for Continuous Reinforcement Learning

We study reinforcement learning (RL) in the setting of continuous time a...

How to Enable Uncertainty Estimation in Proximal Policy Optimization

While deep reinforcement learning (RL) agents have showcased strong resu...

Multi-Agent Proximal Policy Optimization For Data Freshness in UAV-assisted Networks

Unmanned aerial vehicles (UAVs) are seen as a promising technology to pe...

Wasserstein Reinforcement Learning

We propose behavior-driven optimization via Wasserstein distances (WDs) ...

Learning swimming escape patterns under energy constraints

Swimming organisms can escape their predators by creating and harnessing...
