Policy Gradient Approach to Compilation of Variational Quantum Circuits
We propose a method for finding approximate compilations of quantum circuits, based on techniques from policy gradient reinforcement learning. The choice of a stochastic policy allows us to rephrase the optimization problem in terms of probability distributions, rather than variational parameters. The search for the optimal configuration is therefore carried out over the parameters of the distribution, rather than over the free angles of the circuit. As a result, a gradient can always be computed, provided that the policy is differentiable. We show numerically that this approach is more competitive than gradient-free methods, even in the presence of depolarizing noise, and argue analytically why this is the case. Another interesting feature of this approach to variational compilation is that it does not need a separate register and long-range interactions to estimate the end-point fidelity. We expect these techniques to be relevant for training variational circuits in other contexts.
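The idea of optimizing over distribution parameters rather than circuit angles can be illustrated with a toy sketch. The example below is not the setup used in the paper: it assumes a single rotation angle, a Gaussian policy with fixed width `sigma`, and a closed-form single-qubit fidelity standing in for a measured one. The names `fidelity` and `policy_gradient_compile` are hypothetical. The gradient with respect to the policy mean `mu` is estimated with the standard score-function (REINFORCE) estimator, which only requires the policy to be differentiable.

```python
import math
import random


def fidelity(theta, target):
    # Toy reward (assumption): overlap |<0|Ry(target)^dagger Ry(theta)|0>|^2
    # for one qubit, which equals cos^2((theta - target) / 2).
    return math.cos((theta - target) / 2.0) ** 2


def policy_gradient_compile(target, mu=0.0, sigma=0.3, lr=0.2,
                            batch=64, steps=300, seed=0):
    # Gaussian policy N(mu, sigma^2) over the circuit angle; only mu is trained.
    rng = random.Random(seed)
    for _ in range(steps):
        # Sample candidate angles from the current policy.
        thetas = [rng.gauss(mu, sigma) for _ in range(batch)]
        rewards = [fidelity(t, target) for t in thetas]
        # Batch-mean baseline, used here to reduce the estimator variance.
        baseline = sum(rewards) / batch
        # Score function of a Gaussian: d/dmu log p(theta) = (theta - mu) / sigma^2.
        grad = sum((r - baseline) * (t - mu) / sigma ** 2
                   for r, t in zip(rewards, thetas)) / batch
        # Gradient ascent on the expected fidelity.
        mu += lr * grad
    return mu


if __name__ == "__main__":
    learned = policy_gradient_compile(target=1.0)
    print(learned, fidelity(learned, 1.0))
```

Note that the reward is never differentiated with respect to the angle: only sampled reward values and the policy's score function enter the update, so the same loop would apply if `fidelity` were replaced by a noisy estimate.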