Discounting the Past in Stochastic Games
Stochastic games, introduced by Shapley, model adversarial interactions in stochastic environments where two players choose their moves to optimize a discounted sum of rewards. In the traditional discounted reward semantics, long-term weights are geometrically attenuated based on the delay in their occurrence. We propose a temporally dual notion – called past-discounting – where agents have geometrically decaying memory of the rewards encountered during a play of the game. We study past-discounted weight sequences as rewards on stochastic game arenas and examine the corresponding stochastic games with discounted and mean payoff objectives. We dub these games forgetful discounted games and forgetful mean payoff games, respectively. We establish positional determinacy of these games and recover classical complexity results and a Tauberian theorem in the context of past-discounted reward sequences.
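To make the contrast between the two notions concrete, the following is a minimal sketch, not the paper's formal definition. It assumes one plausible reading of "geometrically decaying memory": the past-discounted reward at step n is the sum over i ≤ n of lam^(n-i) · w_i, with the same factor lam used as in the classical discounted sum. The function names and the parameter `lam` are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def discounted_sum(weights: Sequence[float], lam: float) -> float:
    """Classical (future) discounting over a finite prefix:
    the i-th weight is attenuated by lam**i."""
    return sum((lam ** i) * w for i, w in enumerate(weights))


def past_discounted_sequence(weights: Sequence[float], lam: float) -> List[float]:
    """Assumed past-discounting: the n-th entry is
    sum_{i <= n} lam**(n - i) * weights[i], i.e. older rewards fade.
    Computed with the recurrence m_n = lam * m_{n-1} + w_n."""
    memory = 0.0
    out: List[float] = []
    for w in weights:
        memory = lam * memory + w  # decay what is remembered, add the new reward
        out.append(memory)
    return out


def mean_payoff_prefix(rewards: Sequence[float]) -> float:
    """Average of a finite, non-empty reward prefix (finite stand-in for mean payoff)."""
    return sum(rewards) / len(rewards)
```

Under these assumptions, `past_discounted_sequence([1, 1, 1], 0.5)` returns `[1.0, 1.5, 1.75]`. Applying `discounted_sum` or `mean_payoff_prefix` to that sequence gives finite-prefix analogues of the forgetful discounted and forgetful mean payoff objectives named in the abstract.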