Overcoming The Spectral Bias of Neural Value Approximation

Ge Yang*†§, Anurag Ajay, Pulkit Agrawal§

*Equal Contribution (random order)
†Institute of AI and Fundamental Interactions (IAIFI)
§Computer Science and Artificial Intelligence Laboratory (CSAIL), MIT

Overview

We identify the tendency of multi-layer perceptrons to favor low-frequency function components as a source of learning instability during Q-value iteration, and propose random Fourier features as a simple way to overcome this spectral bias.

Abstract

Value approximation using deep neural networks is at the heart of off-policy deep reinforcement learning, and is often the primary module that provides learning signals to the rest of the algorithm. While multi-layer perceptron networks are universal function approximators, recent works in neural kernel regression suggest the presence of a spectral bias, where fitting high-frequency components of the value function requires exponentially more gradient update steps than the low-frequency ones. In this work, we re-examine off-policy reinforcement learning through the lens of kernel regression and propose to overcome such bias via a composite neural tangent kernel. With just a single line change, our approach, Fourier feature networks (FFN), produces state-of-the-art performance on challenging continuous control domains with only a fraction of the compute. Faster convergence and better off-policy stability also make it possible to remove the target network without suffering catastrophic divergence, which further reduces TD(0)'s estimation bias on a few tasks.
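The "single line change" described above amounts to prepending a random Fourier feature mapping to the value network's input. Below is a minimal PyTorch sketch of what that could look like; the layer widths, the scale parameter sigma, and the helper names (FourierFeatures, make_q_network) are illustrative assumptions, not the authors' released implementation.

    # Sketch: random Fourier features in front of a standard MLP critic.
    import math
    import torch
    import torch.nn as nn

    class FourierFeatures(nn.Module):
        """Map x to [cos(2*pi*Bx), sin(2*pi*Bx)] with a fixed random matrix B."""

        def __init__(self, in_dim: int, fourier_dim: int = 256, sigma: float = 1.0):
            super().__init__()
            # B is sampled once at initialization and kept fixed (not trained).
            self.register_buffer("B", torch.randn(in_dim, fourier_dim) * sigma)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            proj = 2.0 * math.pi * x @ self.B
            return torch.cat([torch.cos(proj), torch.sin(proj)], dim=-1)

    def make_q_network(obs_dim: int, act_dim: int, hidden: int = 256) -> nn.Module:
        # The "single line change": insert the Fourier feature layer in front of
        # an otherwise standard MLP critic that takes (state, action) as input.
        return nn.Sequential(
            FourierFeatures(obs_dim + act_dim, fourier_dim=hidden),  # <-- added line
            nn.Linear(2 * hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

Details such as the exact feature scale, whether B is learned, and how the features are concatenated with the raw input may differ in the paper; see the released code linked from the OpenReview page for the exact variant.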

BibTex

@inproceedings{
    yang2022overcoming,
    title={Overcoming The Spectral Bias of Neural Value Approximation},
    author={Ge Yang and Anurag Ajay and Pulkit Agrawal},
    booktitle={International Conference on Learning Representations},
    year={2022},
    url={https://openreview.net/forum?id=vIC-xLFuM6}
}