Reward-driven symbols for learning from human preferences

This work was accepted at the Human in the Loop Learning Workshop at NeurIPS 2022.

Tuning reward functions for reinforcement learning is a difficult task that preference-based learning methods circumvent by learning instead from preference labels over trace queries. These methods, however, still require many preference labels and often achieve low reward recovery. We present PRIOR, a framework that alleviates both the impractically large number of human queries and poor reward recovery by computing priors over the reward function from environment dynamics and a surrogate preference-ranking model. We find that imposing these priors as soft constraints significantly reduces the number of human queries in the loop and improves overall reward recovery. Additionally, we explore computing these priors over an abstract state space to further improve the agent's performance.
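The preference-based learning setup the abstract builds on is commonly modeled with a Bradley-Terry preference likelihood: the learned reward is summed along each queried trace, and a cross-entropy loss is taken against the human's preference label. The sketch below illustrates that generic formulation; the function names and the NumPy implementation are my own illustration, not code from the paper.

```python
import numpy as np

def preference_prob(return_a: float, return_b: float) -> float:
    """Bradley-Terry probability that trace A is preferred over trace B,
    given the learned reward summed along each trace."""
    return 1.0 / (1.0 + np.exp(-(return_a - return_b)))

def preference_loss(rewards_a: np.ndarray, rewards_b: np.ndarray, label: int) -> float:
    """Cross-entropy loss on a single preference label.

    rewards_a, rewards_b: per-step predicted rewards along each queried trace.
    label: 1 if the human preferred trace A, 0 if trace B.
    """
    p = preference_prob(np.sum(rewards_a), np.sum(rewards_b))
    eps = 1e-8  # numerical safety for log
    return -(label * np.log(p + eps) + (1 - label) * np.log(1.0 - p + eps))

# Example: trace A has higher predicted return; the loss is small when the
# human's label agrees with the model and large when it disagrees.
agree = preference_loss(np.array([1.0, 2.0]), np.array([0.5, 0.5]), label=1)
disagree = preference_loss(np.array([1.0, 2.0]), np.array([0.5, 0.5]), label=0)
```

Methods in this family differ mainly in how queries are selected and how the reward model is regularized; PRIOR's contribution is supplying priors (used as soft constraints) on that reward model to cut down the label budget.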
