While many skill discovery methods have been proposed to accelerate learning and planning, most are heuristics with no clear relationship to the agent’s objective. The conditions under which these algorithms are effective are therefore often unclear. We argue that skill discovery algorithms should be designed with an explicit relationship to the agent’s objective, so that we can understand in which scenarios they are useful.
To this end, we analyzed two scenarios, planning and reinforcement learning (Jinnai et al., 2019), and showed how to identify skill discovery criteria that directly address the relevant objectives. For planning, we showed that finding a set of options that minimizes planning time is NP-hard, and gave polynomial-time algorithms that are approximately optimal under certain conditions. For reinforcement learning, we showed that under certain conditions, the difficulty of discovering a distant rewarding state in an MDP is bounded by the expected cover time of a random walk over the graph induced by the MDP’s transition dynamics. We proposed covering options, a method that automatically generates skills that aim to minimize the expected cover time and thereby reduce the learning time.
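The link between added skills and cover time can be illustrated with a small sketch. It is not the implementation evaluated in this work. It assumes an undirected, unweighted state graph, models an option as a single extra edge, and chooses that edge by connecting the two states with the smallest and largest entries of the Fiedler vector (the eigenvector for the second-smallest eigenvalue of the graph Laplacian). That selection rule is an assumption here, since this section does not spell it out. The names path_graph, add_covering_edge, and estimate_cover_time are hypothetical.

```python
import numpy as np


def path_graph(n):
    """Adjacency matrix of a chain of n states (a hypothetical toy MDP graph)."""
    adj = np.zeros((n, n))
    idx = np.arange(n - 1)
    adj[idx, idx + 1] = adj[idx + 1, idx] = 1.0
    return adj


def add_covering_edge(adj):
    """Connect the two states at the extremes of the Fiedler vector.

    Assumption: an option is modeled as one undirected edge.
    """
    laplacian = np.diag(adj.sum(axis=1)) - adj
    _, vecs = np.linalg.eigh(laplacian)  # eigenvalues in ascending order
    fiedler = vecs[:, 1]                 # second-smallest eigenvalue
    i, j = int(np.argmin(fiedler)), int(np.argmax(fiedler))
    new_adj = adj.copy()
    new_adj[i, j] = new_adj[j, i] = 1.0
    return new_adj, (i, j)


def estimate_cover_time(adj, start=0, episodes=200, seed=0):
    """Monte Carlo estimate of the expected steps to visit every state."""
    rng = np.random.default_rng(seed)
    n = adj.shape[0]
    neighbors = [np.flatnonzero(adj[s]) for s in range(n)]
    total_steps = 0
    for _ in range(episodes):
        state, visited = start, {start}
        while len(visited) < n:
            nbrs = neighbors[state]
            state = int(nbrs[rng.integers(len(nbrs))])  # uniform random step
            visited.add(state)
            total_steps += 1
    return total_steps / episodes


adj = path_graph(20)
with_option, edge = add_covering_edge(adj)
print("added edge:", edge)
print("cover time without option:", estimate_cover_time(adj))
print("cover time with option:", estimate_cover_time(with_option))
```

On a chain, the added edge joins the two endpoints and turns the chain into a cycle, which shortens the random walk's expected cover time. Larger or weighted graphs, and options that take several steps to execute, would need a more careful model than this single-edge simplification.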
While covering options are useful in the tabular setting, they are not directly applicable to the deep reinforcement learning setting, where the state space is too large to enumerate. To address this, we proposed deep covering options, a method that generates skills similarly to covering options but, instead of enumerating the state space, estimates the state dynamics with a neural network.