language-icon Old Web
English
Sign In

Proto-value functions

In applied mathematics, proto-value functions (PVFs) are automatically learned basis functions that are useful in approximating task-specific value functions, providing a compact representation of the powers of transition matrices. They provide a novel framework for solving the credit assignment problem. The framework introduces a novel approach to solving Markov decision processes (MDP) and reinforcement learning problems, using multiscale spectral and manifold learning methods. Proto-value functions are generated by spectral analysis of a graph, using the graph Laplacian. In applied mathematics, proto-value functions (PVFs) are automatically learned basis functions that are useful in approximating task-specific value functions, providing a compact representation of the powers of transition matrices. They provide a novel framework for solving the credit assignment problem. The framework introduces a novel approach to solving Markov decision processes (MDP) and reinforcement learning problems, using multiscale spectral and manifold learning methods. Proto-value functions are generated by spectral analysis of a graph, using the graph Laplacian. Proto-value functions were first introduced in the context of reinforcement learning by Sridhar Mahadevan in his paper, Proto-Value Functions: Developmental Reinforcement Learning at ICML 2005. Value function approximation is a critical component to solving Markov decision processes (MDPs) defined over a continuous state space. A good function approximator allows a reinforcement learning (RL) agent to accurately represent the value of any state it has experienced, without explicitly storing its value. Linear function approximation using basis functions is a common way of constructing a value function approximation, like radial basis functions, polynomial state encodings, and CMACs. However, parameters associated with these basis functions often require significant domain-specific hand-engineering. Proto-value functions attempts to solve this required hand-engineering by accounting for the underlying manifold structure of the problem domain. Proto-value functions are task-independent global basis functions that collectively span the entire space of possible value functions for a given state space. They incorporate geometric constraints intrinsic to the environment. For example, states close in Euclidean distance (such as states on opposite sides of a wall) may be far apart in manifold space. Previous approaches to this nonlinearity problem lacked a broad theoretical framework, and consequently have only been explored in the context of discrete MDPs. Proto-value functions arise from reformulating the problem of value function approximation as real-valued function approximation on a graph or manifold. This results in broader applicability of the learned bases and enables a new class of learning algorithms, which learn representations and policies at the same time. In this approach, we will construct the basis functions by spectral analysis of the graph Laplacian, a self-adjoint (or symmetric) operator on the space of functions on the graph, closely related to the random walk operator. For the sake of simplicity, assume that the underlying state space can be represented as an undirected unweighted graph G = ( V , E ) {displaystyle G=(V,E)} The combinatorial Laplacian L {displaystyle L} is defined as the operator L = D − A {displaystyle L=D-A} , where D {displaystyle D} is a diagonal matrix called the degree matrix and A {displaystyle A} is the adjacency matrix. The spectral analysis of the Laplace operator on a graph consists of finding the eigenvalues and eigenfunctions which solve the equation where L {displaystyle L} is the combinatorial Laplacian, φ λ {displaystyle varphi _{lambda }} is an eigenfunction associated with the eigenvalue λ {displaystyle lambda } . Here the term 'eigenfunction' is used to denote what is traditionally referred to as eigenvector in linear algebra, because the Laplacian eigenvectors can naturally be viewed as functions that map each vertex to a real number.

[ "Basis function", "Bellman equation", "State space", "Reinforcement learning", "value" ]
Parent Topic
Child Topic
    No Parent Topic