While deep reinforcement learning techniques have led to agents that successfully learn a number of tasks that had previously been unlearnable, these techniques remain susceptible to the longstanding problem of reward sparsity. This is especially true for tasks such as training an agent to play StarCraft II, a real-time strategy game where reward is given only at the end of a typically very long game. While this problem can be addressed through reward shaping, such approaches typically require a human expert with specialized knowledge. Inspired by the vision of enabling reward shaping through the more accessible paradigm of natural-language narration, we investigate to what extent we can contextualize these narrations by grounding them in goal-specific states. We present a mutual-embedding model using a multi-input deep neural network that projects a sequence of natural-language commands into the same high-dimensional representation space as the corresponding goal states. We show that using this model we can learn an embedding space with separable and distinct clusters that accurately maps natural-language commands to corresponding game states. We also discuss how this model can allow for the use of narrations as a robust form of reward shaping to improve RL performance and efficiency.
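As a minimal sketch of what such a mutual embedding could look like in code, the following is a generic two-tower design trained with a contrastive loss; the layer sizes, vocabulary handling, and loss are illustrative assumptions, not the paper's exact architecture:

```python
# Two-tower mutual-embedding sketch (PyTorch). All sizes are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MutualEmbedding(nn.Module):
    def __init__(self, vocab_size=1000, embed_dim=128, state_dim=64 * 64):
        super().__init__()
        # Text tower: embed tokens, summarize the command with a GRU.
        self.token_embed = nn.Embedding(vocab_size, 64)
        self.text_rnn = nn.GRU(64, embed_dim, batch_first=True)
        # State tower: MLP over a flattened game-state observation.
        self.state_net = nn.Sequential(
            nn.Linear(state_dim, 256), nn.ReLU(),
            nn.Linear(256, embed_dim),
        )

    def forward(self, tokens, states):
        _, h = self.text_rnn(self.token_embed(tokens))
        text_z = F.normalize(h[-1], dim=-1)                     # (B, D)
        state_z = F.normalize(self.state_net(states), dim=-1)   # (B, D)
        return text_z, state_z

def contrastive_loss(text_z, state_z, temperature=0.1):
    # Matched command/state pairs sit on the diagonal of the similarity
    # matrix; every off-diagonal pair is treated as a negative.
    logits = text_z @ state_z.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)
    return F.cross_entropy(logits, targets)
```

Training such a model pulls each command toward its matching goal state in the shared space, which is what produces the separable clusters described above.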
Many multi-robot planning problems are burdened by the curse of dimensionality, which compounds the difficulty of applying solutions to large-scale problem instances. The use of learning-based methods in multi-robot planning holds great promise, as it enables us to offload the online computational burden of expensive yet optimal solvers to an offline learning procedure. Simply put, the idea is to train a policy to copy an optimal pattern generated by a small-scale system, and then transfer that policy to much larger systems, in the hope that the learned strategy scales while maintaining near-optimal performance. Yet a number of issues impede us from leveraging this idea to its full potential. This blue-sky paper elaborates on some of the key challenges that remain.
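A minimal sketch of the train-small, deploy-large idea, assuming a behavior-cloning setup with a single policy shared across robots so that team size is not baked into the weights (feature sizes, action space, and the stand-in dataset are illustrative):

```python
# Behavior-clone a small-scale optimal solver into a per-robot policy,
# then run the same weights on a larger team. All shapes are assumptions.
import torch
import torch.nn as nn

# One policy applied per robot (row), so it accepts any team size.
policy = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 5))

# Stand-in for demonstrations from an optimal solver on small instances.
small_scale_dataset = [(torch.randn(32, 8), torch.randint(0, 5, (32,)))]

opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
for obs, expert_actions in small_scale_dataset:
    opt.zero_grad()
    loss = loss_fn(policy(obs), expert_actions)
    loss.backward()
    opt.step()

# Online: the very same weights act for a much larger team.
large_team_obs = torch.randn(100, 8)
actions = policy(large_team_obs).argmax(dim=-1)
```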
We propose a diffusion approximation method for continuous-state Markov Decision Processes (MDPs) that can be utilized to address autonomous navigation and control in unstructured off-road environments. In contrast to most decision-theoretic planning frameworks that assume fully known state transition models, we design a method that eliminates this strong assumption, which is often extremely difficult to satisfy in reality. We first take the second-order Taylor expansion of the value function. The Bellman optimality equation is then approximated by a partial differential equation that relies only on the first and second moments of the transition model. Combining this approximation with a kernel representation of the value function, we design an efficient policy iteration algorithm whose policy evaluation step can be represented as a linear system of equations characterized by a finite set of supporting states. We first validate the proposed method through extensive simulations in 2D obstacle avoidance and 2.5D terrain navigation problems. The results show that the proposed approach substantially outperforms several baselines. We then develop a system that integrates our decision-making framework with onboard perception and conduct real-world experiments in both cluttered indoor and unstructured outdoor environments. The results from the physical systems further demonstrate the applicability of our method in challenging real-world environments.
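A sketch of the approximation step, in our own notation (the paper's exact formulation may differ):

```latex
% Expand V to second order around s, with transition increment
% \Delta s = s' - s:
V(s') \approx V(s) + \nabla V(s)^{\top} \Delta s
            + \tfrac{1}{2}\, \Delta s^{\top} \nabla^{2} V(s)\, \Delta s
% Substituting into the Bellman optimality equation
% V(s) = \max_a \mathbb{E}\big[ r(s,a) + \gamma V(s') \big]
% and taking the expectation leaves only the first two moments,
% m(s,a) = \mathbb{E}[\Delta s] and
% \Sigma(s,a) = \mathbb{E}[\Delta s\, \Delta s^{\top}]:
V(s) \approx \max_{a} \Big\{ r(s,a) + \gamma \Big( V(s)
            + \nabla V(s)^{\top} m(s,a)
            + \tfrac{1}{2}\, \operatorname{tr}\!\big( \nabla^{2} V(s)\, \Sigma(s,a) \big) \Big) \Big\}
```

The right-hand side is a partial differential equation in V: no integral over an unknown transition density remains, only its first and second moments.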
The output encodings of neural nets determine the structure of the space in which inference occurs. Yet, they are generally given very little thought. It is common practice for neural nets to use 1-Hot encoding when training to discriminate among many classes. The primary exceptions to this are error-correcting output codes and semantic output encodings. Output encodings based upon semantic descriptors cause a net to learn responses for classes to which it has not been exposed, provided those classes may be characterized by the same semantic descriptors. This raises a number of questions, such as "Can a net implicitly learn encodings for unobserved classes in the absence of a semantic encoding, or any encoding that requires some form of prior knowledge and hand crafting?" Also, are some output encodings better than others for learning these implicit encodings? In this paper, we compare how effective different non-semantic encodings are at causing a neural net to implicitly learn encodings for unobserved classes. In addition, while evaluating the efficacy of these implicit encodings, we look for evidence of a phenomenon akin to overtraining. Specifically, as training on the observed classes occurs, we initially see improvement in how well the implicitly learned encodings can be used to differentiate among the classes that are unobserved during the net's training. However, as training continues to improve discrimination among the observed classes, the efficacy of the implicit codes either remains steady or degrades. This degradation is akin to the overtraining that one generally tries to guard against when training a neural net.
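As a minimal sketch of how non-semantic output encodings and their decoding can be set up (the code length and the nearest-code decoding rule are illustrative assumptions):

```python
# Contrast two non-semantic output encodings: 1-Hot and random binary
# (ECOC-style) codes, with nearest-code decoding. Sizes are assumptions.
import numpy as np

rng = np.random.default_rng(0)
num_classes, code_len = 10, 16

one_hot = np.eye(num_classes)                             # 1-Hot codewords
ecoc = rng.integers(0, 2, size=(num_classes, code_len)).astype(float)

def decode(outputs, codebook):
    # Assign each output vector to the class whose codeword is nearest
    # in Euclidean distance.
    d = np.linalg.norm(outputs[:, None, :] - codebook[None, :, :], axis=-1)
    return d.argmin(axis=1)

# A net trained against these targets emits code-length outputs; mean
# outputs for classes never seen in training can be decoded the same
# way to probe implicitly learned encodings.
print(decode(ecoc + 0.3 * rng.standard_normal(ecoc.shape), ecoc))
print(decode(one_hot + 0.3 * rng.standard_normal(one_hot.shape), one_hot))
# Ideally both print [0, 1, ..., 9].
```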
While there is extensive literature available on parallel manipulators in general, there has been much less attention given to cable-driven parallel manipulators. In this paper, we address the problem of analyzing the reachable workspace using the tools of semi-definite programming. We build on earlier work [1, 2] done using similar techniques by deriving limiting conditions that allow us to compute analytic expressions for the boundary of the reachable workspace. We illustrate this computation for a planar parallel manipulator with four actuators.
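For illustration only, membership in the static workspace of a planar four-cable robot can be posed as a small feasibility program. The sketch below uses a linear program rather than the semi-definite formulation of the paper, with assumed geometry and tension bounds:

```python
# A point is in the static workspace if nonnegative cable tensions
# within bounds can balance gravity there. LP feasibility check only;
# not the paper's SDP boundary derivation. Geometry is assumed.
import numpy as np
from scipy.optimize import linprog

anchors = np.array([[0, 0], [4, 0], [4, 3], [0, 3]], dtype=float)
t_min, t_max = 0.1, 50.0
weight = np.array([0.0, -9.8])   # gravity on a unit-mass platform

def in_workspace(p):
    p = np.asarray(p, dtype=float)
    dirs = anchors - p
    u = dirs / np.linalg.norm(dirs, axis=1, keepdims=True)  # unit cable axes
    # Find tensions t with u^T t = -weight, t_min <= t <= t_max.
    # Feasibility only, so the objective is zero.
    res = linprog(c=np.zeros(4), A_eq=u.T, b_eq=-weight,
                  bounds=[(t_min, t_max)] * 4, method="highs")
    return res.success

print(in_workspace([2.0, 1.5]))   # center of the frame: expected feasible
print(in_workspace([3.9, 2.9]))   # near an anchor: may be infeasible
```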
We develop a framework for controlling a team of robots to maintain and improve a communication bridge between a stationary robot and an independently exploring robot in a walled environment. We make use of two metrics for characterizing the communication: the Fiedler value of the weighted Laplacian describing the communication interactions of all the robots in the system, and the k-connectivity matrix that expresses which robots can interact through k or fewer intermediary robots. At each step, we move in such a way as to improve the Fiedler value as much as possible while keeping the number of intermediary robots between the two robots of interest below a desired value. We demonstrate the use of this framework in a scenario where the hop-count constraint cannot be satisfied, and show that communication quality is nonetheless maintained.
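Both metrics are easy to state concretely; a minimal sketch with illustrative edge weights (assumed, not from the paper):

```python
# Fiedler value = second-smallest eigenvalue of the weighted Laplacian;
# k-hop reachability from powers of the adjacency matrix.
import numpy as np

def fiedler_value(W):
    # W: symmetric nonnegative matrix of pairwise link qualities.
    L = np.diag(W.sum(axis=1)) - W
    return np.sort(np.linalg.eigvalsh(L))[1]

def within_k_hops(W, i, j, k):
    # Robots i and j can interact through k or fewer intermediaries
    # iff a path of at most k+1 edges connects them.
    A = (W > 0).astype(float)
    n = len(W)
    reach = np.eye(n)
    for _ in range(k + 1):
        reach = reach @ (A + np.eye(n))   # extend paths by one edge
        if reach[i, j] > 0:
            return True
    return False

W = np.array([[0, 1.0, 0], [1.0, 0, 0.5], [0, 0.5, 0]])
print(fiedler_value(W))           # > 0 iff the network is connected
print(within_k_hops(W, 0, 2, 1))  # path 0-1-2: one intermediary, True
```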
This letter considers a particular class of multi-robot task allocation problems, where tasks correspond to heterogeneous multi-robot routing problems defined on different areas of a given environment. We present a hierarchical planner that breaks down the complexity of this problem into two subproblems: the high-level problem of allocating robots to routing tasks, and the low-level problem of computing the actual routing paths for each subteam. The planner uses a Graph Neural Network (GNN) as a heuristic to estimate subteam performance for specific coalitions on specific routing tasks, and then iteratively refines the estimates to the real subteam performances as solutions of the low-level problems become available. On a testbed problem with a heterogeneous multi-robot area inspection problem as the base routing task, we empirically show that our hierarchical planner is able to compute optimal or near-optimal (within 7%) solutions approximately 16 times faster (on average) than an optimal baseline that computes plans for all the possible allocations in advance to obtain precise routing times. Furthermore, we show that a GNN-based estimator can provide an excellent trade-off between solution quality and computation time compared to other baseline (non-learned) estimators.
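A minimal sketch of the lazy refinement loop described above, with assumed estimator and solver interfaces; note the pruning rule is sound only if the estimate never overestimates the true cost, otherwise it acts as a heuristic cutoff:

```python
# Rank candidate allocations by a fast learned cost estimate, then
# replace estimates with exact low-level solutions until no cheaper
# candidate can remain. Interfaces are illustrative assumptions.
def plan(allocations, estimate_cost, solve_routing):
    # allocations: iterable of candidate {task: subteam} assignments.
    # estimate_cost: fast GNN-style proxy for the routing cost.
    # solve_routing: exact (expensive) low-level routing solver.
    ranked = sorted(allocations, key=estimate_cost)
    best, best_cost = None, float("inf")
    for alloc in ranked:
        if estimate_cost(alloc) >= best_cost:
            break                       # no remaining estimate can beat best
        cost = solve_routing(alloc)     # refine estimate to the real cost
        if cost < best_cost:
            best, best_cost = alloc, cost
    return best, best_cost
```

The benefit over the exhaustive baseline is that the exact solver is invoked only for the few allocations whose estimates remain competitive.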