Efficient use of limited computational resources is essential to intelligence. Selecting computations optimally according to rational metareasoning would achieve this, but rational metareasoning is computationally intractable. Inspired by psychology and neuroscience, we propose the first learning algorithm for approximating the optimal selection of computations. We derive a general, sample-efficient reinforcement learning algorithm for learning to select computations from the insight that the value of computation lies between the myopic value of computation and the value of perfect information. We evaluate the performance of our method against two state-of-the-art methods for approximate metareasoning (the meta-greedy heuristic and the blinkered policy) on three increasingly difficult metareasoning problems: metareasoning about when to terminate computation, metareasoning about how to choose between multiple actions, and metareasoning about planning. Across all three domains, our method achieved near-optimal performance and significantly outperformed the meta-greedy heuristic. The blinkered policy performed on par with our method in metareasoning about decision-making, but it is not directly applicable to metareasoning about planning, where our method outperformed both the meta-greedy heuristic and a generalization of the blinkered policy. Our results are a step towards building self-improving AI systems that can learn to make optimal use of their limited computational resources to efficiently solve complex problems in real time.
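To make the two bounding quantities concrete, the following toy sketch computes the myopic value of a single computation and the value of perfect information for a deliberately simple decision. It is not the learning algorithm described above. The setup is an assumption made for illustration: one action pays off with unknown probability theta, the belief about theta is a discrete distribution, a known alternative is worth c, and the cost of computation is ignored. All function names are hypothetical.

```python
# Toy illustration only (not the paper's algorithm). Assumed setup: one
# uncertain action with Bernoulli payoff probability theta, a known
# alternative worth c, and a discrete belief {theta: probability}.
# The cost of computation is ignored here.

def mean(belief):
    """Expected payoff probability under the current belief."""
    return sum(p * theta for theta, p in belief.items())

def update(belief, outcome):
    """Bayesian update of the belief after observing one payoff (1 or 0)."""
    unnorm = {t: p * (t if outcome == 1 else 1.0 - t) for t, p in belief.items()}
    z = sum(unnorm.values())
    return {t: w / z for t, w in unnorm.items()}

def myopic_voi(belief, c):
    """Expected gain in decision quality from exactly one more observation."""
    p_one = mean(belief)
    value_now = max(mean(belief), c)
    value_after = 0.0
    for outcome, p_outcome in ((1, p_one), (0, 1.0 - p_one)):
        if p_outcome > 0:
            value_after += p_outcome * max(mean(update(belief, outcome)), c)
    return value_after - value_now

def vpi(belief, c):
    """Expected gain in decision quality from learning theta exactly."""
    value_now = max(mean(belief), c)
    return sum(p * max(t, c) for t, p in belief.items()) - value_now

belief = {0.2: 0.5, 0.8: 0.5}   # theta is either 0.2 or 0.8, equally likely
c = 0.6                         # value of the known alternative
print(myopic_voi(belief, c), vpi(belief, c))
```

In this example the myopic value is approximately 0.04 and the value of perfect information is approximately 0.10, so a single observation captures only part of what could be gained from thinking further.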
Planning is a latent cognitive process that cannot be observed directly. This makes it difficult to study how people plan. To address this problem, we propose a new paradigm for studying planning that provides experimenters with a time course of participant attention to information in the task environment. This paradigm employs the information-acquisition mechanism of the Mouselab paradigm, in which participants click on options to reveal the outcome of choosing those options. However, in contrast to the original Mouselab paradigm, our paradigm is a sequential decision process, in which participants must plan multiple steps ahead to achieve high scores. We release Mouselab-MDP open-source as a plugin for the jsPsych online psychology experiment library. The plugin displays a Markov decision process as a directed graph, which the participant navigates to maximize reward. To trace the process of planning, the rewards associated with states or actions are initially occluded; the participant has to click on a transition to reveal its reward. This information-gathering behavior makes explicit the states the participant considers. We illustrate the utility of the Mouselab-MDP paradigm with a proof-of-concept experiment in which we trace the temporal dynamics of planning in a simple environment. Our data shed new light on people’s approximate planning strategies and on how people prune decision trees. We hope that the release of Mouselab-MDP will facilitate future research on human planning strategies. In particular, we hope that the fine-grained time course data that the paradigm generates will be instrumental in specifying algorithms, tracking learning trajectories, and characterizing individual differences in human planning.
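The click-to-reveal mechanism described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the Mouselab-MDP plugin or its API (the plugin itself targets jsPsych): the class name, method names, and the tiny example graph are all hypothetical.

```python
import time

class OccludedGraphMDP:
    """Hypothetical sketch: a directed-graph MDP whose rewards start hidden."""

    def __init__(self, transitions, start):
        # transitions: {state: {action: (next_state, reward)}}
        self.transitions = transitions
        self.state = start
        self.revealed = {}    # (state, action) -> reward, filled in by clicks
        self.click_log = []   # (timestamp, state, action): the process trace
        self.score = 0

    def click(self, state, action):
        """Reveal the reward of one transition and record the inspection."""
        _, reward = self.transitions[state][action]
        self.revealed[(state, action)] = reward
        self.click_log.append((time.monotonic(), state, action))
        return reward

    def move(self, action):
        """Take an action from the current state and collect its reward."""
        next_state, reward = self.transitions[self.state][action]
        self.state = next_state
        self.score += reward
        return reward

# Assumed example graph: two branches, each two steps deep.
transitions = {
    "start": {"left": ("A", -1), "right": ("B", 2)},
    "A": {"forward": ("end_a", 10)},
    "B": {"forward": ("end_b", -5)},
}
env = OccludedGraphMDP(transitions, "start")
env.click("start", "left")    # participant inspects the first step
env.click("A", "forward")     # then looks one step further ahead
env.move("left")
env.move("forward")
print(env.score, [(s, a) for _, s, a in env.click_log])
```

The ordered click log is the kind of record from which a time course of attention could be reconstructed: it shows which transitions were inspected, and in what order, before each move.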
Critical appraisal is an important skill for medical students. A proposed curriculum may be an effective teaching tool. To determine whether the teaching of critical appraisal can be successfully introduced into an osteopathic clinical clerkship in obstetrics and gynecology. Osteopathic medical students (N=77) were assigned by lottery to one of eight rotation groups during their clinical clerkship in obstetrics and gynecology. Four of these rotation groups received instruction in critical appraisal (study group; received evidence-based medicine [EBM] curriculum; n=38); the other four rotation groups did not (control group; received non-EBM; n=39). The ability of the study EBM group to critically analyze the literature was compared with that of the control (non-EBM) group on the basis of results of a multiple-choice examination. The University of Medicine and Dentistry of New Jersey-School of Osteopathic Medicine clinical clerkship in obstetrics and gynecology. The median scores for critical analysis were 41 for the control group and 64 for the study group. This difference was statistically significant (P<.001). The teaching of critical appraisal can be successfully introduced into a clerkship in obstetrics and gynecology.
Anticipated Transient Without Scram
2.1.1. Instrumentation and Control
This subsection describes I&C engineering & operations activities for the conceptual design and detailed design phases that occurred through September 2022.
• Section 2.1.1.1 describes the development of a division of responsibility (DOR), which is unique to the conceptual design phase.
• Section 2.1.1.2 of this report addresses multiple activities undertaken during the conceptual design phase, as identified in Section 4.2 of the EPRI DEG [4].
One of the most distinctive and impressive feats of the human mind is its ability to discover and continuously refine its own cognitive strategies. Elucidating the underlying learning and adaptation mechanisms is very difficult because changes in cognitive strategies are not directly observable. One important domain in which strategies and mechanisms are studied is planning. To enable researchers to uncover how people learn how to plan, we offer a tutorial introduction to a recently developed process-tracing paradigm along with a new computational method for measuring the nature and development of a person’s planning strategies from the resulting process-tracing data. Our method allows researchers to reveal experience-driven changes in people’s choice of individual planning operations, planning strategies, strategy types, and the relative contributions of different decision systems. We validate our method on simulated and empirical data. On simulated data, its inferences about the strategies and the relative influence of different decision systems are accurate. When evaluated on human data generated using our process-tracing paradigm, our computational method correctly detects the plasticity-enhancing effect of feedback and the effect of the structure of the environment on people’s planning strategies. Together, these methods can be used to investigate the mechanisms of cognitive plasticity and to elucidate how people acquire complex cognitive skills such as planning and problem-solving. Importantly, our methods can also be used to measure individual differences in cognitive plasticity and examine how different types of (pedagogical) interventions affect the acquisition of cognitive skills.