Agents can benefit from contracting out tasks that they cannot perform themselves or that other agents can perform more efficiently. Developing an agent's contracting strategy in the University of Michigan Digital Library (UMDL), however, is not easy, for the following reasons. The UMDL consists of self-interested agents, who will perform another agent's task only when doing so is in their own interest. In addition, multiple contracts take place concurrently, so that other contracts currently in the system may affect the success of one's own contract. Therefore, an agent with a task to contract out (the contractor) needs to model what the other self-interested agents believe and will do, and it also needs to consider the influence of other contracts on its own.
In this paper, we define the contractor's and the contractee's decision problems in UMDL contracting situations, and present a contracting strategy by which a contractor can determine an optimal payment to offer. The contractor's problem is to find a payment that maximizes its expected utility, and it finds such a payment by modeling the contracting process stochastically as a Markov process. The Markov-process model is built from the information the contractor has about the potential contractees and about the other contracts in the system. Our early results show that the contractor receives higher utility when it reasons about the potential contractees and the other contracts.
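To make the contractor's decision problem concrete, here is a minimal sketch, not the paper's actual model: it assumes each potential contractee accepts an offer independently, with an acceptance probability that depends on whether the payment covers the contractee's cost and on how likely the contractee is to be tied up by other contracts in the system. The names (`acceptance_prob`, `best_payment`), the cost/busy-probability parameters, and the simple acceptance rule are all illustrative assumptions.

```python
# Illustrative sketch of payment optimization under a simple acceptance
# model (assumed for this example, not taken from the paper).

def acceptance_prob(payment, contractee_cost, busy_prob):
    """Chance one contractee accepts: it must be free (not occupied by
    other concurrent contracts) and the payment must cover its cost."""
    return (1.0 - busy_prob) if payment >= contractee_cost else 0.0

def expected_utility(payment, task_value, contractees):
    """Expected utility of an offer: P(at least one contractee accepts)
    times the contractor's surplus (task value minus payment)."""
    p_none = 1.0
    for cost, busy in contractees:
        p_none *= 1.0 - acceptance_prob(payment, cost, busy)
    return (1.0 - p_none) * (task_value - payment)

def best_payment(task_value, contractees, step=1.0):
    """Search candidate payments for the one maximizing expected utility."""
    candidates = [i * step for i in range(int(task_value / step) + 1)]
    return max(candidates,
               key=lambda p: expected_utility(p, task_value, contractees))

# Two hypothetical contractees: costs 3 and 5; the second is busy 60%
# of the time because of other contracts in the system.
offer = best_payment(10.0, [(3.0, 0.2), (5.0, 0.6)])
```

Under these assumptions, raising the payment above the cheaper contractee's cost buys extra acceptance probability but shrinks the contractor's surplus, which is exactly the trade-off the contractor's model has to resolve.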
Resource constraints restrict the set of actions that an agent can take, such that the agent might not be able to perform all its desired tasks. Computational time limitations restrict the number of states that an agent can model and reason over, such that the agent might not be able to formulate a policy that can respond to all possible eventualities. This work argues that, in either situation, one effective way of improving the agent's performance is to adopt a phasing strategy. Resource-constrained agents can choose to reconfigure resources and switch action sets for handling upcoming events better when moving from phase to phase; time-limited agents can choose to focus computation on high-value phases and to exploit additional computation time during the execution of earlier phases to improve solutions for future phases.
This dissertation consists of two parts, corresponding to the aforementioned resource constraints and computational time limitations. The first part focuses on the development of automated resource-driven mission-phasing techniques for agents operating in resource-constrained environments. We designed a suite of algorithms that not only find solutions optimizing the use of predefined phase-switching points, but also automatically determine where to establish such points, accounting for the cost of creating them, in complex stochastic environments. By casting the coupled problems of mission decomposition, resource configuration, and policy formulation as a single compact mathematical formulation, the presented algorithms can effectively exploit problem structure and often considerably reduce the computational cost of finding exact solutions.
The second part of this dissertation is the design of computation-driven mission-phasing techniques for time-critical systems. We developed a new deliberation scheduling approach that simultaneously solves the coupled problems of deciding when to deliberate, given deliberation's cost, and which phase decision procedures to execute during each deliberation interval. Meanwhile, we designed a heuristic search method to effectively utilize the allocated time within each phase. As illustrated by analytical and experimental results, the computation-driven mission-phasing techniques, which extend problem decomposition techniques with the across-phase deliberation scheduling and within-phase heuristic search methods mentioned above, can help an agent judiciously emphasize high-value portions of a large problem, while paying less attention to others, to generate a better policy within its time limit.
Approximate models of world state transitions are necessary when building plans for complex systems operating in dynamic environments. External event probabilities can depend on state feature values as well as on the time spent in a particular state. We assign temporally-dependent probability functions to state transitions. These functions are used to locally compute state probabilities, which are then used to select highly probable goal paths and eliminate improbable states. This probabilistic model has been implemented in the Cooperative Intelligent Real-time Control Architecture (CIRCA), which combines an AI planner with a separate real-time system so that plans are developed, scheduled, and executed with real-time guarantees. We present flight simulation tests that demonstrate how our probabilistic model can improve CIRCA performance.
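The idea of temporally-dependent transition probabilities and local pruning can be sketched as follows. This is an illustrative toy, not CIRCA's implementation: the exponential event model, the flight-domain state names, and the pruning threshold are all assumptions made for the example.

```python
# Illustrative sketch: transitions carry probability functions of the
# dwell time in a state; state probabilities are propagated locally and
# states below a probability threshold are pruned as improbable.
import math

def exp_cdf(rate):
    """Probability that an exponential-rate external event fires within
    dwell time t (an assumed functional form for this sketch)."""
    return lambda t: 1.0 - math.exp(-rate * t)

# transitions: state -> list of (successor, probability function of dwell time)
transitions = {
    "cruise": [("threat", exp_cdf(0.05)), ("goal", exp_cdf(0.5))],
    "threat": [("crash", exp_cdf(0.3))],
}

def propagate(probs, dwell, threshold=0.01):
    """One step of local probability propagation: push each state's
    probability through its time-dependent transitions, then drop
    successors whose probability falls below the threshold."""
    nxt = {}
    for state, p in probs.items():
        for succ, fn in transitions.get(state, []):
            nxt[succ] = nxt.get(succ, 0.0) + p * fn(dwell)
    return {s: p for s, p in nxt.items() if p >= threshold}
```

With a long dwell time both successors of `cruise` stay in the model, while with a short dwell time the slow-rate `threat` transition falls below the threshold and is eliminated, which is the pruning behavior described above.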
Cooperating agents can make commitments to help each other, but commitments might have to be probabilistic when actions have stochastic outcomes. We consider the additional complication of cases where an agent might prefer to change its policy as it learns more about its reward function from experience. How should such an agent be allowed to change its policy while still faithfully pursuing its commitment in a principled decision-theoretic manner? We address this question by defining a class of Dec-POMDPs with Bayesian reward uncertainty, and by developing a novel Commitment Constrained Iterative Mean Reward algorithm that implements the semantics of faithful commitment pursuit while still permitting the agent to respond to its evolving understanding of its rewards. We theoretically bound the performance of our algorithm, and empirically evaluate how effectively it balances solution quality and computation cost.
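A minimal sketch of what a probabilistic commitment means, under assumptions made for this example only (not the paper's Dec-POMDP formulation): the agent promises that, under whatever policy it follows, a designated commitment state is reached by some horizon with at least a promised probability. Given the Markov chain a policy induces, that promise can be checked by propagating the state distribution forward; the chain, state names, and `commitment_satisfied` helper are all hypothetical.

```python
# Illustrative check of a probabilistic commitment: does the policy's
# induced Markov chain reach the commitment state by the horizon with
# probability at least rho? (Assumed formulation for this sketch.)

def commitment_satisfied(chain, start, goal, horizon, rho):
    """chain maps each state to a dict of successor probabilities;
    the goal state is treated as absorbing."""
    dist = {start: 1.0}
    for _ in range(horizon):
        nxt = {}
        for s, p in dist.items():
            if s == goal:                       # absorbing: mass stays put
                nxt[goal] = nxt.get(goal, 0.0) + p
                continue
            for succ, q in chain[s].items():
                nxt[succ] = nxt.get(succ, 0.0) + p * q
        dist = nxt
    return dist.get(goal, 0.0) >= rho

# Hypothetical induced chain: from s0 the agent reaches the commitment
# state g with probability 0.6 per step, otherwise stays in s0.
chain = {"s0": {"s0": 0.4, "g": 0.6}, "g": {"g": 1.0}}
```

An agent revising its policy as it learns about its rewards would, in this framing, be constrained to policies whose induced chains still pass this check, which is the spirit of faithful commitment pursuit described above.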
We present our approach to the problem of how an agent, within an economic Multi-Agent System, can determine when it should behave strategically (i.e., model the other agents) and when it should act as a simple price-taker. We provide a framework for the incremental implementation of modeling capabilities in agents. These agents were implemented, and different populations were simulated, in order to learn more about their behavior and the merits of using agent models. Our results show, among other lessons, how savvy buyers can avoid being cheated by sellers, how price volatility can be used to quantitatively predict the benefits of deeper models, and how specific types of agent populations influence system behavior.
We describe an approach to multiagent metareasoning that uses organizational design to focus each agent's reasoning on those aspects of its local problem where it can make the most worthwhile contributions to joint behavior. We summarize an organizational design problem that explicitly considers the quantitative impact a design has on both the quality of the agents' behaviors and their reasoning costs. We overview an automated organizational design process that can approximately solve our design problem via incremental search, and we outline techniques that efficiently estimate the incremental impact of a candidate organizational influence.