12 Learning direction theory and impulse balance equilibrium

2016 
The basic idea is simple: a qualitative form of ex post rationality. The learner chooses a parameter p_t ≤ p_{t-1} in period t if some p ≤ p_{t-1} might have been better in period t-1, and chooses p_t ≥ p_{t-1} in period t if some p ≥ p_{t-1} might have been better in period t-1. For example, in a first-price sealed-bid auction, the bidder who won the auction set the price. It might have been better for her or him to bid a bit lower, still winning the auction but paying a lower price. On the other hand, consider a bidder who did not win the auction but observes that her or his own value exceeds the winning bid. It might have been better for her or him to have bid higher. The specific prediction of learning direction theory is that parameter changes, when they occur, are in the indicated direction more frequently than would be expected with unbiased random choices. Examples of specific applications will be discussed below. Learning direction theory is qualitative, not quantitative: it predicts the direction of a change, not its size. Quantitative theories can be based on it, and some examples will be noted shortly. Note also the contrast to reinforcement learning. Learning direction theory ignores the realized rewards per se and relies entirely on comparisons with counterfactual payoffs for choices not made. Reinforcement learning, from its origin
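The auction example can be made concrete with a minimal Python sketch. It is an illustration under stated assumptions, not the chapter's own formalization: the `Round` record, the function names, and the convention of skipping periods with no indicated direction or no bid change are all hypothetical choices made here.

```python
from typing import List, NamedTuple


class Round(NamedTuple):
    """One period of a first-price sealed-bid auction (hypothetical record)."""
    bid: float          # the learner's own bid in this period
    value: float        # the learner's private value for the object
    winning_bid: float  # the highest bid, which sets the price
    won: bool           # whether the learner won the auction


def indicated_direction(r: Round) -> int:
    """Direction in which a different bid might have been better ex post.

    -1: the learner won, so a slightly lower bid might still have won
        at a lower price.
    +1: the learner lost although her or his value exceeded the winning
        bid, so a higher bid might have been better.
     0: no direction is indicated (assumption: lost, and value did not
        exceed the winning bid).
    """
    if r.won:
        return -1
    if r.value > r.winning_bid:
        return 1
    return 0


def share_in_indicated_direction(rounds: List[Round]) -> float:
    """Fraction of actual bid changes that go in the indicated direction.

    Only periods with an indicated direction and a nonzero bid change
    count, mirroring the phrase "parameter changes, when they occur".
    """
    hits = 0
    total = 0
    for prev, curr in zip(rounds, rounds[1:]):
        direction = indicated_direction(prev)
        change = curr.bid - prev.bid
        if direction == 0 or change == 0:
            continue
        total += 1
        if change * direction > 0:
            hits += 1
    return hits / total if total else float("nan")


# Made-up history for illustration only; not experimental data.
history = [
    Round(bid=8.0, value=10.0, winning_bid=8.0, won=True),
    Round(bid=7.0, value=10.0, winning_bid=9.0, won=False),
    Round(bid=9.5, value=10.0, winning_bid=9.5, won=True),
    Round(bid=9.8, value=10.0, winning_bid=9.8, won=True),
]
share = share_in_indicated_direction(history)
```

The sketch uses only the counterfactual comparison from the previous period and never the realized payoff, which is the contrast with reinforcement learning drawn above. Whether an observed share supports the theory depends on the benchmark for unbiased random choices, which this sketch does not specify.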