Detecting fraud in adversarial environments: A reinforcement learning approach

2018 
Credit card fraud is a costly problem for banks and a major frustration for consumers. Because fraudsters adapt their behavior, static fraud-detection models that rely on supervised training are exposed to the risk of being learned and circumvented. Previous adversarial learning work in fraud prevention showed increased effectiveness over static models that did not account for changing fraudster behavior. We extend this work by applying reinforcement learning: we frame the interaction between the fraudster and the card issuer as a Markov Decision Process (MDP) and perform prediction and control on it. Our MDP takes the perspective of an agent (here, the fraudster holding a stolen credit card) who interacts with an environment (merchants and a fraud classifier) by taking actions (transactions) and receiving rewards (based on whether each transaction is approved or declined). An episode ends when the credit card company terminates the card for fraud. This formulation lets us simulate fraudulent episodes so that techniques such as model-free policy iteration can identify an optimal policy for the fraudster. We found that, compared to a static classifier, making small changes to our fraud classifier on a regular basis led to a significant decrease in the fraud agent's ability to learn an optimal policy.
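
The abstract gives enough structure to sketch this MDP in code. The toy model below is a minimal illustration, not the paper's implementation: all names (FraudMDP, avg_return, shifting), transaction amounts, thresholds, and probabilities are assumptions, and tabular Q-learning (one model-free control method) stands in for the paper's unspecified model-free policy iteration. It contrasts a fixed issuer decision threshold with one perturbed on a regular schedule, mirroring the paper's comparison of static and regularly updated classifiers.

```python
import random
from collections import defaultdict

AMOUNTS = [10, 50, 200, 1000]   # action space: transaction amounts to attempt
MAX_STEPS = 20                  # cap on transactions per episode

class FraudMDP:
    """Toy version of the abstract's MDP. State: count of approved
    transactions so far. Action: an index into AMOUNTS. Reward: the
    amount if approved, 0 if declined. The episode ends when the
    issuer terminates the card (or the step cap is reached)."""

    def __init__(self, threshold):
        self.threshold = threshold  # issuer's hidden decline threshold

    def reset(self):
        self.approved, self.steps = 0, 0
        return self.approved

    def step(self, action):
        self.steps += 1
        amount = AMOUNTS[action]
        # Noisy classifier: flag transactions well above the threshold.
        if amount > self.threshold + random.gauss(0, 25):
            # Declined; the issuer may then terminate the card outright.
            done = random.random() < 0.5 or self.steps >= MAX_STEPS
            return self.approved, 0.0, done
        self.approved = min(self.approved + 1, 10)  # keep the state space small
        return self.approved, float(amount), self.steps >= MAX_STEPS

def avg_return(env_factory, episodes=3000, alpha=0.1, gamma=0.95, eps=0.1):
    """Model-free control via tabular Q-learning with an epsilon-greedy
    policy; returns the mean undiscounted return over the last 500 episodes."""
    Q = defaultdict(lambda: [0.0] * len(AMOUNTS))
    returns = []
    for ep in range(episodes):
        env = env_factory(ep)
        s, done, total = env.reset(), False, 0.0
        while not done:
            a = (random.randrange(len(AMOUNTS)) if random.random() < eps
                 else max(range(len(AMOUNTS)), key=lambda i: Q[s][i]))
            s2, r, done = env.step(a)
            target = r + (0.0 if done else gamma * max(Q[s2]))
            Q[s][a] += alpha * (target - Q[s][a])
            s, total = s2, total + r
        returns.append(total)
    return sum(returns[-500:]) / 500

def shifting(ep):
    """Re-draw the issuer's threshold every 100 episodes: the 'small,
    regular changes' to the classifier described in the abstract."""
    rng = random.Random(ep // 100)
    return FraudMDP(threshold=300 + rng.choice([-150, -75, 0, 75, 150]))

random.seed(0)
print("vs static classifier  :", round(avg_return(lambda ep: FraudMDP(300)), 1))
print("vs shifting classifier:", round(avg_return(shifting), 1))
```

In the static case the agent converges on always charging the largest amount the classifier tolerates; periodically re-drawing the threshold invalidates that learned policy, which is the qualitative effect the abstract reports.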