AI learning to gamble like a human
Using Inverse Reinforcement Learning (IRL) to model risky decision making
Theory of IRL
In the context of reinforcement learning, our main interest is in uncovering a policy that performs optimally, or close to it. For this to be possible, the agent receives feedback after interacting with the environment, which guides the learning process. However, this feedback is arbitrary: we decide what constitutes a good decision and a bad one through the reward landscape defined by the environment hosting the agent.
Alternatively, we could forgo specifying the reward landscape and let the agent learn by observing the actions taken by a human expert. Training an agent this way allows us to uncover which features are most valuable or informative in human decision making, by using the trained agent as a proxy for the human.
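To make this concrete, a common working assumption in the IRL literature (our choice here, not a detail fixed by the task) is that the unknown reward is linear in a feature vector of the state:

$$ R(s) = \theta^{\top} \phi(s) $$

where $\phi(s)$ maps a state to its features and $\theta$ weights them. IRL then searches for weights $\theta$ under which the expert's demonstrated behavior is (near-)optimal, and the recovered weights tell us which features drive the expert's decisions.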
The BART
The Balloon Analogue Risk Task (BART) is a computerized measure of risk-taking behavior. The BART models real-world risk behavior through the conceptual frame of balancing the potential for reward against loss. In the task, the participant is presented with a balloon and offered the chance to earn money by pumping the balloon up with a button click. Each click inflates the balloon incrementally and adds money to a counter, up to some threshold at which the balloon is over-inflated and explodes. Thus, each pump confers greater risk, but also greater potential reward. If the participant chooses to cash out before the balloon explodes, they collect the money earned for that trial; if the balloon explodes, the earnings for that trial are lost. Participants are not informed of the balloons' break points; withholding this information allows testing of both participants' initial responses to the task and changes in responding as they gain experience with the task contingencies. Note that risk taking is related to, but phenomenologically distinct from, impulsivity.
Below is an example of what participants see on the screen, and you can play a version of the game here: PLAY
The setup
First of all, we have to translate the BART into a reinforcement learning environment and decide which algorithms are most appropriate for the task of IRL. To do this, we must define the states, available actions, and transition probabilities of the game. Given that the maximum number of pumps is 128, after which the balloon pops with 100% probability, we can define each state as the pump number the participant is currently on. Each new balloon is treated as a new run, making the BART an episodic task, and the probability of transitioning from state i to state i+1 is:
$$ P(s_{i+1} \mid \text{pump}, s_{i}) = \frac{128-i}{129-i} $$
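As a sanity check on this formulation, here is a minimal sketch of the BART as an episodic environment in Python. Only the transition probability comes from the formula above; the reward values, the action names, and the choice to start the state index at pump 1 are assumptions made for illustration.

```python
import random

class BartEnv:
    """Minimal sketch of the BART as an episodic RL environment.

    State i is the pump number the participant is currently on, so a
    successful pump transitions from state i to i+1 with probability
    (128 - i) / (129 - i), as in the formula above.
    """

    MAX_PUMPS = 128

    def reset(self):
        self.pump = 1       # currently on the first pump
        self.earnings = 0   # money banked this trial (arbitrary units, assumed)
        return self.pump

    def step(self, action):
        """action is 'pump' or 'cash_out'; returns (state, reward, done)."""
        if action == "cash_out":
            # Collect the accumulated earnings and end the trial.
            return self.pump, self.earnings, True
        # Survive pump i with probability (128 - i) / (129 - i),
        # i.e. the balloon explodes with probability 1 / (129 - i).
        i = self.pump
        if random.random() < (self.MAX_PUMPS - i) / (self.MAX_PUMPS + 1 - i):
            self.earnings += 1  # assumed: one unit of money per pump
            self.pump += 1
            return self.pump, 0, False
        # Explosion: all earnings for this trial are lost.
        return self.pump, 0, True
```

Note that at i = 128 the survival probability is 0, so the 128th pump always pops the balloon, matching the cap described above.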
We would like to know which features of the task are most informative to the participant playing the game, such that their choice at every step can be modelled as a function of these features. Fortunately, the BART is a fairly simple task, so the number of considerations, or features, that can affect participants' choices is limited. Intuitively, we could argue that decisions are most likely based on previous experience and current-trial information (e.g. the number of balloon pumps leading up to the current state), as sketched below.
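As a rough sketch of what such a feature map $\phi(s)$ might look like, the hypothetical `features` function below encodes exactly those two considerations: the current pump number and the participant's previous experience. Both the specific features and their normalization are illustrative assumptions, not part of the task definition.

```python
def features(pump, history):
    """Hypothetical feature vector phi(s) for a BART state.

    pump    -- the pump number the participant is currently on
    history -- pumps achieved on previous balloons (cash-out or explosion)
    """
    # Previous experience: average number of pumps across past balloons.
    mean_prev = sum(history) / len(history) if history else 0.0
    return [
        pump / 128,       # current-trial information, normalized by the cap
        mean_prev / 128,  # previous experience, same normalization
    ]
```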